cmflib.cmf.Cmf.DataSlice
A data slice represents a named subset of data. It can be used to track the performance of an ML model on different slices of the training or testing dataset splits. This is useful, for instance, when checking for and mitigating model bias.
Instances of data slices are not meant to be created manually by users. Instead, use the Cmf.create_dataslice method.
Source code in cmflib/cmf.py
add_data(path, custom_properties=None)
Add data to the dataslice being created. Currently supported only for file abstractions. Precondition: the parent folder containing the file must already be versioned. Example:
dataslice.add_data(f"data/raw_data/{j}.xml")
commit(custom_properties=None)
Commit the dataslice. The created dataslice is versioned and added to the underlying data versioning software. Example:
dataslice.commit()
Args: custom_properties: Dictionary of key-value pairs associated with the dataslice. Example: {"mean": 2.5, "median": 2.6}
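The two methods above are typically used together: several add_data calls followed by a single commit. The following is a minimal sketch of that pattern, not part of cmflib. The helper name fill_and_commit and its parameters are hypothetical; it assumes only that the object passed in exposes add_data(path, custom_properties=None) and commit(custom_properties=None) as documented above.

```python
from typing import Any, Dict, Iterable, Optional


def fill_and_commit(
    dataslice: Any,
    paths: Iterable[str],
    slice_properties: Optional[Dict[str, Any]] = None,
) -> int:
    """Add every path to the dataslice, then commit once.

    Hypothetical helper: `dataslice` is assumed to be an object with the
    add_data and commit methods described in this section. The files are
    assumed to live in a folder that is already versioned (the documented
    precondition of add_data).

    Returns the number of paths that were added.
    """
    count = 0
    for path in paths:
        # One add_data call per file in the slice.
        dataslice.add_data(path)
        count += 1
    # A single commit versions the slice; slice_properties is forwarded
    # as the custom_properties dictionary, e.g. {"mean": 2.5, "median": 2.6}.
    dataslice.commit(custom_properties=slice_properties)
    return count
```

With cmflib, the dataslice argument would be the object returned by Cmf.create_dataslice, for example fill_and_commit(dataslice, [f"data/raw_data/{j}.xml" for j in range(3)], {"mean": 2.5, "median": 2.6}).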