cmflib.cmf.Cmf.DataSlice
A data slice represents a named subset of data. It can be used to track the performance of an ML model on different slices of the training or testing dataset splits. Per-slice tracking is useful, for instance, for detecting and mitigating model bias.
Instances of data slices are not meant to be created manually by users. Instead, use the Cmf.create_dataslice method.
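To illustrate why per-slice tracking matters, the following minimal sketch computes accuracy separately for each named slice. It does not use cmflib; the function name `accuracy_by_slice`, the slice names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return accuracy per slice for (slice_name, label, prediction) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, label, prediction in records:
        total[slice_name] += 1
        correct[slice_name] += int(label == prediction)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical evaluation records: (slice name, true label, predicted label).
records = [
    ("age<30", 1, 1),
    ("age<30", 0, 1),
    ("age>=30", 1, 1),
    ("age>=30", 0, 0),
]
scores = accuracy_by_slice(records)
```

A large gap between slices in a result like `scores` is the kind of signal that a single aggregate metric would hide.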
Source code in cmflib/cmf.py
add_data(path, custom_properties=None)
Add data to the dataslice being created. Currently, only file abstractions are supported. Precondition: the parent folder containing the file must already be versioned. Example:
dataslice.add_data(f"data/raw_data/{j}.xml")
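The call above is typically made once per file. The sketch below shows one way such a loop might look, attaching a small custom_properties dictionary to each entry. So that it runs without cmflib or a versioned folder, it uses `RecordingSlice`, a hypothetical stand-in that only mirrors the `add_data(path, custom_properties=None)` signature; in a real pipeline the object would come from Cmf.create_dataslice. The helper `add_xml_files` and the `"index"` property key are likewise assumptions.

```python
from pathlib import Path


class RecordingSlice:
    """Hypothetical stand-in for a DataSlice; it only records add_data calls."""

    def __init__(self):
        self.entries = {}

    def add_data(self, path, custom_properties=None):
        self.entries[path] = dict(custom_properties or {})


def add_xml_files(dataslice, folder, indices):
    """Call add_data once per file, tagging each entry with its index."""
    for j in indices:
        path = (Path(folder) / f"{j}.xml").as_posix()
        dataslice.add_data(path, custom_properties={"index": j})


dataslice = RecordingSlice()
add_xml_files(dataslice, "data/raw_data", [3, 7, 42])
```

With a real DataSlice, the folder passed in (here `data/raw_data`) would need to be versioned beforehand, as the precondition states.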
Source code in cmflib/cmf.py
commit(custom_properties=None)
Commit the dataslice. The created dataslice is versioned and added to the underlying data versioning software. Example:
dataslice.commit()
Args: custom_properties: Dictionary of key-value pairs associated with the dataslice. Example: {"mean": 2.5, "median": 2.6}
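As a rough illustration of where a dictionary such as {"mean": 2.5, "median": 2.6} might come from, the sketch below computes summary statistics with Python's standard `statistics` module and passes them to `commit`. `RecordingSlice` is again a hypothetical stand-in that mirrors only the `commit(custom_properties=None)` signature, and `summary_properties` and the sample values are assumptions, not part of cmflib.

```python
import statistics


class RecordingSlice:
    """Hypothetical stand-in for a DataSlice; it only records the commit call."""

    def __init__(self):
        self.committed = None

    def commit(self, custom_properties=None):
        self.committed = dict(custom_properties or {})


def summary_properties(values):
    """Build a custom_properties dictionary from a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical per-file measurements for the slice.
values = [2.0, 2.6, 2.9]
props = summary_properties(values)

dataslice = RecordingSlice()
dataslice.commit(custom_properties=props)
```

Keeping the property values as plain numbers or strings, as in the documented example, is the conservative choice for metadata that will be stored alongside the slice.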
Source code in cmflib/cmf.py