
cmflib.cmf.Cmf.DataSlice

cmflib.cmf.Cmf.DataSlice(name, writer)

A data slice represents a named subset of data. It can be used to track the performance of an ML model on different slices of the training or testing dataset splits. This is useful from several perspectives, for instance, to detect and mitigate model bias.
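
To illustrate why per-slice tracking matters, here is a minimal sketch, independent of cmflib, that computes a model's accuracy separately for each named slice. The function accuracy_per_slice, the slice names, and the labels and predictions are made-up illustrative data, not part of the cmflib API.

```python
from collections import defaultdict


def accuracy_per_slice(records):
    """Return {slice_name: accuracy} for (slice_name, label, prediction) tuples.

    Hypothetical helper for illustration only; not part of cmflib.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, label, prediction in records:
        total[slice_name] += 1
        if label == prediction:
            correct[slice_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Made-up evaluation records: (slice name, true label, predicted label).
records = [
    ("slice-a", 1, 1),
    ("slice-a", 0, 0),
    ("slice-a", 1, 0),
    ("slice-a", 0, 0),
    ("slice-b", 1, 0),
    ("slice-b", 0, 1),
]

scores = accuracy_per_slice(records)
print(scores)  # {'slice-a': 0.75, 'slice-b': 0.0}
```

The overall accuracy on these six records is 0.5, which hides the fact that the model gets every slice-b example wrong; reporting metrics per slice makes that gap visible.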

Instances of data slices are not meant to be created manually by users. Instead, use the Cmf.create_dataslice method.

add_data(path, custom_properties=None)

Add data to the dataslice. Currently, only file abstractions are supported. Precondition: the parent folder containing the file must already be versioned.

dataslice.add_data(f"data/raw_data/{j}.xml")

Parameters:

path (str): Name identifying the file to be added to the dataslice. Required.

custom_properties (Optional[Dict]): Properties associated with this datum. Default: None.
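
The add_data contract described above can be modeled with a small, self-contained sketch. ToyDataSlice, its versioned_folders argument, and the precondition check are hypothetical illustrations, not cmflib code; in cmflib, versioning is handled by the underlying data versioning software rather than by a set of folder names.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Set, Tuple


class ToyDataSlice:
    """Hypothetical in-memory stand-in for a data slice (NOT cmflib's class).

    It only mimics the documented add_data contract: each added file must
    live in a parent folder that is already versioned.
    """

    def __init__(self, name: str, versioned_folders: Set[str]) -> None:
        self.name = name
        # Assumption: the set of folders already under version control.
        self.versioned_folders = versioned_folders
        self.items: List[Tuple[str, Dict]] = []

    def add_data(self, path: str, custom_properties: Optional[Dict] = None) -> None:
        parent = str(PurePosixPath(path).parent)
        if parent not in self.versioned_folders:
            raise ValueError(f"parent folder {parent!r} of {path!r} is not versioned")
        # Store a copy so later changes by the caller do not leak in.
        self.items.append((path, dict(custom_properties or {})))


# Usage mirroring the documented example, with j taken from a loop.
dataslice = ToyDataSlice("slice-a", versioned_folders={"data/raw_data"})
for j in range(3):
    dataslice.add_data(f"data/raw_data/{j}.xml", custom_properties={"index": j})

print(len(dataslice.items))  # 3
```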

commit(custom_properties=None)

Commit the dataslice. The created dataslice is versioned and added to the underlying data versioning software.

dataslice.commit()

Parameters:

custom_properties (Optional[Dict]): Dictionary of key-value pairs to associate with the dataslice, for example {"mean": 2.5, "median": 2.6}. Default: None.
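
As a rough illustration of what committing with custom_properties involves, the sketch below computes summary statistics shaped like the documented example and then "commits" a slice by hashing a deterministic JSON manifest. toy_commit and the SHA-256 digest are hypothetical stand-ins for the version identifier a real data versioning tool would assign; this is not how cmflib implements commit.

```python
import hashlib
import json
import statistics
from typing import Dict, List, Optional, Tuple


def toy_commit(name: str, items: List[Tuple[str, Dict]],
               custom_properties: Optional[Dict] = None) -> str:
    """Hypothetical stand-in for DataSlice.commit (NOT cmflib's implementation).

    Serializes the slice to a deterministic JSON manifest and returns its
    SHA-256 digest as a stand-in version identifier.
    """
    manifest = {
        "name": name,
        "items": [{"path": p, "properties": props} for p, props in items],
        "custom_properties": custom_properties or {},
    }
    # sort_keys makes the serialization, and therefore the digest, deterministic.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up per-file measurements, summarized like {"mean": 2.5, "median": 2.6}.
values = [1.0, 2.6, 3.9]
props = {"mean": statistics.mean(values), "median": statistics.median(values)}

items = [("data/raw_data/0.xml", {}), ("data/raw_data/1.xml", {})]
version = toy_commit("slice-a", items, custom_properties=props)
print(props)         # {'mean': 2.5, 'median': 2.6}
print(len(version))  # 64
```

Hashing a sorted serialization means identical slice contents and properties always yield the same identifier, while any change to the files or the custom properties yields a new one.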