cmflib.cmf.Cmf.DataSlice
cmflib.cmf.Cmf.DataSlice(name, writer)
A data slice represents a named subset of data. It can be used to track performance of an ML model on different slices of the training or testing dataset splits. This can be useful from different perspectives, for instance, to mitigate model bias.
Data slice instances are not meant to be created manually. Instead, use the Cmf.create_dataslice method.
add_data(path, custom_properties=None)
Adds data to the dataslice. Currently only file abstractions are supported. Precondition: the parent folder containing the file must already be versioned.
dataslice.add_data(f"data/raw_data/{j}.xml")
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Name to identify the file to be added to the dataslice. | required |
custom_properties | Optional[Dict] | Properties associated with this datum. | None |
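As a minimal sketch of how add_data might be called for several files, with properties attached to only some of them: `RecordingSlice` below is a hypothetical in-memory stand-in that mirrors the documented `add_data(path, custom_properties=None)` signature so the example runs without cmflib or a versioned folder. The `add_files` helper and the `"label"` property key are also illustrative assumptions, not part of the library.

```python
from typing import Dict, Iterable, Optional


class RecordingSlice:
    """Hypothetical stand-in for a dataslice (not part of cmflib).

    It only records add_data calls in memory.
    """

    def __init__(self) -> None:
        self.props: Dict[str, Dict] = {}

    def add_data(self, path: str, custom_properties: Optional[Dict] = None) -> None:
        # Store an empty dict when no properties are supplied.
        self.props[path] = custom_properties or {}


def add_files(dataslice, paths: Iterable[str],
              properties_by_path: Optional[Dict[str, Dict]] = None):
    """Add each path to the slice, attaching per-file properties when given."""
    properties_by_path = properties_by_path or {}
    for path in paths:
        dataslice.add_data(path, properties_by_path.get(path))
    return dataslice


paths = [f"data/raw_data/{j}.xml" for j in range(3)]
demo_slice = add_files(RecordingSlice(), paths, {paths[0]: {"label": "positive"}})
```

With a real dataslice obtained from Cmf.create_dataslice, the same helper should apply unchanged, provided the files' parent folder is already versioned.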
commit(custom_properties=None)
Commits the dataslice. The created dataslice is versioned and added to the underlying data versioning software.
dataslice.commit()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
custom_properties | Optional[Dict] | Dictionary of key-value pairs associated with the dataslice. Example: {"mean": 2.5, "median": 2.6} | None |
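The following sketch shows one way summary statistics such as those in the example above could be computed and passed to commit. It assumes a hypothetical `RecordingSlice` stand-in that mirrors the documented `add_data` and `commit` signatures, so it runs without cmflib. The `commit_with_summary` helper and the score values are illustrative only.

```python
import statistics
from typing import Dict, List, Optional


class RecordingSlice:
    """Hypothetical stand-in for a dataslice (not part of cmflib)."""

    def __init__(self) -> None:
        self.paths: List[str] = []
        self.committed_with: Optional[Dict] = None

    def add_data(self, path: str, custom_properties: Optional[Dict] = None) -> None:
        self.paths.append(path)

    def commit(self, custom_properties: Optional[Dict] = None) -> None:
        # A real dataslice would be versioned here; this stub only records the call.
        self.committed_with = custom_properties


def commit_with_summary(dataslice, scores: List[float]) -> Dict[str, float]:
    """Commit the slice with the mean and median of `scores` as custom properties."""
    summary = {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }
    dataslice.commit(summary)
    return summary


demo = RecordingSlice()
demo.add_data("data/raw_data/0.xml")
summary = commit_with_summary(demo, [2.0, 2.5, 3.0])
```

Building the dictionary before calling commit keeps the slice-level properties in one place, separate from the per-file properties passed to add_data.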