Complete API Documentation
Logging APIs
1. Library init call - Cmf()
This call initializes the library and creates a pipeline object with the provided name.
```python
cmf = cmf.Cmf(filename="mlmd", pipeline_name="Test-env")
# Returns a Context object of mlmd.proto.Context
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| filename | String | Path to the SQLite file that stores the metadata |
| pipeline_name | String | Name to uniquely identify the pipeline. The name is the unique identifier for a pipeline; if a pipeline already exists with the same name, the existing pipeline object is reused |
| custom_properties | Dictionary (Optional) | Additional properties of the pipeline that need to be stored |
| graph | Bool (Optional) | If set to true, the library also stores the relationships in the provided graph database. The following environment variables should be set: NEO4J_URI, NEO4J_USER_NAME, NEO4J_PASSWD |
Return Object: mlmd.proto.Context

| Attribute | Type | Description |
| --- | --- | --- |
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Context name |
| properties | repeated PropertiesEntry | Properties |
| type | string | Context type |
| type_id | int64 | Type identifier |
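When graph=True is passed, lineage is additionally written to Neo4j. A minimal sketch, assuming the standard `from cmflib import cmf` import and placeholder Neo4j credentials:

```python
import os

from cmflib import cmf as cmf_lib

# Placeholder values; point these at your own Neo4j instance.
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USER_NAME"] = "neo4j"
os.environ["NEO4J_PASSWD"] = "password"

# graph=True also records artifact/execution relationships in Neo4j.
cmf = cmf_lib.Cmf(filename="mlmd", pipeline_name="Test-env", graph=True)
```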
2. create_context - Creates a Stage with properties
A pipeline may include multiple stages. A unique name should be provided for every Stage in a pipeline.
```python
context = cmf.create_context(pipeline_stage="Prepare", custom_properties={"user-metadata1": "metadata_value"})
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| pipeline_stage | String | Name of the pipeline stage |
| custom_properties | Dictionary (Optional) | Key-value pairs of additional stage properties that need to be stored |
Return Object: mlmd.proto.Context

| Attribute | Type | Description |
| --- | --- | --- |
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Context name |
| properties | repeated PropertiesEntry | Properties |
| type | string | Context type |
| type_id | int64 | Type identifier |
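The returned mlmd.proto.Context exposes the attributes listed above; a minimal sketch of inspecting it, continuing the example:

```python
context = cmf.create_context(pipeline_stage="Prepare")

# Attributes come straight from mlmd.proto.Context (see table above).
print(context.id)    # unique identifier assigned by the metadata store
print(context.name)  # context name
```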
3. create_execution - Creates an Execution with properties
A stage can have multiple executions. A unique name should be provided for every execution.
Properties of the execution can be passed as key-value pairs in the custom properties, e.g. the hyperparameters used for the execution.
```python
execution = cmf.create_execution(execution_type="Prepare",
                                 custom_properties={"Split": split, "Seed": seed})
# execution_type: String - Name of the execution
# custom_properties: Dictionary (Optional Parameter)
# Returns: Execution object of type mlmd.proto.Execution
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| execution_type | String | Name of the execution |
| custom_properties | Dictionary (Optional) | Additional properties for the execution |
Return Object: mlmd.proto.Execution

| Attribute | Type | Description |
| --- | --- | --- |
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_known_state | State | Last known execution state |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Execution name |
| properties | repeated PropertiesEntry | Properties (Git_Repo, Context_Type, Git_Start_Commit, Pipeline_Type, Context_ID, Git_End_Commit, Execution Command, Pipeline_id) |
| type | string | Execution type |
| type_id | int64 | Type identifier |
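A stage can then be paired with one or more executions; a minimal sketch of the typical call order, with hypothetical hyperparameter values in place of `split` and `seed`:

```python
context = cmf.create_context(pipeline_stage="Prepare")
execution = cmf.create_execution(execution_type="Prepare",
                                 custom_properties={"Split": 0.8, "Seed": 42})
print(execution.id)  # unique identifier assigned by the metadata store
```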
4. log_dataset - Logs a Dataset and its properties
Tracks a dataset and its version. The version of the dataset is automatically obtained from the versioning software (DVC) and tracked as metadata.
```python
artifact = cmf.log_dataset("/repo/data.xml", "input", custom_properties={"Source": "kaggle"})
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| url | String | The path to the dataset |
| event | String | Takes arguments INPUT or OUTPUT |
| custom_properties | Dictionary | The dataset properties |
Return Object: mlmd.proto.Artifact

| Attribute | Type | Description |
| --- | --- | --- |
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Artifact name |
| properties | repeated PropertiesEntry | Properties (Commit, Git_Repo) |
| state | State | Artifact state |
| type | string | Artifact type |
| type_id | int64 | Type identifier |
| uri | string | Artifact URI |
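A stage typically consumes one dataset and produces another; a minimal sketch, with hypothetical file paths that are assumed to be under DVC control:

```python
# Record what the execution read ...
cmf.log_dataset("artifacts/raw_data.xml", "input")

# ... and what it wrote, with optional custom properties.
cmf.log_dataset("artifacts/prepared_data.xml", "output",
                custom_properties={"rows": 10000})
```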
5. log_model - Logs a model and its properties.
```python
cmf.log_model(path="path/to/model.pkl",
              event="output",
              model_framework="SKlearn",
              model_type="RandomForestClassifier",
              model_name="RandomForestClassifier:default")
# Returns an Artifact object of type mlmd.proto.Artifact
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| path | String | Path to the model file |
| event | String | Takes arguments INPUT or OUTPUT |
| model_framework | String | Framework used to create the model |
| model_type | String | Type of model algorithm used |
| model_name | String | Name of the algorithm used |
| custom_properties | Dictionary | The model properties |
Return Object: mlmd.proto.Artifact

| Attribute | Type | Description |
| --- | --- | --- |
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Artifact name |
| properties | repeated PropertiesEntry | Properties (commit, model_framework, model_type, model_name) |
| state | State | Artifact state |
| type | string | Artifact type |
| type_id | int64 | Type identifier |
| uri | string | Artifact URI |
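The same call records models on both sides of a pipeline; a minimal sketch in which a hypothetical model.pkl is logged as the output of a training stage and later as the input of an evaluation stage:

```python
# Training stage: the model file is a product of the execution.
cmf.log_model(path="model.pkl", event="output",
              model_framework="SKlearn",
              model_type="RandomForestClassifier",
              model_name="RandomForestClassifier:default")

# Evaluation stage: the same file is consumed as an input.
cmf.log_model(path="model.pkl", event="input",
              model_framework="SKlearn",
              model_type="RandomForestClassifier",
              model_name="RandomForestClassifier:default")
```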
6. log_execution_metrics - Logs the metrics for the execution
```python
cmf.log_execution_metrics(metrics_name="Training_Metrics", custom_properties={"auc": auc, "loss": loss})
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| metrics_name | String | Name to identify the metrics |
| custom_properties | Dictionary | Metrics |
7. log_metric - Logs the per-step metrics for fine-grained tracking
The metrics provided are stored in a parquet file. The commit_metrics call adds the parquet file to the version control framework. The metrics written to the parquet file can be retrieved using the read_metrics call.
```python
# Can be called at every epoch or every step in the training.
# This is logged to a parquet file and committed at the commit stage.
while True:  # inside the training loop
    metawriter.log_metric("training_metrics", {"loss": loss})
metawriter.commit_metrics("training_metrics")
```
Arguments for log_metric:

| Argument | Type | Description |
| --- | --- | --- |
| metrics_name | String | Name to identify the metrics |
| custom_properties | Dictionary | Metrics |

Arguments for commit_metrics:

| Argument | Type | Description |
| --- | --- | --- |
| metrics_name | String | Name to identify the metrics |
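Putting log_metric and commit_metrics together, a minimal training-loop sketch; `metawriter` is the initialized Cmf object from section 1, and the epoch count and loss values are hypothetical stand-ins:

```python
for epoch in range(10):  # hypothetical epoch count
    loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
    # Accumulates one row per call under the named metric.
    metawriter.log_metric("training_metrics", {"loss": loss})

# Writes the accumulated rows to a parquet file and versions it.
metawriter.commit_metrics("training_metrics")
```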
8. create_dataslice - Creates a dataslice
This helps to track a subset of the data. Currently supported only for file abstractions. For example, the accuracy of the model for a slice of data (gender, ethnicity, etc.).
```python
dataslice = cmf.create_dataslice("slice-a")
```
Arguments for create_dataslice:

| Argument | Type | Description |
| --- | --- | --- |
| name | String | Name to identify the dataslice |

Returns a Dataslice object.
9. add_data - Adds data to a dataslice
Currently supported only for file abstractions. Precondition: the parent folder containing the file should already be versioned.
```python
dataslice.add_data("data/raw_data/" + str(j) + ".xml")
```
Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| name | String | Name to identify the file to be added to the dataslice |
10. Dataslice commit - Commits the created dataslice
The created dataslice is versioned and added to the underlying data versioning software (DVC).
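Sections 8 through 10 combine as follows; a minimal sketch, assuming the dataslice object exposes a commit() method for this step and that the hypothetical files under data/raw_data/ sit in an already-versioned folder:

```python
dataslice = cmf.create_dataslice("slice-a")
for j in range(3):
    # Each file must live in a folder that is already versioned.
    dataslice.add_data("data/raw_data/" + str(j) + ".xml")

# Version the dataslice with the underlying data versioning software (DVC).
dataslice.commit()
```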