
Complete API Documentation

Logging APIs

1. Library init call - Cmf()

This call initializes the library and creates a pipeline object with the name provided.
Arguments to be passed to Cmf:

cmf = cmf.Cmf(filename="mlmd", pipeline_name="Test-env")

# Returns a Context object of mlmd.proto.Context

| Argument | Type | Description |
|----------|------|-------------|
| filename | String | Path to the SQLite file to store the metadata |
| pipeline_name | String | Name to uniquely identify the pipeline. Note that the name is the unique identifier for a pipeline. If a pipeline already exists with the same name, the existing pipeline object is reused |
| custom_properties | Dictionary | (Optional) Additional properties of the pipeline to be stored |
| graph | Bool | (Optional) If set to true, the library also stores the relationships in the provided graph database. The following environment variables should be set: NEO4J_URI, NEO4J_USER_NAME, NEO4J_PASSWD |

Return Object: mlmd.proto.Context

| Attribute | Type | Description |
|-----------|------|-------------|
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Context name |
| properties | repeated PropertiesEntry | Properties |
| type | string | Context type |
| type_id | int64 | Type identifier |
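When graph=True is passed, the library reads the Neo4j connection details from the three environment variables named above. A minimal shell setup might look like this; the URI and credentials are placeholders for your own Neo4j instance:

```shell
# Placeholder values -- replace with your Neo4j instance details
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER_NAME="neo4j"
export NEO4J_PASSWD="password"
```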

2. create_context - Creates a Stage with properties

A pipeline may include multiple stages. A unique name should be provided for every stage in a pipeline.
Arguments to be passed to create_context:

context = cmf.create_context(pipeline_stage="Prepare", custom_properties={"user-metadata1":"metadata_value"})

| Argument | Type | Description |
|----------|------|-------------|
| pipeline_stage | String | Name of the pipeline stage |
| custom_properties | Dictionary | (Optional) Key-value pairs of additional stage properties to be stored |

Return Object: mlmd.proto.Context

| Attribute | Type | Description |
|-----------|------|-------------|
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Context name |
| properties | repeated PropertiesEntry | Properties |
| type | string | Context type |
| type_id | int64 | Type identifier |

3. create_execution - Creates an Execution with properties

A stage can have multiple executions. A unique name should be provided for every execution. Properties of the execution can be passed as key-value pairs in the custom properties, e.g., the hyperparameters used for the execution.

execution = cmf.create_execution(execution_type="Prepare",
                                 custom_properties={"Split": split, "Seed": seed})

# execution_type: String - Name of the execution
# custom_properties: Dictionary (Optional Parameter)
# Returns: Execution object of type mlmd.proto.Execution

| Argument | Type | Description |
|----------|------|-------------|
| execution_type | String | Name of the execution |
| custom_properties | Dictionary | (Optional) Additional properties for the execution |

Return Object: mlmd.proto.Execution

| Attribute | Type | Description |
|-----------|------|-------------|
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_known_state | State | Last known execution state |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Execution name |
| properties | repeated PropertiesEntry | Properties (Git_Repo, Context_Type, Git_Start_Commit, Pipeline_Type, Context_ID, Git_End_Commit, Execution Command, Pipeline_id) |
| type | string | Execution type |
| type_id | int64 | Type identifier |

4. log_dataset - Logs a Dataset and its properties

Tracks a dataset and its version. The version of the dataset is automatically obtained from the versioning software (DVC) and tracked as metadata.

artifact = cmf.log_dataset("/repo/data.xml", "input", custom_properties={"Source": "kaggle"})

| Argument | Type | Description |
|----------|------|-------------|
| url | String | The path to the dataset |
| event | String | Takes the value INPUT or OUTPUT |
| custom_properties | Dictionary | (Optional) Additional properties of the dataset |

Return Object: mlmd.proto.Artifact

| Attribute | Type | Description |
|-----------|------|-------------|
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Artifact name |
| properties | repeated PropertiesEntry | Properties (Commit, Git_Repo) |
| state | State | Artifact state |
| type | string | Artifact type |
| type_id | int64 | Type identifier |
| uri | string | Artifact URI |

5. log_model - Logs a model and its properties.

cmf.log_model(path="path/to/model.pkl",
              event="output",
              model_framework="SKlearn",
              model_type="RandomForestClassifier",
              model_name="RandomForestClassifier:default")

# Returns an Artifact object of type mlmd.proto.Artifact
| Argument | Type | Description |
|----------|------|-------------|
| path | String | Path to the model file |
| event | String | Takes the value INPUT or OUTPUT |
| model_framework | String | Framework used to create the model |
| model_type | String | Type of model algorithm used |
| model_name | String | Name of the algorithm used |
| custom_properties | Dictionary | (Optional) Additional properties of the model |

Return Object: mlmd.proto.Artifact

| Attribute | Type | Description |
|-----------|------|-------------|
| create_time_since_epoch | int64 | Creation timestamp |
| custom_properties | repeated CustomPropertiesEntry | Custom properties |
| id | int64 | Unique identifier |
| last_update_time_since_epoch | int64 | Last update timestamp |
| name | string | Artifact name |
| properties | repeated PropertiesEntry | Properties (commit, model_framework, model_type, model_name) |
| state | State | Artifact state |
| type | string | Artifact type |
| type_id | int64 | Type identifier |
| uri | string | Artifact URI |

6. log_execution_metrics - Logs the metrics for the execution

cmf.log_execution_metrics(metrics_name="Training_Metrics", custom_properties={"auc": auc, "loss": loss})
| Argument | Type | Description |
|----------|------|-------------|
| metrics_name | String | Name to identify the metrics |
| custom_properties | Dictionary | The metrics to be stored |

7. log_metrics - Logs the per-step metrics for fine-grained tracking

The metrics provided are stored in a parquet file. The commit_metrics call adds the parquet file to the version control framework. The metrics written to the parquet file can be retrieved using the read_metrics call.

# Can be called at every epoch or every step in the training.
# This is logged to a parquet file and committed at the commit stage.
while True:  # Inside training loop
    cmf.log_metric("training_metrics", {"loss": loss})
cmf.commit_metrics("training_metrics")

Arguments for log_metric:

| Argument | Type | Description |
|----------|------|-------------|
| metrics_name | String | Name to identify the metrics |
| custom_properties | Dictionary | The metrics to be stored |

Arguments for commit_metrics:

| Argument | Type | Description |
|----------|------|-------------|
| metrics_name | String | Name to identify the metrics |

8. create_dataslice - Creates a dataslice

This helps to track a subset of the data. Currently supported only for file abstractions. For example, tracking the accuracy of the model for a slice of data (gender, ethnicity, etc.).

dataslice = cmf.create_dataslice("slice-a")

Arguments for create_dataslice:

| Argument | Type | Description |
|----------|------|-------------|
| name | String | Name to identify the dataslice |

Returns a Dataslice object.

9. add_data - Adds data to a dataslice.

Currently supported only for file abstractions. Precondition: the parent folder containing the file should already be versioned.

dataslice.add_data("data/raw_data/" + str(j) + ".xml")

| Argument | Type | Description |
|----------|------|-------------|
| name | String | Name to identify the file to be added to the dataslice |

10. Dataslice Commit - Commits the created dataslice

The created dataslice is versioned and added to the underlying data versioning software.

dataslice.commit()