
cmflib.cmfquery.CmfQuery

cmflib.cmfquery.CmfQuery(filepath='mlmd', is_server=False)

Bases: object

CMF Query communicates with the MLMD database and implements basic search and retrieval functionality.

This class has been designed to work with the CMF framework. CMF modifies the names of pipelines, stages, and artifacts in various ways, so the names actually stored in the MLMD database differ from those that users originally passed to the CMF API. When a method in this class accepts a name parameter, its value is expected to be the fully qualified name of the respective entity.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the MLMD database file. | 'mlmd' |

get_pipeline_names()

Return names of all pipelines.

Returns:

| Type | Description |
|---|---|
| List[str] | List of all pipeline names. |

get_pipeline_id(pipeline_name)

Return the pipeline identifier for the pipeline named pipeline_name.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_name | str | Name of the pipeline. | required |

Returns:

| Type | Description |
|---|---|
| int | Pipeline identifier, or -1 if no such pipeline exists. |
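Because get_pipeline_id reports a missing pipeline with the sentinel -1 instead of raising, callers usually check for it before using the result. The sketch below is a minimal, self-contained illustration: `pipeline_id_or_none` is a hypothetical helper (not part of cmflib), and `fake_lookup` with its made-up pipeline name merely stands in for a bound `CmfQuery.get_pipeline_id`.

```python
from typing import Callable, Optional

# Sentinel that CmfQuery.get_pipeline_id returns when no pipeline matches.
MISSING_PIPELINE_ID = -1


def pipeline_id_or_none(lookup: Callable[[str], int], pipeline_name: str) -> Optional[int]:
    """Hypothetical helper: map the -1 sentinel to None so it cannot pass for a real id.

    `lookup` is assumed to behave like a bound CmfQuery.get_pipeline_id.
    """
    pipeline_id = lookup(pipeline_name)
    return None if pipeline_id == MISSING_PIPELINE_ID else pipeline_id


# Stand-in lookup for illustration only; the pipeline name and id are made up.
_fake_ids = {"Test-env": 1}


def fake_lookup(name: str) -> int:
    return _fake_ids.get(name, MISSING_PIPELINE_ID)
```

Against a real database the lookup would be the bound method itself, for example `pipeline_id_or_none(CmfQuery("mlmd").get_pipeline_id, name)`.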

get_pipeline_stages(pipeline_name)

Return list of pipeline stages for the pipeline with the given name.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_name | str | Name of the pipeline whose stages are returned. Pipeline names are unique in CMF. | required |

Returns:

| Type | Description |
|---|---|
| List[str] | List of stage names associated with the given pipeline. |

get_all_exe_in_stage(stage_name)

Return list of all executions for the stage with the given name.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| stage_name | str | Name of the stage. Before stages are recorded in MLMD, their names are modified (e.g., the pipeline name becomes part of the stage name), so stage names from different pipelines do not collide. | required |

Returns:

| Type | Description |
|---|---|
| List[Execution] | List of executions for the given stage. |

get_all_executions_by_ids_list(exe_ids)

Return the executions for the given list of execution identifiers as a pandas data frame.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| exe_ids | List[int] | List of execution identifiers. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame with all executions for the given list of execution identifiers. |

get_all_artifacts_by_context(pipeline_name)

Return the artifacts for the given pipeline name as a pandas data frame.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_name | str | Name of the pipeline. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame with all artifacts associated with the given pipeline name. |

get_all_artifacts_by_ids_list(artifact_ids)

Return all artifacts for the given list of artifact identifiers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_ids | List[int] | List of artifact identifiers. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame with all artifacts for the given list of artifact identifiers. |

get_all_executions_in_stage(stage_name)

Return the executions of the given stage as a pandas data frame.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| stage_name | str | Stage name. See the stage_name parameter of get_all_exe_in_stage for how stage names are formed. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame with all executions associated with the given stage. |

get_artifact_df(artifact, d=None)

Return artifact's data frame representation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact | Artifact | MLMD entity representing the artifact. | required |
| d | Optional[Dict] | Optional initial content for the data frame. | None |

Returns:

| Type | Description |
|---|---|
| DataFrame | A data frame with a single row containing the attributes of this artifact. |
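The d parameter lets a caller seed the row with extra columns before the artifact's own attributes are added. The sketch below models that idea with plain dictionaries instead of MLMD Artifact protos and pandas; the `ToyArtifact` class, its fields, and the rule that artifact attributes override seeded keys are illustrative assumptions, not cmflib's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyArtifact:
    """Hypothetical stand-in for an MLMD Artifact (not the real proto)."""
    id: int
    name: str
    type: str
    uri: str
    properties: Dict[str, Any] = field(default_factory=dict)


def artifact_row(artifact: ToyArtifact, d: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one row of artifact attributes, starting from optional initial content d.

    d is copied, so the caller's dict is never mutated; defaulting to None rather
    than {} also avoids Python's shared-mutable-default pitfall.
    """
    row: Dict[str, Any] = dict(d) if d is not None else {}
    # Assumption for this sketch: artifact attributes take precedence over seeded keys.
    row.update(id=artifact.id, name=artifact.name, type=artifact.type, uri=artifact.uri)
    row.update(artifact.properties)
    return row
```

Wrapping the result in a list, as in `pandas.DataFrame([artifact_row(a, {"event": "INPUT"})])`, yields the single-row data frame the method describes.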

get_all_artifacts()

Return names of all artifacts.

Returns:

| Type | Description |
|---|---|
| List[str] | List of all artifact names. |

get_artifact(name)

Return the data frame representation of an artifact, looked up by name.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| Optional[DataFrame] | Pandas data frame with one row containing the attributes of this artifact. |

get_all_artifacts_for_execution(execution_id)

Return input and output artifacts for the given execution.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| execution_id | int | Execution identifier. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame containing input and output artifacts for the given execution, one artifact per row. |
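MLMD records lineage as events, each linking one artifact to one execution with a type such as INPUT or OUTPUT. The sketch below uses a simplified in-memory version of that model to show what "input and output artifacts, one per row" means; the `Event` class, the use of artifact names instead of ids, the `"event"` column, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str   # artifact name (real MLMD events store artifact ids)
    execution: int  # execution id
    type: str       # "INPUT" or "OUTPUT" in this sketch


def artifacts_for_execution(events: List[Event], execution_id: int) -> List[Dict[str, str]]:
    """Return one row per input or output artifact of the given execution."""
    return [
        {"name": e.artifact, "event": e.type}
        for e in events
        if e.execution == execution_id and e.type in ("INPUT", "OUTPUT")
    ]


# Toy lineage: raw.csv -> (execution 1) -> features.parquet -> (execution 2) -> model.pkl
EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
    Event("features.parquet", 2, "INPUT"),
    Event("model.pkl", 2, "OUTPUT"),
]
```

Passing the returned list of dictionaries to `pandas.DataFrame` gives one artifact per row.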

get_all_artifact_types()

Return names of all artifact types.

Returns:

| Type | Description |
|---|---|
| List[str] | List of all artifact types. |

get_all_executions_for_artifact(artifact_name)

Return the executions that consumed or produced the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Pandas data frame containing stage executions, one execution per row. |
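The reverse lookup selects every event that mentions the artifact: INPUT events identify consuming executions and OUTPUT events identify the producing one. This minimal sketch uses a simplified, hypothetical `Event` model and sample data; the `"role"` column is an illustrative assumption, not a documented cmflib column.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT" in this sketch


# How each event type relates the execution to the artifact.
ROLE = {"INPUT": "consumed", "OUTPUT": "produced"}


def executions_for_artifact(events: List[Event], artifact: str) -> List[Dict[str, Union[int, str]]]:
    """One row per execution that consumed or produced `artifact`, in event order."""
    return [
        {"execution": e.execution, "role": ROLE[e.type]}
        for e in events
        if e.artifact == artifact and e.type in ROLE
    ]


EVENTS = [
    Event("features.parquet", 1, "OUTPUT"),  # produced by execution 1
    Event("features.parquet", 2, "INPUT"),   # consumed by execution 2
    Event("features.parquet", 3, "INPUT"),   # consumed by execution 3
    Event("model.pkl", 2, "OUTPUT"),
]
```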

get_one_hop_child_artifacts(artifact_name, pipeline_id=None)

Return the artifacts produced by executions that consume the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Name of an artifact. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Output artifacts of all executions that consumed the given artifact. |
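To make the one-hop relationship concrete, the sketch below models lineage as a flat list of MLMD-style events and computes children in two steps: find the executions that consumed the artifact, then collect their outputs. The `Event` class, artifact names, and set return type are illustrative assumptions; cmflib itself queries the MLMD store and returns a DataFrame.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event: an artifact used (INPUT) or made (OUTPUT) by an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT"


def one_hop_children(events: List[Event], artifact: str) -> Set[str]:
    """Outputs of every execution that consumed `artifact`."""
    consumers = {e.execution for e in events if e.artifact == artifact and e.type == "INPUT"}
    return {e.artifact for e in events if e.execution in consumers and e.type == "OUTPUT"}


# Toy lineage: raw.csv -> (execution 1) -> features.parquet -> (execution 2) -> model.pkl
EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
    Event("features.parquet", 2, "INPUT"),
    Event("model.pkl", 2, "OUTPUT"),
]
```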

get_all_child_artifacts(artifact_name)

Return all downstream artifacts starting from the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame containing all child artifacts. |
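Collecting all downstream artifacts amounts to repeating the one-hop step until nothing new appears, which is a breadth-first traversal of the lineage graph. The sketch below shows that technique on a simplified, hypothetical `Event` model; the visited set keeps an artifact reachable along two paths from being reported twice. It is an illustration of the idea, not cmflib's code.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT"


def one_hop_children(events: List[Event], artifact: str) -> Set[str]:
    consumers = {e.execution for e in events if e.artifact == artifact and e.type == "INPUT"}
    return {e.artifact for e in events if e.execution in consumers and e.type == "OUTPUT"}


def all_children(events: List[Event], artifact: str) -> List[str]:
    """Breadth-first closure of one_hop_children; each descendant is reported once, nearest first."""
    seen = {artifact}
    order: List[str] = []
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for child in sorted(one_hop_children(events, current)):  # sorted for a deterministic order
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
    Event("stats.json", 1, "OUTPUT"),
    Event("features.parquet", 2, "INPUT"),
    Event("model.pkl", 2, "OUTPUT"),
    Event("model.pkl", 3, "INPUT"),
    Event("metrics.json", 3, "OUTPUT"),
]
```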

get_one_hop_parent_artifacts(artifact_name)

Return input artifacts for the execution that produced the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame containing the immediate parent artifacts of the given artifact. |
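The parent direction mirrors the child direction: find the execution(s) that produced the artifact through OUTPUT events, then collect their INPUT artifacts. As before, the `Event` model and sample data are a simplified, hypothetical stand-in for what MLMD stores.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT"


def one_hop_parents(events: List[Event], artifact: str) -> Set[str]:
    """Inputs of the execution(s) that produced `artifact`."""
    producers = {e.execution for e in events if e.artifact == artifact and e.type == "OUTPUT"}
    return {e.artifact for e in events if e.execution in producers and e.type == "INPUT"}


# Toy lineage: raw.csv + config.yaml -> (execution 1) -> features.parquet
EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("config.yaml", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
]
```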

get_all_parent_artifacts(artifact_name)

Return all upstream artifacts of the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame containing all parent artifacts. |
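Gathering every upstream artifact applies the one-hop parent step repeatedly, a breadth-first walk toward the sources of the pipeline. This sketch uses the same simplified, hypothetical `Event` model; the nearest-first ordering is a property of this sketch, not a documented guarantee of cmflib.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT"


def one_hop_parents(events: List[Event], artifact: str) -> Set[str]:
    producers = {e.execution for e in events if e.artifact == artifact and e.type == "OUTPUT"}
    return {e.artifact for e in events if e.execution in producers and e.type == "INPUT"}


def all_parents(events: List[Event], artifact: str) -> List[str]:
    """Breadth-first walk upstream; each ancestor is reported once, nearest first."""
    seen = {artifact}
    order: List[str] = []
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for parent in sorted(one_hop_parents(events, current)):  # sorted for a deterministic order
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("config.yaml", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
    Event("features.parquet", 2, "INPUT"),
    Event("model.pkl", 2, "OUTPUT"),
]
```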

get_all_parent_executions(artifact_name)

Return all executions that produced upstream artifacts for the given artifact.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| artifact_name | str | Artifact name. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | Data frame containing all parent executions. |
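One way to read "executions that produced upstream artifacts" is: compute all ancestors of the artifact, then keep the executions with an OUTPUT event for any of them. The sketch below follows that literal reading on a simplified, hypothetical `Event` model. Under this reading, the execution that directly produced the given artifact counts only if it also produced one of its ancestors; treat that detail as an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """Simplified MLMD-style event linking an artifact to an execution."""
    artifact: str
    execution: int
    type: str  # "INPUT" or "OUTPUT"


def one_hop_parents(events: List[Event], artifact: str) -> Set[str]:
    producers = {e.execution for e in events if e.artifact == artifact and e.type == "OUTPUT"}
    return {e.artifact for e in events if e.execution in producers and e.type == "INPUT"}


def all_parents(events: List[Event], artifact: str) -> Set[str]:
    """Iterative upstream closure; order does not matter here."""
    found: Set[str] = set()
    stack = [artifact]
    while stack:
        for parent in one_hop_parents(events, stack.pop()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def all_parent_executions(events: List[Event], artifact: str) -> Set[int]:
    """Executions that produced any upstream artifact of `artifact`."""
    upstream = all_parents(events, artifact)
    return {e.execution for e in events if e.artifact in upstream and e.type == "OUTPUT"}


# raw.csv -> (1) -> features.parquet -> (2) -> model.pkl -> (3) -> metrics.json
EVENTS = [
    Event("raw.csv", 1, "INPUT"),
    Event("features.parquet", 1, "OUTPUT"),
    Event("features.parquet", 2, "INPUT"),
    Event("model.pkl", 2, "OUTPUT"),
    Event("model.pkl", 3, "INPUT"),
    Event("metrics.json", 3, "OUTPUT"),
]
```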

get_metrics(metrics_name)

Return metric data frame.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metrics_name | str | Metrics name. | required |

Returns:

| Type | Description |
|---|---|
| Optional[DataFrame] | Data frame containing all metrics. |

dumptojson(pipeline_name, exec_uuid=None)

Return a JSON-parsable string containing details about the given pipeline.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pipeline_name | str | Name of an AI pipeline. | required |
| exec_uuid | Optional[str] | Optional stage execution_uuid used to filter stages. | None |

Returns:

| Type | Description |
|---|---|
| Optional[str] | Pipeline in JSON format. |
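Since dumptojson returns Optional[str], a caller has to handle the None case before decoding. The sketch below shows that pattern with the standard json module; `load_pipeline_json` is a hypothetical helper, and the sample payload's keys are invented because the exact JSON layout produced by dumptojson is not described on this page.

```python
import json
from typing import Any, Optional


def load_pipeline_json(dumped: Optional[str]) -> Optional[Any]:
    """Hypothetical helper: decode a dumptojson result, passing None through unchanged."""
    if dumped is None:
        return None
    return json.loads(dumped)


# Hypothetical payload; the real structure returned by dumptojson may differ.
sample = json.dumps({"pipeline": {"name": "Test-env", "stages": []}})
```

With a real database, the argument would be the return value of `CmfQuery("mlmd").dumptojson(pipeline_name)`.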