cmflib.cmfquery.CmfQuery
cmflib.cmfquery.CmfQuery(filepath='mlmd', is_server=False)
Bases: object
CMF Query communicates with the MLMD database and implements basic search and retrieval functionality.
This class is designed to work with the CMF framework. CMF alters the names of pipelines, stages and artifacts
in various ways, so the actual names in the MLMD database differ from those originally provided
by users via the CMF API. When methods in this class accept name parameters, the values of these
parameters are expected to be the fully-qualified names of the respective entities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to the MLMD database file. | 'mlmd' |
get_pipeline_names()
Return names of all pipelines.
Returns:
Type | Description |
---|---|
List[str] | List of all pipeline names. |
get_pipeline_id(pipeline_name)
Return the identifier of the pipeline named pipeline_name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_name | str | Name of the pipeline. | required |
Returns:
Type | Description |
---|---|
int | Pipeline identifier, or -1 if no such pipeline exists. |
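Because a missing pipeline is reported as -1 rather than through an exception, callers may want to turn that sentinel into an explicit error. The following is a minimal sketch under that assumption. The helper name require_pipeline_id is hypothetical and not part of cmflib, and query stands for an already constructed CmfQuery instance (or any object exposing get_pipeline_id).

```python
def require_pipeline_id(query, pipeline_name: str) -> int:
    """Return the pipeline id, raising LookupError if the pipeline is unknown.

    Illustrative helper, not part of cmflib. `query` is any object that
    exposes get_pipeline_id(), for example a CmfQuery instance.
    """
    pipeline_id = query.get_pipeline_id(pipeline_name)
    if pipeline_id == -1:  # documented "does not exist" sentinel
        raise LookupError(f"pipeline {pipeline_name!r} not found in MLMD")
    return pipeline_id
```

Failing early this way keeps a -1 from being passed on to later calls as if it were a valid identifier.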
get_pipeline_stages(pipeline_name)
Return list of pipeline stages for the pipeline with the given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_name | str | Name of the pipeline for which stages need to be returned. In CMF, there are no different pipelines with the same name. | required |
Returns:
Type | Description |
---|---|
List[str] | List of stage names associated with the given pipeline. |
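As an illustration of how get_pipeline_names and get_pipeline_stages combine, the sketch below builds a mapping from each pipeline to its stages. The function name stages_by_pipeline is hypothetical, and query is assumed to be a CmfQuery instance (any object with these two methods will do).

```python
from typing import Dict, List


def stages_by_pipeline(query) -> Dict[str, List[str]]:
    """Map every pipeline name to the list of its stage names.

    Illustrative helper, not part of cmflib. It relies only on the
    documented get_pipeline_names() and get_pipeline_stages() calls.
    """
    return {
        name: list(query.get_pipeline_stages(name))
        for name in query.get_pipeline_names()
    }
```

With cmflib installed, this could be called as stages_by_pipeline(CmfQuery('mlmd')) after importing CmfQuery from cmflib.cmfquery, where 'mlmd' is the default database path shown in the constructor signature.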
get_all_exe_in_stage(stage_name)
Return list of all executions for the stage with the given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stage_name | str | Name of the stage. Stage names are modified before they are recorded in MLMD (e.g., the pipeline name becomes part of the stage name), so stage names from different pipelines do not collide. | required |
Returns:
Type | Description |
---|---|
List[Execution] | List of executions for the given stage. |
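Since stage names must be given in the form stored in MLMD, one way to avoid constructing them by hand is to take them from get_pipeline_stages. The sketch below counts executions per stage that way. It assumes, without confirmation from this page, that the names returned by get_pipeline_stages are already in the stored form accepted by get_all_exe_in_stage. The function name execution_counts is hypothetical.

```python
from typing import Dict


def execution_counts(query, pipeline_name: str) -> Dict[str, int]:
    """Count executions for each stage of one pipeline.

    Illustrative helper, not part of cmflib. Assumes stage names returned
    by get_pipeline_stages() can be passed to get_all_exe_in_stage() as-is.
    """
    counts: Dict[str, int] = {}
    for stage_name in query.get_pipeline_stages(pipeline_name):
        # get_all_exe_in_stage() is documented to return a list of executions.
        counts[stage_name] = len(query.get_all_exe_in_stage(stage_name))
    return counts
```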
get_all_executions_by_ids_list(exe_ids)
Return executions for the given list of execution ids as a pandas data frame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exe_ids | List[int] | List of execution identifiers. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame with all executions for the list of given execution identifiers. |
get_all_artifacts_by_context(pipeline_name)
Return artifacts for the given pipeline name as a pandas data frame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_name | str | Name of the pipeline. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame with all artifacts associated with the given pipeline name. |
get_all_artifacts_by_ids_list(artifact_ids)
Return all artifacts for the given list of artifact ids.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_ids | List[int] | List of artifact identifiers. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame with all artifacts for the given list of artifact ids. |
get_all_executions_in_stage(stage_name)
Return executions of the given stage as a pandas data frame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stage_name | str | Stage name. See the description of stage_name in get_all_exe_in_stage. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame with all executions associated with the given stage. |
get_artifact_df(artifact, d=None)
Return artifact's data frame representation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact | Artifact | MLMD entity representing artifact. | required |
d | Optional[Dict] | Optional initial content for data frame. | None |
Returns:
Type | Description |
---|---|
DataFrame | A data frame with a single row containing attributes of this artifact. |
get_all_artifacts()
Return names of all artifacts.
Returns:
Type | Description |
---|---|
List[str] | List of all artifact names. |
get_artifact(name)
Return artifact's data frame representation using artifact name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
Optional[DataFrame] | Pandas data frame with one row containing attributes of this artifact. |
get_all_artifacts_for_execution(execution_id)
Return input and output artifacts for the given execution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
execution_id | int | Execution identifier. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame containing input and output artifacts for the given execution, one artifact per row. |
get_all_artifact_types()
Return names of all artifact types.
Returns:
Type | Description |
---|---|
List[str] | List of all artifact types. |
get_all_executions_for_artifact(artifact_name)
Return executions that consumed or produced the given artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
DataFrame | Pandas data frame containing stage executions, one execution per row. |
get_one_hop_child_artifacts(artifact_name, pipeline_id=None)
Get artifacts produced by executions that consume the given artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Name of an artifact. | required |
Returns:
Type | Description |
---|---|
DataFrame | Output artifacts of all executions that consumed the given artifact. |
get_all_child_artifacts(artifact_name)
Return all downstream artifacts starting from the given artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame containing all child artifacts. |
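Conceptually, "all downstream artifacts" can be read as the result of applying a one-hop child lookup repeatedly until no new artifacts appear. The sketch below shows that idea as a breadth-first traversal. It illustrates the relationship only and is not the library's implementation. The names all_downstream and one_hop are hypothetical. one_hop is a caller-supplied function returning child artifact names, which with cmflib might be built from the data frame returned by get_one_hop_child_artifacts (the column layout of that frame is not documented on this page).

```python
from collections import deque
from typing import Callable, Iterable, List


def all_downstream(start: str, one_hop: Callable[[str], Iterable[str]]) -> List[str]:
    """Collect every artifact reachable from `start` via repeated one-hop lookups.

    Illustrative breadth-first traversal, not part of cmflib. `one_hop`
    maps an artifact name to the names of its immediate children.
    """
    seen = {start}           # guards against revisiting shared children
    order: List[str] = []    # discovery order, excluding `start` itself
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in one_hop(current):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

The seen set matters when several executions feed the same artifact: without it, a shared descendant would be reported more than once.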
get_one_hop_parent_artifacts(artifact_name)
Return input artifacts for the execution that produced the given artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame containing the immediate parent artifacts of the given artifact. |
get_all_parent_artifacts(artifact_name)
Return all upstream artifacts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame containing all parent artifacts. |
get_all_parent_executions(artifact_name)
Return all executions that produced upstream artifacts for the given artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_name | str | Artifact name. | required |
Returns:
Type | Description |
---|---|
DataFrame | Data frame containing all parent executions. |
get_metrics(metrics_name)
Return the metrics data frame for the given metrics name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metrics_name | str | Metrics name. | required |
Returns:
Type | Description |
---|---|
Optional[DataFrame] | Data frame containing all metrics. |
dumptojson(pipeline_name, exec_uuid=None)
Return JSON-parsable string containing details about the given pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_name | str | Name of an AI pipeline. | required |
exec_uuid | Optional[str] | Optional stage execution_uuid. If given, stages are filtered by this execution_uuid. | None |
Returns:
Type | Description |
---|---|
Optional[str] | Pipeline in JSON format. |
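Because dumptojson is documented to return an optional JSON-parsable string, a caller typically has to handle the None case before parsing. A minimal sketch follows. The helper name load_pipeline is hypothetical, query is assumed to be a CmfQuery instance, and nothing is assumed about the structure of the parsed document or about when None is returned.

```python
import json
from typing import Any, Optional


def load_pipeline(query, pipeline_name: str, exec_uuid: Optional[str] = None) -> Optional[Any]:
    """Parse the output of dumptojson() into Python objects.

    Illustrative helper, not part of cmflib. Returns None when
    dumptojson() returns None; otherwise returns the parsed JSON.
    """
    raw = query.dumptojson(pipeline_name, exec_uuid)
    if raw is None:
        return None
    return json.loads(raw)
```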