Lineage Visualization Page

The Lineage page provides interactive visualizations of data flow and dependencies in your ML pipelines. It helps you understand how artifacts and executions are connected, trace data provenance, and analyze pipeline structure.

Lineage tracking captures the relationships between:

Artifacts: Datasets, models, and metrics
Executions: Pipeline stage runs
Data Flow: How data moves through pipeline stages
Dependencies: Which artifacts depend on which executions
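The relationships listed above can be pictured as a small set of edge records. The following is a minimal sketch in plain Python, not the CMF data model or API: the `Edge` class, its `role` field, and all artifact and stage names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One lineage relationship between an execution and an artifact."""
    execution: str  # pipeline stage run, e.g. "Train"
    artifact: str   # dataset, model, or metrics name
    role: str       # "input" (execution reads it) or "output" (execution writes it)


# Illustrative records only; not real CMF metadata.
EDGES = [
    Edge("Prepare", "raw.csv", "input"),
    Edge("Prepare", "clean.csv", "output"),
    Edge("Train", "clean.csv", "input"),
    Edge("Train", "model.pkl", "output"),
]


def producers(edges):
    """Map each artifact to the execution that wrote it (its dependency)."""
    return {e.artifact: e.execution for e in edges if e.role == "output"}


def consumers(edges):
    """Map each artifact to the executions that read it (onward data flow)."""
    result = {}
    for e in edges:
        if e.role == "input":
            result.setdefault(e.artifact, []).append(e.execution)
    return result


print(producers(EDGES))
print(consumers(EDGES))
```

In this sketch, `producers` answers "which execution does this artifact depend on" and `consumers` answers "where does this artifact flow next".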

Visualization Types

The Lineage page offers the following visualization modes:

1. Artifact Tree

Purpose: Hierarchical view of artifact dependencies

Use Cases:

Understand data transformation pipeline
Trace dataset lineage from raw to final
Identify reused artifacts across stages

Features:

Tree layout showing parent-child relationships
Color-coded by artifact type (Dataset/Model/Metrics)
Expandable/collapsible nodes
Hover for artifact details

Figure: Artifact Tree Lineage
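To make the parent-child layout concrete, here is a rough text-only approximation of an artifact tree. It assumes the lineage has already been reduced to a parent-to-children mapping; the `CHILDREN` dictionary and the file names in it are invented for this example and do not come from CMF.

```python
# Illustrative parent -> children artifact links; names are made up.
CHILDREN = {
    "raw.csv": ["train.csv", "test.csv"],
    "train.csv": ["model.pkl"],
    "model.pkl": ["metrics.json"],
}


def render_tree(root, children, depth=0):
    """Return the tree below `root` as indented lines, one artifact per line."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(render_tree(child, children, depth + 1))
    return lines


print("\n".join(render_tree("raw.csv", CHILDREN)))
```

An artifact that has more than one parent would appear once under each parent in this rendering, which is one simple way to show a dependency graph as a tree.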

2. Execution Tree

Purpose: Hierarchical view of execution dependencies

Use Cases:

Understand pipeline execution flow
Debug pipeline stage ordering
Identify parallel vs sequential stages

Features:

Select specific execution type from dropdown
Shows execution order and dependencies

Figure: Execution Tree Lineage

3. Artifact-Execution Tree

Purpose: Combined view showing both artifacts and executions

Use Cases:

Complete end-to-end pipeline visualization
Understand which execution created which artifact
Trace full data lineage with transformations

Features:

Alternating artifact and execution nodes
Shows input/output relationships
Complete provenance trail
Filtered by pipeline

Figure: Artifact Execution Tree Lineage

Using the Lineage Page

Example 1: Trace Data Provenance

Goal: Understand where a specific model's training data came from

1. Navigate to Lineage page
2. Select your pipeline from dropdown
3. Choose Artifact Tree tab
4. Find your trained model in the tree
5. Trace backwards to see:
   - Training dataset used
   - Preprocessing steps applied
   - Original raw data source
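The backwards trace in the last step amounts to walking from the model towards its sources. The sketch below shows that idea with a breadth-first search, assuming a hypothetical `PARENTS` mapping from each artifact to the artifacts it was derived from. It is an illustration of the concept, not how the Lineage page is implemented.

```python
from collections import deque

# artifact -> artifacts it was derived from (illustrative names).
PARENTS = {
    "model.pkl": ["features.pkl"],
    "features.pkl": ["clean.csv"],
    "clean.csv": ["raw.csv"],
    "raw.csv": [],
}


def upstream(artifact, parents):
    """Breadth-first walk towards the sources; returns ancestors nearest first."""
    seen, order = {artifact}, []
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(upstream("model.pkl", PARENTS))
```

For the sample data, the walk visits the feature set first, then the cleaned data, and finally the raw source.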

Example 2: Debug Pipeline Execution Order

Goal: Verify stages executed in correct sequence

1. Select Execution Tree tab
2. Choose the execution type from dropdown
3. View the tree structure showing:
   - Which stages ran first
   - Which stages ran in parallel
   - Dependencies between stages
4. Identify any out-of-order executions
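The same checks can be reasoned about offline. The sketch below assumes two hypothetical inputs: a `DEPENDS_ON` mapping of execution dependencies and a `STARTED` list of the recorded start order. It computes a depth for each stage (stages at the same depth have no ordering between them, so they could run in parallel) and flags any stage that started before one of its dependencies. It assumes the dependencies contain no cycles.

```python
# execution -> executions it depends on (illustrative stage names).
DEPENDS_ON = {
    "Prepare": [],
    "Featurize": ["Prepare"],
    "Validate": ["Prepare"],
    "Train": ["Featurize"],
}

# Hypothetical recorded start order, earliest first.
STARTED = ["Prepare", "Train", "Featurize", "Validate"]


def levels(depends_on):
    """Depth of each execution; equal depth means no ordering between them."""
    depth = {}

    def visit(node):
        if node not in depth:
            depth[node] = 1 + max((visit(d) for d in depends_on[node]), default=-1)
        return depth[node]

    for node in depends_on:
        visit(node)
    return depth


def out_of_order(depends_on, started):
    """Pairs (stage, dependency) where the stage started before its dependency."""
    position = {name: i for i, name in enumerate(started)}
    return [
        (stage, dep)
        for stage in started
        for dep in depends_on[stage]
        if position[stage] < position[dep]
    ]


print(levels(DEPENDS_ON))
print(out_of_order(DEPENDS_ON, STARTED))
```

In the sample data, `Featurize` and `Validate` share a depth, and `Train` is reported because it started before `Featurize`.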

Example 3: Analyze Full Pipeline Flow

Goal: Get complete picture of data flow through pipeline

1. Select Artifact-Execution Tree tab
2. View the alternating artifact → execution → artifact pattern
3. Trace a specific data path:
   - Start from input dataset
   - Follow through each transformation
   - End at final output (model/metrics)
4. Hover on nodes to see details
5. Click to navigate to artifact or execution page
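Tracing one data path through the combined view is a path search over a graph whose nodes alternate between artifacts and executions. A minimal sketch follows, assuming a hypothetical `NEXT` mapping in which each node points to the nodes that follow it; the names are illustrative and the search is a plain depth-first search, not the page's own logic.

```python
# node -> next nodes; artifacts and executions alternate (illustrative names).
NEXT = {
    "raw.csv": ["Prepare"],
    "Prepare": ["clean.csv"],
    "clean.csv": ["Train"],
    "Train": ["model.pkl", "metrics.json"],
}


def find_path(start, goal, graph, path=()):
    """Depth-first search for one path from `start` to `goal`, or None."""
    path = path + (start,)
    if start == goal:
        return list(path)
    for nxt in graph.get(start, []):
        if nxt not in path:  # guard against revisiting a node
            found = find_path(nxt, goal, graph, path)
            if found:
                return found
    return None


print(" -> ".join(find_path("raw.csv", "model.pkl", NEXT)))
```

The returned list alternates artifact, execution, artifact, which mirrors the pattern shown in the Artifact-Execution Tree.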

Example 4: Find Reused Artifacts

Goal: Identify which artifacts are used by multiple executions

Use Artifact Tree visualization
Look for artifacts with multiple outgoing edges
These artifacts are inputs to multiple stages
Useful for understanding data sharing patterns
Can help identify opportunities for caching
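"Multiple outgoing edges" translates to an artifact being read by more than one execution. The sketch below counts distinct readers per artifact, assuming a hypothetical list of `(artifact, execution)` input pairs; the data is invented for illustration.

```python
from collections import defaultdict

# (artifact, consuming execution) pairs; illustrative names.
INPUTS = [
    ("clean.csv", "Featurize"),
    ("clean.csv", "Validate"),
    ("features.pkl", "Train"),
    ("features.pkl", "Evaluate"),
    ("model.pkl", "Evaluate"),
]


def reused_artifacts(inputs):
    """Artifacts consumed by more than one distinct execution."""
    readers = defaultdict(set)
    for artifact, execution in inputs:
        readers[artifact].add(execution)
    return {a: sorted(r) for a, r in readers.items() if len(r) > 1}


print(reused_artifacts(INPUTS))
```

Artifacts returned by `reused_artifacts` are the ones worth considering for caching, since more than one stage reads them.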

Related Pages

Artifacts Page - Detailed artifact information
Executions Page - Execution details and logs
CMF Client Commands - CLI for metadata management
Installation & Setup - Set up CMF Server

Additional Resources

Understanding Lineage Concepts

Provenance: History of an artifact's creation and transformations
Upstream: Artifacts and executions that contributed to the current node
Downstream: Artifacts and executions that depend on the current node
Lineage Graph: Directed acyclic graph (DAG) of dependencies
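These definitions can be checked on a small example. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to confirm that a dependency mapping is acyclic, and inverts an upstream mapping to obtain the downstream one. The `UPSTREAM` dictionary and its node names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# node -> nodes directly upstream of it (illustrative names).
UPSTREAM = {
    "model.pkl": ["Train"],
    "Train": ["clean.csv"],
    "clean.csv": [],
}


def is_dag(predecessors):
    """True if the node -> upstream-nodes mapping has no cycles."""
    try:
        tuple(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


def downstream_of(predecessors):
    """Invert the mapping: node -> nodes that directly depend on it."""
    result = {node: [] for node in predecessors}
    for node, ups in predecessors.items():
        for up in ups:
            result.setdefault(up, []).append(node)
    return result


print(is_dag(UPSTREAM))
print(downstream_of(UPSTREAM))
```

A mapping in which two nodes list each other as upstream would make `is_dag` return `False`, which is the situation a lineage graph is expected never to contain.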