Ontology

Common Metadata Ontology

Common Metadata Ontology (CMO) integrates and aggregates pipeline metadata from various sources such as Papers-with-Code, OpenML, and Hugging Face. CMF's data model is a manifestation of CMO, specifically designed to capture the pipeline-centric metadata of AI pipelines. It consists of nodes representing a pipeline and its components (stages), relationships capturing interactions among pipeline entities, and properties. CMO enables interoperability of diverse metadata and supports search and recommendation with reasoning capabilities. It is flexible enough to incorporate the various executions implemented for each stage, such as dataset preprocessing, feature engineering, training (including hyperparameter optimization), testing, and evaluation. This enables robust search to identify the best execution path for a given pipeline. CMO also facilitates the inclusion of additional semantic and statistical properties that enrich the metadata associated with each node. An overview of CMO can be found below.

Figure: Common Metadata Ontology

The source diagram on arrows.app can be found here.
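To make the node, relationship, and property structure described above more concrete, the following Python sketch models a tiny in-memory property graph. It is not CMF's implementation: the classes `Node` and `MetadataGraph` and the relationship labels `HAS_STAGE` and `HAS_EXECUTION` are hypothetical names chosen for illustration, not labels defined by CMO.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: a CMO node type (label) plus its properties."""
    node_id: str
    label: str  # e.g. "Pipeline", "Stage", "Execution"
    properties: dict = field(default_factory=dict)


class MetadataGraph:
    """Minimal property graph: nodes keyed by id, typed directed edges."""

    def __init__(self):
        self.nodes = {}
        # source node id -> list of (relationship, target node id)
        self.edges = defaultdict(list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src_id, relationship, dst_id):
        # Both endpoints must exist before they can be connected.
        if src_id not in self.nodes or dst_id not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[src_id].append((relationship, dst_id))

    def neighbors(self, src_id, relationship):
        """Return the nodes reachable from src_id through one relationship type."""
        return [self.nodes[dst] for rel, dst in self.edges[src_id] if rel == relationship]


graph = MetadataGraph()
graph.add_node(Node("p1", "Pipeline", {"pipeline_name": "demo"}))
graph.add_node(Node("s1", "Stage", {"stage_name": "train"}))
graph.add_node(Node("e1", "Execution", {"execution_name": "run-1"}))
graph.add_edge("p1", "HAS_STAGE", "s1")      # hypothetical relationship label
graph.add_edge("s1", "HAS_EXECUTION", "e1")  # hypothetical relationship label
print([n.properties["stage_name"] for n in graph.neighbors("p1", "HAS_STAGE")])  # prints ['train']
```

Keeping relationships typed means a query such as "all executions of this stage" is a single lookup on the edge list rather than a scan of every node.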

Sample pipeline represented using CMO

Figure: Sample pipeline

The sample figure shows a pipeline titled "Robust outlier detection by de-biasing VAE likelihoods", executed for the "Outlier Detection" task and focusing on the train/test stage. The pipeline uses the "Variational Autoencoder" model and the following datasets:

German Traffic Sign
Street View House Numbers
CelebFaces Attributes dataset

The figure also includes the hyperparameters used and the metrics generated by the execution. The source figure, created with arrows.app, can be found here.
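The entities named in the sample can also be written down as subject-relationship-object triples. This sketch records only what the text above states; the hyperparameters and metrics shown in the figure are left out rather than guessed, and the relationship names (`solves`, `has_stage`, `uses_model`, `uses_dataset`) are hypothetical, not CMO's actual predicates.

```python
# (subject, relationship, object) triples for the sample pipeline.
# Relationship names are hypothetical placeholders.
PIPELINE = "Robust outlier detection by de-biasing VAE likelihoods"

triples = [
    (PIPELINE, "solves", "Outlier Detection"),            # Task
    (PIPELINE, "has_stage", "train/test"),                # Stage
    (PIPELINE, "uses_model", "Variational Autoencoder"),  # Model
    (PIPELINE, "uses_dataset", "German Traffic Sign"),
    (PIPELINE, "uses_dataset", "Street View House Numbers"),
    (PIPELINE, "uses_dataset", "CelebFaces Attributes"),
]


def objects(subject, relationship):
    """Return every object linked to subject by the given relationship."""
    return [o for s, r, o in triples if s == subject and r == relationship]


for dataset in objects(PIPELINE, "uses_dataset"):
    print(dataset)
```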

Turtle Syntax

The formal ontology in Turtle format can be found here.
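For readers unfamiliar with Turtle, the sketch below serializes a single node as Turtle triples using only the Python standard library. The namespace `http://example.org/cmo#` and the IRIs built from it (`cmo:Pipeline`, `cmo:pipeline_name`, and so on) are placeholders, not the IRIs defined in the formal ontology; in practice an RDF library such as rdflib would handle serialization and escaping more completely.

```python
def turtle_literal(value):
    """Quote a value as a Turtle short string, escaping backslashes, quotes, and newlines."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def node_to_turtle(subject, rdf_class, properties,
                   prefix="cmo", namespace="http://example.org/cmo#"):
    """Serialize one node: a type triple plus one triple per property.

    The default prefix and namespace are placeholders, not CMO's real IRIs.
    """
    lines = [f"@prefix {prefix}: <{namespace}> .", ""]
    predicates = [f"a {prefix}:{rdf_class}"]
    for key, value in properties.items():
        predicates.append(f"{prefix}:{key} {turtle_literal(value)}")
    # Predicates of the same subject are separated by ";" and the statement ends with ".".
    lines.append(f"{prefix}:{subject} " + " ;\n    ".join(predicates) + " .")
    return "\n".join(lines)


print(node_to_turtle("pipeline_1", "Pipeline", {"pipeline_id": "1", "pipeline_name": "demo"}))
```

Grouping all predicates of one subject with `;` is standard Turtle shorthand and keeps the output close to how hand-written ontology files usually look.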

Properties of each node

The properties of each node can be found below.

Pipeline

An AI pipeline executed to solve a machine learning or deep learning task

Properties

pipeline_id
pipeline_name
pipeline_source
source_id
custom_properties*
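One way to mirror the Pipeline node in Python is a dataclass whose fields follow the property list above, with the optional `custom_properties` defaulting to an empty dictionary. This is a sketch; CMF's own classes may be structured differently, and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Pipeline:
    """Sketch of the Pipeline node; field names follow the property list."""
    pipeline_id: str
    pipeline_name: str
    pipeline_source: str  # e.g. "Papers-with-Code", "OpenML", or "Hugging Face"
    source_id: str
    # Optional property (marked *): an empty dict when the user supplies none.
    custom_properties: Dict[str, Any] = field(default_factory=dict)


p = Pipeline("p1", "demo-pipeline", "OpenML", "42")
p.custom_properties["owner"] = "team-a"  # hypothetical user-supplied property
print(p.custom_properties)  # prints {'owner': 'team-a'}
```

Using `field(default_factory=dict)` gives each instance its own dictionary, so custom properties added to one pipeline do not leak into another.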

Report

Any published document describing the pipeline implementation

Properties

report_id
report_title
report_pdf_url
source
source_id
abstract*
custom_properties*
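Records harvested from different sources may be incomplete. A minimal sketch of checking a raw Report record against the property list, treating the starred properties as optional, could look like this; the record values are made up.

```python
from typing import Any, Dict, List

# Property names from the Report list above; abstract and custom_properties are optional (*).
REQUIRED = {"report_id", "report_title", "report_pdf_url", "source", "source_id"}
OPTIONAL = {"abstract", "custom_properties"}


def validate_report(record: Dict[str, Any]) -> List[str]:
    """Raise ValueError if a required property is missing.

    Otherwise return the sorted list of keys that are not Report properties.
    """
    keys = set(record)
    missing = REQUIRED - keys
    if missing:
        raise ValueError(f"missing required properties: {sorted(missing)}")
    return sorted(keys - REQUIRED - OPTIONAL)


record = {
    "report_id": "r1",
    "report_title": "Example report",
    "report_pdf_url": "https://example.org/report.pdf",
    "source": "Papers-with-Code",
    "source_id": "123",
    # "abstract" is optional, so leaving it out is fine.
    "venue": "ExampleConf",  # not a Report property
}
print(validate_report(record))  # prints ['venue']
```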

Task

The AI task for which the pipeline is implemented, for example, image classification

Properties

task_id
task_name
task_description
task_type
modality
category
source
custom_properties*
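A sketch of the Task node as a dataclass, with a helper that filters tasks by `modality`. The task values below are illustrative placeholders, not data from Papers-with-Code, OpenML, or Hugging Face, and the case-insensitive match is a design choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Task:
    """Sketch of the Task node; field names follow the property list."""
    task_id: str
    task_name: str
    task_description: str
    task_type: str
    modality: str
    category: str
    source: str
    custom_properties: Dict[str, Any] = field(default_factory=dict)


def tasks_by_modality(tasks: List[Task], modality: str) -> List[str]:
    """Return names of tasks with the given modality, ignoring case."""
    wanted = modality.casefold()
    return [t.task_name for t in tasks if t.modality.casefold() == wanted]


# Illustrative values only; they are not taken from any real source.
tasks = [
    Task("t1", "Image Classification", "Assign a class label to an image",
         "classification", "Images", "Computer Vision", "Papers-with-Code"),
    Task("t2", "Sentiment Analysis", "Classify the sentiment of a text",
         "classification", "Text", "Natural Language Processing", "Hugging Face"),
]
print(tasks_by_modality(tasks, "images"))  # prints ['Image Classification']
```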

Framework

The framework used to implement the pipeline and its code repository

Properties

framework_id
framework_name
code_repo_url
framework_version
source

Stage

Various stages of the pipeline, such as data preprocessing, training, testing, or evaluation

Properties

stage_id
stage_name
source
pipeline_id
pipeline_name
custom_properties
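Because each Stage stores the `pipeline_id` of its parent pipeline, the stages of a pipeline can be recovered with a simple filter. A sketch with made-up identifiers:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Stage:
    """Sketch of the Stage node; field names follow the property list."""
    stage_id: str
    stage_name: str
    source: str
    pipeline_id: str
    pipeline_name: str
    custom_properties: Dict[str, Any] = field(default_factory=dict)


def stages_of(pipeline_id: str, stages: List[Stage]) -> List[str]:
    """Return the names of the stages that belong to one pipeline, in input order."""
    return [s.stage_name for s in stages if s.pipeline_id == pipeline_id]


stages = [
    Stage("s1", "preprocess", "OpenML", "p1", "demo"),
    Stage("s2", "train", "OpenML", "p1", "demo"),
    Stage("s3", "train", "OpenML", "p2", "other"),
]
print(stages_of("p1", stages))  # prints ['preprocess', 'train']
```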

Execution

A single run of a given stage in a pipeline; each stage can have multiple executions

Properties

execution_id
execution_name
stage_id
stage_name
pipeline_id
pipeline_name
source
command (CLI command to run the execution)
custom_properties
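Since a stage can be executed many times, for example with different hyperparameters, it is often useful to group executions by `stage_id`. In this sketch the commands (`python train.py --lr ...`) are hypothetical, and `shlex.split` is used only to show how a stored command string can be tokenized the way a POSIX shell would.

```python
import shlex
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Execution:
    """Sketch of the Execution node; field names follow the property list."""
    execution_id: str
    execution_name: str
    stage_id: str
    stage_name: str
    pipeline_id: str
    pipeline_name: str
    source: str
    command: str  # CLI command used to run the execution
    custom_properties: Dict[str, Any] = field(default_factory=dict)


def executions_by_stage(executions: List[Execution]) -> Dict[str, List[str]]:
    """Map each stage_id to the ids of its executions, in input order."""
    grouped = defaultdict(list)
    for ex in executions:
        grouped[ex.stage_id].append(ex.execution_id)
    return dict(grouped)


# Hypothetical executions: two training runs of the same stage with different flags.
runs = [
    Execution("e1", "train-lr-0.01", "s2", "train", "p1", "demo", "OpenML",
              "python train.py --lr 0.01"),
    Execution("e2", "train-lr-0.001", "s2", "train", "p1", "demo", "OpenML",
              "python train.py --lr 0.001"),
]
print(executions_by_stage(runs))     # prints {'s2': ['e1', 'e2']}
print(shlex.split(runs[0].command))  # prints ['python', 'train.py', '--lr', '0.01']
```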

Artifact

Artifacts, such as models, datasets, and metrics, generated at the end of each execution

Properties

artifact_id
artifact_name
pipeline_id
pipeline_name
execution_id
source
custom_properties

Dataset

Subclass of artifact. The dataset used in each execution of a pipeline

Properties

dataset_id
dataset_name
dataset_url
modality
description
source
custom_properties
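Dataset, Model, and Metric are subclasses of Artifact. One way to express that in Python is dataclass inheritance, sketched below for Dataset. The sketch assumes that `dataset_id` and `dataset_name` coincide with the inherited `artifact_id` and `artifact_name`; the ontology may keep them as separate properties. The example reuses a dataset name from the sample pipeline, while the other identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Artifact:
    """Base node: properties shared by all artifacts."""
    artifact_id: str
    artifact_name: str
    pipeline_id: str
    pipeline_name: str
    execution_id: str
    source: str
    custom_properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Dataset(Artifact):
    """Subclass of Artifact; dataset_id/dataset_name are assumed to reuse artifact_id/artifact_name."""
    dataset_url: str = ""
    modality: str = ""
    description: str = ""


def datasets_of(execution_id: str, artifacts: List[Artifact]) -> List[str]:
    """Return the names of Dataset artifacts attached to one execution."""
    return [a.artifact_name for a in artifacts
            if isinstance(a, Dataset) and a.execution_id == execution_id]


artifacts = [
    Dataset("a1", "German Traffic Sign", "p1", "demo", "e1", "Papers-with-Code",
            modality="Images"),
    Artifact("a2", "vae-checkpoint", "p1", "demo", "e1", "Papers-with-Code"),
]
print(datasets_of("e1", artifacts))  # prints ['German Traffic Sign']
```

Because every Dataset is also an Artifact, code that only needs the shared properties (for example, `execution_id`) can treat all artifacts uniformly, while `isinstance` recovers the subtype when needed.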

Model

Subclass of artifact. The model used in each execution or produced as a result of an execution

Properties

model_id
model_name
model_class
description
artifact_id
source
custom_properties

Metric

Subclass of artifact. The evaluation result of each execution

Properties

metric_id
metric_name
artifact_id
evaluations
source
custom_properties**
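Because every metric name and value is stored as a key-value pair under `evaluations`, comparing executions reduces to comparing dictionary entries. The sketch below picks the Metric node with the best value for one key, a simple instance of the "best execution path" search mentioned earlier. The metric names and numbers are hypothetical, and the sketch places `evaluations` after `source` only so that it can have a default value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Metric:
    """Sketch of the Metric node; evaluations holds all metric names and values."""
    metric_id: str
    metric_name: str
    artifact_id: str
    source: str
    evaluations: Dict[str, float] = field(default_factory=dict)
    custom_properties: Dict[str, Any] = field(default_factory=dict)


def best_by(metrics: List[Metric], key: str, higher_is_better: bool = True) -> Metric:
    """Return the Metric node with the best value for one evaluation key.

    Metric nodes that do not report the key are ignored.
    """
    candidates = [m for m in metrics if key in m.evaluations]
    if not candidates:
        raise ValueError(f"no metric reports {key!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda m: m.evaluations[key])


# Hypothetical results from two executions of the same stage.
results = [
    Metric("m1", "eval-run-1", "a1", "OpenML", {"auroc": 0.91, "loss": 0.40}),
    Metric("m2", "eval-run-2", "a2", "OpenML", {"auroc": 0.87, "loss": 0.35}),
]
print(best_by(results, "auroc").metric_id)                         # prints m1
print(best_by(results, "loss", higher_is_better=False).metric_id)  # prints m2
```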

Hyperparameters

Parameter settings used for each execution of a stage

Properties

parameter_id
parameter_setting (key-value pair)
source
model_id
custom_properties
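Since `parameter_setting` is a key-value pair, two executions of the same stage can be compared by diffing their settings. A minimal sketch, with hypothetical parameter names and values:

```python
from typing import Any, Dict, Tuple


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {name: (value_in_a, value_in_b)} for every parameter that differs.

    A parameter missing from one side is reported as None on that side, so an
    explicit None value and a missing key look the same in this sketch.
    """
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


# Hypothetical parameter_setting values for two executions of a training stage.
run_1 = {"learning_rate": 0.01, "batch_size": 64, "epochs": 10}
run_2 = {"learning_rate": 0.001, "batch_size": 64, "optimizer": "adam"}
print(diff_settings(run_1, run_2))
```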

NOTE:
Properties marked with * are optional.
Each node carries additional information that differs from source to source. For now, this information is included in the knowledge graph (KG) for efficient search; in the future, it can be extracted and used to populate node properties.
** For Metric, there are countless possible metric names and values, so all of them are captured as key-value pairs under evaluations.
custom_properties is where users can enter custom properties for each node while executing a pipeline.
source is the source from which the node is obtained: Papers-with-Code, OpenML, or Hugging Face.

Published works

R. Venkataramanan, A. Tripathy, M. Foltin, H. Y. Yip, A. Justine, and A. Sheth, "Knowledge Graph Empowered Machine Learning Pipelines for Improved Efficiency, Reusability, and Explainability," in IEEE Internet Computing, vol. 27, no. 1, pp. 81-88, 1 Jan.-Feb. 2023, doi: 10.1109/MIC.2022.3228087. Link: https://www.computer.org/csdl/magazine/ic/2023/01/10044293/1KL6TPO5huw

Related works

Publio, G. C., Esteves, D., Ławrynowicz, A., Panov, P., Soldatova, L., Soru, T., ... & Zafar, H. (2018). ML-schema: exposing the semantics of machine learning with schemas and ontologies. arXiv preprint arXiv:1807.05351. Link: http://ml-schema.github.io/documentation/ML%20Schema.html
Nguyen, A., Weller, T., Färber, M., & Sure-Vetter, Y. (2020). Making neural networks fair. In Knowledge Graphs and Semantic Web: Second Iberoamerican Conference and First Indo-American Conference, KGSWC 2020, Mérida, Mexico, November 26–27, 2020, Proceedings 2 (pp. 29-44). Springer International Publishing. Link: https://arxiv.org/pdf/1907.11569.pdf
Humm, B. G., & Zender, A. (2021). An ontology-based concept for meta AutoML. In Artificial Intelligence Applications and Innovations: 17th IFIP WG 12.5 International Conference, AIAI 2021, Hersonissos, Crete, Greece, June 25–27, 2021, Proceedings 17 (pp. 117-128). Springer International Publishing. Link: https://www.researchgate.net/profile/Alexander-Zender-2/publication/352574909_An_Ontology-Based_Concept_for_Meta_AutoML/links/619691e107be5f31b796d2fd/An-Ontology-Based-Concept-for-Meta-AutoML.pdf
