Ontology

Common Metadata Ontology

The Common Metadata Ontology (CMO) is proposed to integrate and aggregate pipeline metadata from various sources such as Papers-with-code, OpenML and Huggingface. CMF's data model is a manifestation of CMO, which is specifically designed to capture the pipeline-centric metadata of AI pipelines. It consists of nodes that represent a pipeline and its components (stages), relationships that capture the interactions among pipeline entities, and properties. CMO offers interoperability across diverse metadata, as well as search and recommendation with reasoning capabilities. It is flexible enough to incorporate the various executions implemented for each stage, such as dataset preprocessing, feature engineering, training (including HPO), testing and evaluation, which enables robust search to identify the best execution path for a given pipeline. CMO also allows additional semantic and statistical properties to be included, enriching the metadata associated with these entities. An overview of CMO is shown below.

Common Metadata Ontology

The external link to arrows.app can be found here
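
As a rough illustration of what identifying the best execution path could look like, the following Python sketch picks, for each stage, the execution with the highest value of a chosen metric. The `Execution` class, the `best_execution_path` helper, the stage and execution names, and the scores are all hypothetical; the selection rule is a simplifying assumption and not CMO's actual search or reasoning method.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    execution_name: str
    stage_name: str
    # Metric name -> value, mirroring the key-value "evaluations" property.
    evaluations: dict = field(default_factory=dict)


def best_execution_path(executions, stage_order, metric):
    """For each stage, in order, keep the execution with the highest `metric`."""
    path = []
    for stage in stage_order:
        candidates = [
            e for e in executions
            if e.stage_name == stage and metric in e.evaluations
        ]
        if candidates:  # stages without a scored execution are skipped
            path.append(max(candidates, key=lambda e: e.evaluations[metric]))
    return path


# Hypothetical executions; names and scores are made up for illustration.
executions = [
    Execution("prep-a", "preprocessing", {"accuracy": 0.81}),
    Execution("prep-b", "preprocessing", {"accuracy": 0.84}),
    Execution("train-a", "training", {"accuracy": 0.90}),
    Execution("train-b", "training", {"accuracy": 0.88}),
]

path = best_execution_path(executions, ["preprocessing", "training"], "accuracy")
print([e.execution_name for e in path])
```

Choosing independently per stage is the simplest possible rule; a real search over the knowledge graph would likely also need to check that the selected executions are compatible with one another.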

Sample pipeline represented using CMO

Sample Pipeline

The sample figure shows a pipeline titled "Robust outlier detection by de-biasing VAE likelihoods", executed for the "Outlier Detection" task at the train/test stage. The model used in the pipeline was a "Variational Autoencoder". Several datasets were used in the pipeline implementation: (i) German Traffic Sign, (ii) Street View House Numbers and (iii) CelebFaces Attributes. The corresponding hyperparameters and the metrics generated by the execution are included in the figure. The external link to the source figure created using arrows.app can be found here.
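
The sample pipeline described above could be written down as a small set of nodes and relationships, sketched here in Python. The property names follow the node listings on this page, but the relationship labels (`has_task`, `has_stage`, `uses_model`, `uses_dataset`) and the `targets` helper are hypothetical. Attaching the model and datasets directly to the stage is a simplification, and the hyperparameter and metric values are omitted because they appear only in the figure.

```python
# Nodes keyed by a local label; property names follow the node listings on this page.
nodes = {
    "pipeline": {"pipeline_name": "Robust outlier detection by de-biasing VAE likelihoods"},
    "task": {"task_name": "Outlier Detection"},
    "stage": {"stage_name": "train/test"},
    "model": {"model_name": "Variational Autoencoder"},
    "dataset_1": {"dataset_name": "German Traffic Sign"},
    "dataset_2": {"dataset_name": "Street View House Numbers"},
    "dataset_3": {"dataset_name": "CelebFaces Attributes"},
}

# (source, relation, target) triples; the relation labels are hypothetical.
edges = [
    ("pipeline", "has_task", "task"),
    ("pipeline", "has_stage", "stage"),
    ("stage", "uses_model", "model"),
    ("stage", "uses_dataset", "dataset_1"),
    ("stage", "uses_dataset", "dataset_2"),
    ("stage", "uses_dataset", "dataset_3"),
]


def targets(source, relation):
    """Return the property dicts of all nodes reached from `source` via `relation`."""
    return [nodes[t] for s, r, t in edges if s == source and r == relation]


dataset_names = [d["dataset_name"] for d in targets("stage", "uses_dataset")]
print(dataset_names)
```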

Turtle Syntax

The formal ontology in Turtle format can be found here

Properties of each node

The properties of each node can be found below.

Pipeline

An AI pipeline executed to solve a machine learning or deep learning Task

Properties
  • pipeline_id
  • pipeline_name
  • pipeline_source
  • source_id
  • custom_properties*
Report

Any published text document regarding the pipeline implementation

Properties
  • report_id
  • report_title
  • report_pdf_url
  • source
  • source_id
  • abstract*
  • custom_properties*
Task

The AI Task for which the pipeline is implemented. Example: image classification

Properties
  • task_id
  • task_name
  • task_description
  • task_type
  • modality
  • category
  • source
  • custom_properties*
Framework

The framework used to implement the pipeline and its code repository

Properties
  • framework_id
  • framework_name
  • code_repo_url
  • framework_version
  • source
Stage

Various stages of the pipeline such as data preprocessing, training, testing or evaluation

Properties
  • stage_id
  • stage_name
  • source
  • pipeline_id
  • pipeline_name
  • custom_properties
Execution

Multiple executions of a given stage in a pipeline

Properties
  • execution_id
  • execution_name
  • stage_id
  • stage_name
  • pipeline_id
  • pipeline_name
  • source
  • command (CLI command to run the execution)
  • custom_properties
Artifact

Artifacts such as model, dataset and metric generated at the end of each execution

Properties
  • artifact_id
  • artifact_name
  • pipeline_id
  • pipeline_name
  • execution_id
  • source
  • custom_properties
Dataset

Subclass of artifact. The dataset used in each Execution of a Pipeline

Properties
  • dataset_id
  • dataset_name
  • dataset_url
  • modality
  • description
  • source
  • custom_properties
Model

Subclass of artifact. The model used in each execution or produced as a result of an execution

Properties
  • model_id
  • model_name
  • model_class
  • description
  • artifact_id
  • source
  • custom_properties
Metric

Subclass of artifact. The evaluation result of each execution

Properties
  • metric_id
  • metric_name
  • artifact_id
  • evaluations
  • source
  • custom_properties**
Hyperparameters

Parameter settings used for each Execution of a Stage

Properties
  • parameter_id
  • parameter_setting (key-value pair)
  • source
  • model_id
  • custom_properties

NOTE:
  • Properties marked with * are optional.
  • There is additional information on each node, which differs for each source. As of now, it is included in the KG for efficient search, but it remains available to be extracted and populated as node properties in the future.
  • ** For Metric, there are many possible metric names and values, so all of them are captured as key-value pairs under evaluations.
  • custom_properties is where the user can enter custom properties for each node while executing a pipeline.
  • source is the source from which the node was obtained: papers-with-code, openml or huggingface.
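
To make these notes concrete, here is a minimal Python sketch of how a Metric node might be held in memory, with `evaluations` as a key-value mapping and `custom_properties` as a free-form, user-supplied mapping. The `Metric` dataclass is an illustration only and not part of CMF's API; the identifiers, metric names and values below are made up.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Metric:
    # Properties listed for the Metric node on this page.
    metric_id: str
    metric_name: str
    artifact_id: str
    source: str  # e.g. "papers-with-code", "openml" or "huggingface"
    # All metric names and values are captured as key-value pairs.
    evaluations: Dict[str, float] = field(default_factory=dict)
    # Free-form properties entered by the user while executing a pipeline.
    custom_properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical values, for illustration only.
metric = Metric(
    metric_id="metric-1",
    metric_name="test_metrics",
    artifact_id="artifact-1",
    source="papers-with-code",
    evaluations={"AUROC": 0.95, "F1": 0.91},
    custom_properties={"seed": 42},
)

node = asdict(metric)  # plain dict, ready to be stored as node properties
print(node["evaluations"])
```

Keeping `evaluations` as an open mapping avoids adding a new property for every metric name a source might report.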

Published works

  • R. Venkataramanan, A. Tripathy, M. Foltin, H. Y. Yip, A. Justine and A. Sheth, "Knowledge Graph Empowered Machine Learning Pipelines for Improved Efficiency, Reusability, and Explainability," in IEEE Internet Computing, vol. 27, no. 1, pp. 81-88, 1 Jan.-Feb. 2023, doi: 10.1109/MIC.2022.3228087. Link: https://www.computer.org/csdl/magazine/ic/2023/01/10044293/1KL6TPO5huw
  • Publio, G. C., Esteves, D., Ławrynowicz, A., Panov, P., Soldatova, L., Soru, T., ... & Zafar, H. (2018). ML-schema: exposing the semantics of machine learning with schemas and ontologies. arXiv preprint arXiv:1807.05351. Link: http://ml-schema.github.io/documentation/ML%20Schema.html
  • Nguyen, A., Weller, T., Färber, M., & Sure-Vetter, Y. (2020). Making neural networks fair. In Knowledge Graphs and Semantic Web: Second Iberoamerican Conference and First Indo-American Conference, KGSWC 2020, Mérida, Mexico, November 26–27, 2020, Proceedings 2 (pp. 29-44). Springer International Publishing. Link: https://arxiv.org/pdf/1907.11569.pdf
  • Humm, B. G., & Zender, A. (2021). An ontology-based concept for meta automl. In Artificial Intelligence Applications and Innovations: 17th IFIP WG 12.5 International Conference, AIAI 2021, Hersonissos, Crete, Greece, June 25–27, 2021, Proceedings 17 (pp. 117-128). Springer International Publishing. Link: https://www.researchgate.net/profile/Alexander-Zender-2/publication/352574909_An_Ontology-Based_Concept_for_Meta_AutoML/links/619691e107be5f31b796d2fd/An-Ontology-Based-Concept-for-Meta-AutoML.pdf