Common Metadata Ontology¶
Common Metadata Ontology (CMO) is proposed to integrate and aggregate pipeline metadata from various sources such as Papers-with-code, OpenML and Huggingface. CMF's data model is a manifestation of CMO, which is specifically designed to capture the pipeline-centric metadata of AI pipelines. It consists of nodes that represent a pipeline and its components (stages), relationships that capture the interactions among pipeline entities, and properties.
CMO offers interoperability of diverse metadata, as well as search and recommendation with reasoning capabilities. It is flexible enough to incorporate the various executions implemented for each stage, such as dataset preprocessing, feature engineering, training (including HPO), testing and evaluation. This enables robust search capabilities to identify the best execution path for a given pipeline. CMO also facilitates the inclusion of additional semantic and statistical properties to enhance the richness and comprehensiveness of the associated metadata. The overview of CMO can be found below.
The external link to arrows.app can be found here
Sample pipeline represented using CMO¶
The sample figure shows a pipeline titled "Robust outlier detection by de-biasing VAE likelihoods", executed for the "Outlier Detection" task in the train/test stage. The model used in the pipeline was a "Variational Autoencoder". Several datasets were used in the pipeline implementation: (i) German Traffic Sign, (ii) Street View House Numbers and (iii) CelebFaces Attributes. The corresponding hyperparameters and the metrics generated by the execution are included in the figure. The external link to the source figure created using arrows.app can be found here
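The sample pipeline can also be written down as a small set of nodes and relationships. The following is a minimal sketch in plain Python, not the CMO serialization itself: the node keys (`pipeline_1`, `stage_1`, and so on) and the relationship names (`executes`, `has_stage`, `uses_model`, `uses_dataset`) are assumptions made for illustration, and the graph is simplified by attaching the model and datasets directly to the stage.

```python
# Nodes of the sample pipeline, keyed by hypothetical local identifiers.
nodes = {
    "pipeline_1": {
        "label": "Pipeline",
        "pipeline_name": "Robust outlier detection by de-biasing VAE likelihoods",
    },
    "task_1": {"label": "Task", "task_name": "Outlier Detection"},
    "stage_1": {"label": "Stage", "stage_name": "train/test"},
    "model_1": {"label": "Model", "model_name": "Variational Autoencoder"},
    "dataset_1": {"label": "Dataset", "dataset_name": "German Traffic Sign"},
    "dataset_2": {"label": "Dataset", "dataset_name": "Street View House Numbers"},
    "dataset_3": {"label": "Dataset", "dataset_name": "CelebFaces Attributes"},
}

# Relationships as (subject, predicate, object) triples.
# The predicate names are illustrative only, not taken from the ontology.
edges = [
    ("pipeline_1", "executes", "task_1"),
    ("pipeline_1", "has_stage", "stage_1"),
    ("stage_1", "uses_model", "model_1"),
    ("stage_1", "uses_dataset", "dataset_1"),
    ("stage_1", "uses_dataset", "dataset_2"),
    ("stage_1", "uses_dataset", "dataset_3"),
]


def neighbours(subject, predicate):
    """Return the nodes reachable from `subject` through `predicate`."""
    return [nodes[o] for s, p, o in edges if s == subject and p == predicate]


dataset_names = [d["dataset_name"] for d in neighbours("stage_1", "uses_dataset")]
print(dataset_names)
```

Keeping relationships as triples mirrors how a knowledge graph stores them, so a lookup such as "which datasets does this stage use" becomes a simple filter over the edge list.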
Turtle Syntax¶
The formal ontology in Turtle format can be found here
Properties of each node¶
The properties of each node can be found below.
Pipeline¶
AI pipeline executed to solve a machine learning or deep learning Task
Properties¶
- pipeline_id
- pipeline_name
- pipeline_source
- source_id
- custom_properties*
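As a rough illustration, a Pipeline node with the properties listed above could be modelled as a Python dataclass. This is a sketch under assumptions: the field types are guesses (the list above does not state them), the example values are hypothetical, and the optional `custom_properties` is given an empty-dictionary default.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Pipeline:
    # Required properties; the string types are an assumption.
    pipeline_id: str
    pipeline_name: str
    pipeline_source: str
    source_id: str
    # Optional property, so it defaults to an empty mapping.
    custom_properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical example values, for illustration only.
pipeline = Pipeline(
    pipeline_id="p-001",
    pipeline_name="example-pipeline",
    pipeline_source="openml",
    source_id="example-source-id",
)

# Flatten the node into a plain dictionary of properties.
record = asdict(pipeline)
print(record)
```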
Report¶
Any published text document regarding the pipeline implementation
Properties¶
- report_id
- report_title
- report_pdf_url
- source
- source_id
- abstract*
- custom_properties*
Task¶
The AI Task for which the pipeline is implemented. Example: image classification
Properties¶
- task_id
- task_name
- task_description
- task_type
- modality
- category
- source
- custom_properties*
Framework¶
The framework used to implement the pipeline and its code repository
Properties¶
- framework_id
- framework_name
- code_repo_url
- framework_version
- source
Stage¶
Various stages of the pipeline such as data preprocessing, training, testing or evaluation
Properties¶
- stage_id
- stage_name
- source
- pipeline_id
- pipeline_name
- custom_properties
Execution¶
Multiple executions of a given stage in a pipeline
Properties¶
- execution_id
- execution_name
- stage_id
- stage_name
- pipeline_id
- pipeline_name
- source
- command (CLI command to run the execution)
- custom_properties
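Because each Execution carries the identifiers of its Stage and Pipeline, several executions of the same stage can be grouped together. The sketch below assumes Execution nodes are available as plain dictionaries; all identifiers, names and commands are hypothetical.

```python
from collections import defaultdict

# Hypothetical Execution nodes using the properties listed above.
executions = [
    {
        "execution_id": "e1", "execution_name": "train-run-1",
        "stage_id": "s1", "stage_name": "train",
        "pipeline_id": "p1", "pipeline_name": "example-pipeline",
        "source": "openml", "command": "python train.py --lr 0.01",
        "custom_properties": {},
    },
    {
        "execution_id": "e2", "execution_name": "train-run-2",
        "stage_id": "s1", "stage_name": "train",
        "pipeline_id": "p1", "pipeline_name": "example-pipeline",
        "source": "openml", "command": "python train.py --lr 0.001",
        "custom_properties": {},
    },
    {
        "execution_id": "e3", "execution_name": "test-run-1",
        "stage_id": "s2", "stage_name": "test",
        "pipeline_id": "p1", "pipeline_name": "example-pipeline",
        "source": "openml", "command": "python test.py",
        "custom_properties": {},
    },
]

# Group execution ids by (pipeline, stage), preserving input order.
by_stage = defaultdict(list)
for execution in executions:
    key = (execution["pipeline_id"], execution["stage_name"])
    by_stage[key].append(execution["execution_id"])

print(dict(by_stage))
```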
Artifact¶
Artifacts such as model, dataset and metric generated at the end of each execution
Properties¶
- artifact_id
- artifact_name
- pipeline_id
- pipeline_name
- execution_id
- source
- custom_properties
Dataset¶
Subclass of artifact. The dataset used in each Execution of a Pipeline
Properties¶
- dataset_id
- dataset_name
- dataset_url
- modality
- description
- source
- custom_properties
Model¶
Subclass of artifact. The model used in each execution or produced as a result of an execution
Properties¶
- model_id
- model_name
- model_class
- description
- artifact_id
- source
- custom_properties
Metric¶
Subclass of artifact. The evaluation result of each execution
Properties¶
- metric_id
- metric_name
- artifact_id
- evaluations
- source
- custom_properties**
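The subclass relationship between Artifact and Dataset, Model and Metric maps naturally onto class inheritance. The following sketch is deliberately abbreviated and rests on assumptions: the shared fields are reduced to a few common ones, the types are guesses, and the example values are hypothetical. It does not reproduce the full property lists above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Artifact:
    # Fields shared by every artifact in this abbreviated sketch.
    artifact_id: str
    artifact_name: str
    source: str
    custom_properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Dataset(Artifact):
    dataset_url: str = ""
    modality: str = ""


@dataclass
class Model(Artifact):
    model_class: str = ""


@dataclass
class Metric(Artifact):
    # All metric names and values are kept in one key-value mapping.
    evaluations: Dict[str, float] = field(default_factory=dict)


# Hypothetical artifacts, for illustration only.
artifacts = [
    Dataset("a1", "example-dataset", "openml", modality="image"),
    Model("a2", "example-model", "huggingface", model_class="Variational Autoencoder"),
    Metric("a3", "example-metric", "papers-with-code", evaluations={"accuracy": 0.9}),
]

# Every subclass instance is also an Artifact.
kinds = [type(a).__name__ for a in artifacts if isinstance(a, Artifact)]
print(kinds)
```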
Hyperparameters¶
Parameter settings used for each Execution of a Stage
Properties¶
- parameter_id
- parameter_setting (key-value pair)
- source
- model_id
- custom_properties
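Since `parameter_setting` is a key-value pair, one possible way to keep it as a single node property is to serialize it to a JSON string. This is an assumption made for illustration, not a documented CMO convention; the parameter names and values are hypothetical.

```python
import json

# Hypothetical Hyperparameters node using the properties listed above.
hyperparameters = {
    "parameter_id": "hp-1",
    "parameter_setting": {"learning_rate": 0.001, "batch_size": 64},
    "source": "huggingface",
    "model_id": "m-1",
    "custom_properties": {},
}

# Sorting the keys gives a stable string, so identical settings compare equal.
encoded = json.dumps(hyperparameters["parameter_setting"], sort_keys=True)

# Decoding restores the original key-value mapping.
decoded = json.loads(encoded)
print(encoded)
```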
NOTE:
- * marks optional properties.
- There is additional information on each node, which differs for each source. As of now, they are included in the KG for efficient search. But they are available to be used in the future to extract the data and populate as node properties.
- ** For metric, there are umpteen possible metric names and values. Therefore, we capture all of them as key-value pairs under evaluations.
- custom_properties is where the user can enter custom properties for each node while executing a pipeline.
- source is the source from which the node is obtained: papers-with-code, openml or huggingface.
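Storing every metric name and value under `evaluations` means that a search only needs to look up one key per Metric node. The sketch below assumes Metric nodes are available as dictionaries and picks the node with the highest value for a given metric name; the helper `best_by`, the identifiers and the metric values are all hypothetical.

```python
# The three sources named in the note above.
KNOWN_SOURCES = {"papers-with-code", "openml", "huggingface"}

# Hypothetical Metric nodes; the values are made up for illustration.
metrics = [
    {"metric_id": "m1", "metric_name": "run-1", "artifact_id": "a1",
     "source": "openml", "evaluations": {"accuracy": 0.88, "f1": 0.85}},
    {"metric_id": "m2", "metric_name": "run-2", "artifact_id": "a2",
     "source": "openml", "evaluations": {"accuracy": 0.91}},
    {"metric_id": "m3", "metric_name": "run-3", "artifact_id": "a3",
     "source": "huggingface", "evaluations": {"f1": 0.90}},
]


def best_by(metric_nodes, name):
    """Return the Metric node with the highest value for `name`, or None.

    Nodes from unknown sources or without the requested metric are skipped.
    """
    candidates = [
        m for m in metric_nodes
        if m["source"] in KNOWN_SOURCES and name in m["evaluations"]
    ]
    return max(candidates, key=lambda m: m["evaluations"][name], default=None)


print(best_by(metrics, "accuracy"))
```

This assumes that a higher value is better; a metric such as a loss would need the comparison reversed.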
Published works¶
- R. Venkataramanan, A. Tripathy, M. Foltin, H. Y. Yip, A. Justine and A. Sheth, "Knowledge Graph Empowered Machine Learning Pipelines for Improved Efficiency, Reusability, and Explainability," in IEEE Internet Computing, vol. 27, no. 1, pp. 81-88, 1 Jan.-Feb. 2023, doi: 10.1109/MIC.2022.3228087. Link: https://www.computer.org/csdl/magazine/ic/2023/01/10044293/1KL6TPO5huw
Related works¶
- Publio, G. C., Esteves, D., Ławrynowicz, A., Panov, P., Soldatova, L., Soru, T., ... & Zafar, H. (2018). ML-schema: exposing the semantics of machine learning with schemas and ontologies. arXiv preprint arXiv:1807.05351. Link - http://ml-schema.github.io/documentation/ML%20Schema.html
- Nguyen, A., Weller, T., Färber, M., & Sure-Vetter, Y. (2020). Making neural networks FAIR. In Knowledge Graphs and Semantic Web: Second Iberoamerican Conference and First Indo-American Conference, KGSWC 2020, Mérida, Mexico, November 26–27, 2020, Proceedings 2 (pp. 29-44). Springer International Publishing. Link - https://arxiv.org/pdf/1907.11569.pdf
- Humm, B. G., & Zender, A. (2021). An ontology-based concept for meta automl. In Artificial Intelligence Applications and Innovations: 17th IFIP WG 12.5 International Conference, AIAI 2021, Hersonissos, Crete, Greece, June 25–27, 2021, Proceedings 17 (pp. 117-128). Springer International Publishing. Link - https://www.researchgate.net/profile/Alexander-Zender-2/publication/352574909_An_Ontology-Based_Concept_for_Meta_AutoML/links/619691e107be5f31b796d2fd/An-Ontology-Based-Concept-for-Meta-AutoML.pdf