LOT4KG

LOT for Knowledge Graph Construction


LOT4KG is a methodology for the KG lifecycle that extends the LOT ontology engineering methodology with KG construction, ontology evolution, and the subsequent KG evolution.

LOT4KG workflow.

Requirements specification workflow.
Ontology Evolution Extension Workflow

Change Analysis. The goal of this activity is to formally capture the changes that are to be implemented. The output of the change conceptualisation sub-activity is a list of formalised changes. During the change evaluation sub-activity, the formalised changes are evaluated against the ontology to ensure that the ontology remains consistent and coherent: the sub-activity takes the list of formalised changes as input and produces an evaluation report as output. A minimal sketch of such an evaluation is shown after the tool list below.

Recommended Tools (based on the survey):

  • Conceptualization:
    • Microsoft Whiteboard - The visual collaboration canvas in Microsoft 365 for flexible work and learning.
    • Chowlk - A set of recommendations for ontology diagram representation.
    • Protégé - A free, open-source ontology editor and framework for building intelligent systems.
    • Diagrams - Free online diagram software for making flowcharts, process diagrams, org charts, UML, ER and network diagrams.
    • DOGMA - A methodological framework for ontology engineering.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
  • Evaluation:
    • SPARQL - A language for querying RDF data.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
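
As an illustration of SPARQL-based change evaluation, the following sketch checks that a class scheduled for removal is no longer referenced by any axiom before the change is approved. It is a minimal sketch only: the file name ontology.ttl and the class IRI are hypothetical placeholders, and a real evaluation would apply further consistency and coherence criteria.

    from rdflib import Graph, URIRef

    # Minimal change evaluation sketch: before applying a formalised change
    # that removes a class, verify that the class is no longer referenced
    # by any axiom in the ontology. File name and IRI are placeholders.
    ontology = Graph().parse("ontology.ttl", format="turtle")
    deprecated_class = URIRef("https://w3id.org/example#ObsoleteSensor")

    query = """
    SELECT ?s ?p WHERE {
        { ?s ?p ?class . } UNION { ?class ?p ?s . }
    }
    """

    hits = list(ontology.query(query, initBindings={"class": deprecated_class}))
    if hits:
        # The change would break the referring axioms: record them in the
        # evaluation report instead of applying the change blindly.
        for s, p in hits:
            print(f"Still referenced: {s} via {p}")
    else:
        print("Class is unreferenced; the removal change can be applied.")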


Ontology Update. In the first sub-activity of the ontology update (Ontology Conceptualization), the ontology model is built from the new set of ontological requirements from the previous step. In the next sub-activity (Ontology Encoding), an engineer applies the necessary formalised changes to the ontology, similar to the ontology encoding sub-activity within the high-level ontology implementation activity; a minimal sketch of one such encoded change follows the tool list below. The output of this activity is the code of the new ontology, which is then evaluated before being published.

Recommended Tools (based on the survey):

  • Change Encoding:
    • Widoco - A wizard for documenting ontologies.
    • Chowlk - A set of recommendations for ontology diagram representation.
    • Protégé - A free, open-source ontology editor and framework for building intelligent systems.
    • Diagrams - Free online diagram software for making flowcharts, process diagrams, org charts, UML, ER and network diagrams.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
  • Ontology Evaluation:
    • GRLC - Builds a web API from SPARQL queries hosted on GitHub to access triple store data.
    • SPARQL - A language for querying RDF data.
    • SHACL - A language for validating RDF graphs against a set of conditions.
    • OOPS! - An on-line tool for ontology evaluation.
    • Protégé - A free, open-source ontology editor and framework for building intelligent systems.
    • RDFShape - A playground for RDF data conversion, validation and visualization, among other features.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
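
To make the encoding sub-activity concrete, here is a hedged sketch that applies one formalised change with RDFlib: deprecating a class and introducing its replacement, then serialising the new ontology version. All IRIs, class names, and file names are illustrative assumptions, not part of LOT4KG itself.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    # Sketch of ontology encoding: apply one formalised change (deprecate a
    # class and introduce its replacement) and write the new version out.
    # The namespace, class names, and file names are illustrative only.
    EX = Namespace("https://w3id.org/example#")

    g = Graph().parse("ontology.ttl", format="turtle")
    g.bind("ex", EX)

    # Mark the old class as deprecated rather than deleting it, so that
    # existing data referring to it keeps resolving.
    g.add((EX.ObsoleteSensor, OWL.deprecated, Literal(True)))

    # Add the replacement class and link it to the deprecated one.
    g.add((EX.Sensor, RDF.type, OWL.Class))
    g.add((EX.Sensor, RDFS.label, Literal("Sensor", lang="en")))
    g.add((EX.ObsoleteSensor, RDFS.seeAlso, EX.Sensor))

    # The serialised file is the code of the new ontology version.
    g.serialize(destination="ontology-v2.ttl", format="turtle")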

Ontology implementation workflow.
Knowledge Graph Engineering Extension Workflow

Knowledge Graph Implementation. The knowledge graph implementation activity aims to construct and validate the knowledge graph. It is composed of a set of sub-activities that transform the input data, which can be of any type and format (e.g., tabular data in CSV, text in PDF, etc.), into the knowledge graph, which is then validated against a set of constraints. The output of this activity is the implemented knowledge graph (virtual or materialized) and the associated rules for constructing it (e.g., RML, SPARQL-Anything, etc.) and validating it (e.g., ShEx or SHACL shapes). The validation step may also produce a validation report as output. A minimal end-to-end sketch follows the tool list below.

Recommended Tools (based on the survey):

  • Data Preparation:
    • TriplyETL - Allows you to create and maintain production-grade linked data pipelines.
    • Pandas - A fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Oxigraph - A graph database implementing the SPARQL standard.
    • Dremio - A lakehouse built natively on Apache Iceberg, Polaris, and Arrow - providing flexibility, preventing lock-in, and enabling community-driven innovation.
    • DBT - An open-source command line tool that helps analysts and engineers transform data in their warehouse more effectively.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
    • OpenRefine - A powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
    • Cline - A fully collaborative AI partner that's open source, fully extensible, and designed to amplify developer impact.
    • GPT - A generative artificial intelligence chatbot.
  • Mapping Generation:
    • YARRRML - A human readable text-based representation for declarative Linked Data generation rules.
    • YARRRML Parser - A library that converts YARRRML rules to RML or R2RML rules.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Cline - A fully collaborative AI partner that's open source, fully extensible, and designed to amplify developer impact.
    • Yatter - A tool for translating mapping rules from YARRRML into a Turtle-based serialization of RML or R2RML.
    • RML - A generic mapping language, based on and extending R2RML.
    • Chimera - A framework implemented on top of Apache Camel offering components to define schema and data transformation pipelines based on Semantic Web solutions.
    • SPARQL Anything - A system for Semantic Web re-engineering that allows users to query anything with SPARQL.
    • ShExML - A language based on ShEx to map and merge heterogeneous data sources.
    • Ontopic Studio - An environment for building knowledge graphs from relational data.
    • Tiny RML - An implementation of a subset of RML and R2RML with some helpful extended features.
    • R2RML-F - An R2RML implementation.
    • Pandas - A fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
    • RDF4J - A powerful Java framework for processing and handling RDF data.
    • GPT - A generative artificial intelligence chatbot.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
  • Data Transformation:
    • RMLMapper - A tool that executes RML rules to generate Linked Data.
    • Morph-KGC - An engine that constructs RDF knowledge graphs from heterogeneous data sources with the R2RML and RML mapping languages.
    • Chimera - A framework implemented on top of Apache Camel offering components to define schema and data transformation pipelines based on Semantic Web solutions.
    • SPARQL Anything - A system for Semantic Web re-engineering that allows users to query anything with SPARQL.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • py-SUbyT - A Python library for Semantic Uplifting by Templates.
    • Morph-xR2RML - An implementation of the xR2RML mapping language that enables the description of mappings from relational or non-relational databases to RDF.
    • SPARQL Micro-Services - An architecture that enables querying Web APIs with SPARQL, as well as assigning dereferenceable URIs to Web API resources that do not have a URI in the first place.
    • Jena - A free and open source Java framework for building Semantic Web and Linked Data applications.
    • HDT - A compact data structure and binary serialization format for RDF.
    • Cline - A fully collaborative AI partner that's open source, fully extensible, and designed to amplify developer impact.
    • GPT - A generative artificial intelligence chatbot.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
  • Constraint Generation:
    • SPARQL - A language for querying RDF data.
    • SHACL - A language for validating RDF graphs against a set of conditions.
    • Astrea - A tool that produces the SHACL shapes that can be inferred from one or more ontologies.
    • Python - A programming language that lets you work quickly and integrate systems more effectively.
    • ShExML - A language based on ShEx to map and merge heterogeneous data sources.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
    • Cline - A fully collaborative AI partner that's open source, fully extensible, and designed to amplify developer impact.
    • GPT - A generative artificial intelligence chatbot.
  • Data Validation:
    • Jena SHACL - An implementation of the W3C Shapes Constraint Language (SHACL).
    • pySHACL - A Python validator for SHACL.
    • GPT - A generative artificial intelligence chatbot.
    • Cline - A fully collaborative AI partner that's open source, fully extensible, and designed to amplify developer impact.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
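
As a minimal end-to-end sketch of this activity with two of the surveyed tools, the following Python snippet materialises the KG from RML mappings with Morph-KGC and validates it with pySHACL. The mapping and shapes file names are hypothetical placeholders.

    import morph_kgc
    from pyshacl import validate

    # Sketch of KG implementation: materialise the graph from RML/R2RML
    # mappings with Morph-KGC, then validate it against SHACL shapes.
    # Morph-KGC accepts the configuration as a string or as an INI file path.
    config = "[DataSource]\nmappings: mapping.rml.ttl"

    # morph_kgc.materialize returns an rdflib.Graph with the generated triples.
    kg = morph_kgc.materialize(config)
    kg.serialize(destination="kg.ttl", format="turtle")

    # Validation produces the report that is part of the activity's output.
    conforms, report_graph, report_text = validate(kg, shacl_graph="shapes.ttl")
    print("Conforms:", conforms)
    if not conforms:
        print(report_text)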


Knowledge Graph Publication. The high-level KG publication activity is the counterpart to the ontology publication activity. It captures the necessary tasks and steps taken to document the KG and make it available online. The output of this activity is the documented KG and the online accessible resource. The lower-level activities of KG publication are documentation and data publication. During the documentation step, the mappings, RDF data, SHACL shapes, and validation report are used to document the process and output of the implementation. The output is the HTML documentation, which can then be published during the data publication step alongside the online KG; a minimal sketch of the data publication step follows the tool list below.

Recommended Tools (based on the survey):

  • Documentation:
    • GitHub - A proprietary developer platform that allows developers to create, store, manage, and share their code.
    • GitLab - A comprehensive AI-powered DevSecOps Platform.
    • Zenodo - A general-purpose open repository.
    • Confluence - A web-based corporate wiki.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
    • Markdown - A lightweight markup language for creating formatted text using a plain-text editor.
  • Publication:
    • TriplyDB Data Stories - A way of communicating information about your linked data along with explanatory text, while also being able to integrate query results.
    • GitHub - A proprietary developer platform that allows developers to create, store, manage, and share their code.
    • GitLab - A comprehensive AI-powered DevSecOps Platform.
    • Zenodo - A general-purpose open repository.
    • Virtuoso - A multi-model database system with native support for managing and querying RDF data.
    • GraphDB - An enterprise ready Semantic Graph Database.
    • W3id - A secure, permanent URL re-direction service for Web applications.
    • GraphQL - A data query and manipulation language that allows specifying what data is to be retrieved or modified.
    • brwsr - A lightweight Linked Data browser.
    • Jena - A free and open source Java framework for building Semantic Web and Linked Data applications.
    • Metaphactory - Transforms your data into consumable, contextual & actionable knowledge and drives continuous decision intelligence using knowledge graphs and AI.
    • Ontodia - A JavaScript library for visualizing, navigating and exploring data in the form of an interactive graph based on underlying data sources.
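
For the data publication step, the following hedged sketch pushes the validated RDF graph to a triple store over the SPARQL 1.1 Graph Store HTTP Protocol. The endpoint URL, named graph IRI, and file name are hypothetical; the exact endpoint layout differs between Virtuoso, GraphDB, Jena Fuseki, and similar stores.

    import requests

    # Sketch of data publication: upload the validated graph to a named
    # graph in a triple store via the SPARQL 1.1 Graph Store Protocol.
    # Endpoint, graph IRI, and file name are placeholders.
    ENDPOINT = "https://triplestore.example.org/data"
    NAMED_GRAPH = "https://w3id.org/example/graph/kg"

    with open("kg.ttl", "rb") as f:
        response = requests.post(
            ENDPOINT,
            params={"graph": NAMED_GRAPH},
            data=f,
            headers={"Content-Type": "text/turtle"},
        )

    # Raise an error if the store rejected the upload.
    response.raise_for_status()
    print("KG published to named graph", NAMED_GRAPH)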


Knowledge Graph Maintenance. This task is modelled along the same lines as the ontology maintenance task. It is specifically aimed at fixing bugs in the already published KG. This step does not capture proper evolution, i.e., changes within the ontology or the data sources. Therefore, the detailed activity is named bug detection; a minimal sketch of it follows the tool list below. The output of the activity is the set of issues and bugs to be fixed by backtracking through the process. The new data requirements task refers to updates in the input data sources that trigger ontology and/or KG update activities.

Recommended Tools (based on the survey):

  • Bug Detection:
    • Git - A distributed version control system that tracks versions of files.
    • SPARQL - A language for querying RDF data.
    • GRLC - Builds a web API from SPARQL queries hosted on GitHub to access triple store data.
    • Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • SHACL - A language for validating RDF graphs against a set of conditions.
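
As one concrete example of bug detection with SPARQL, the sketch below scans the published KG for typed resources that lack a human-readable label. The file name and the chosen quality check are illustrative assumptions; any number of such checks can feed the list of issues.

    from rdflib import Graph

    # Bug detection sketch: find typed resources without an rdfs:label,
    # one example of an issue to be fixed by backtracking in the process.
    kg = Graph().parse("kg.ttl", format="turtle")

    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?s WHERE {
        ?s a ?type .
        FILTER NOT EXISTS { ?s rdfs:label ?label }
    }
    """

    for (s,) in kg.query(query):
        print("Missing label:", s)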

Ontology publication workflow.
Knowledge Graph Lifecycle Workflow

Change Detection and Impact Analysis Step. To be able to update an already existing KG, the changes applied to the ontology need to be examined and analysed against the KG. We define two sub-activities: detect delta (optional) and assess change impact. The main output of this activity is a list of changes relevant for the update of the KG. A minimal sketch of delta detection follows the tool list below.

Recommended Tools (based on the survey):

  • Delta Detection:
    • Git - A distributed version control system that tracks versions of files.
    • UNIX Diff - A command-line utility that compares two files line by line.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
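
A minimal delta detection sketch using RDFlib's graph comparison utilities is shown below; the two ontology version file names are hypothetical.

    from rdflib import Graph
    from rdflib.compare import graph_diff, to_isomorphic

    # Delta detection sketch: compute the triples removed from and added to
    # the ontology between two versions. File names are placeholders.
    old = to_isomorphic(Graph().parse("ontology-v1.ttl", format="turtle"))
    new = to_isomorphic(Graph().parse("ontology-v2.ttl", format="turtle"))

    in_both, removed, added = graph_diff(old, new)
    print(f"{len(removed)} triples removed, {len(added)} triples added")

    # Each removed or added triple is a candidate change whose impact on
    # the KG is assessed in the next sub-activity.
    for triple in added:
        print("+", triple)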


Knowledge Graph Update. This activity, just like the one for updating the ontology, serves as a mirror to the KG implementation activity. Its purpose is to bring the KG up to date with regard to any type of change in the ontology or source data. Hence, the KG update activity can be triggered by the change detection activities, which provide the list of ontology changes, or by changes to the source data, depicted by the arrow connecting the KG maintenance activity with KG update. The KG update activity has four sub-activities that make it possible to update the KG, each of them associated with the corresponding asset of the KG implementation (mappings, RDF graph, constraints, and validation report). The high-level output is the updated RDF graph and its associated assets (i.e., mappings, data constraints, and validation report), which are to be published using the KG publication activity. A minimal sketch of a delta-based update follows the tool list below.

Recommended Tools (based on the survey):

  • Mappings Update:
    • YARRRML - A human readable text-based representation for declarative Linked Data generation rules.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Yatter - A tool for translating mapping rules from YARRRML into a Turtle-based serialization of RML or R2RML.
  • Transformation:
    • Morph-KGC - An engine that constructs RDF knowledge graphs from heterogeneous data sources with the R2RML and RML mapping languages.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
  • Constraints Update:
    • SHACL - A language for validating RDF graphs against a set of conditions.
    • RDFlib - A Python library for working with RDF, a simple yet powerful language for representing information.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
  • Validation:
    • GRLC - Builds a web API from SPARQL queries hosted on GitHub to access triple store data.
    • SPARQL - A language for querying RDF data.
    • SHACL - A language for validating RDF graphs against a set of conditions.
    • pySHACL - A Python validator for SHACL.
    • Metaphacts ETL Pipeline - Provides a means to convert structured data to RDF, perform post-processing steps, and ingest it into a graph database.
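
The following sketch illustrates one possible delta-based realisation of the KG update: the graph rematerialised from the updated mappings is diffed against the published version, and the delta is turned into SPARQL UPDATE statements. File names are hypothetical, and triples containing blank nodes would need dedicated handling.

    from rdflib import Graph
    from rdflib.compare import graph_diff, to_isomorphic

    # KG update sketch: diff the rematerialised graph against the published
    # one and emit SPARQL UPDATE statements for the delta.
    published = to_isomorphic(Graph().parse("kg-published.ttl", format="turtle"))
    updated = to_isomorphic(Graph().parse("kg-updated.ttl", format="turtle"))

    _, removed, added = graph_diff(published, updated)

    def as_update(graph, keyword):
        # Serialise a set of triples as a SPARQL UPDATE data block.
        # Note: blank-node triples cannot be deleted this way.
        return f"{keyword} DATA {{\n{graph.serialize(format='nt')}}}"

    if len(removed):
        print(as_update(removed, "DELETE"))
    if len(added):
        print(as_update(added, "INSERT"))

In practice, the resulting DELETE DATA and INSERT DATA statements would be sent to the store's SPARQL UPDATE endpoint, after which the updated graph and its associated assets are republished.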

Complete LOT4KG methodology detailed view

Methodology detailed workflow

The LOT4KG methodology figures are available for reuse in the LOT GitHub repository under the Creative Commons Attribution-ShareAlike license.

Video Tutorial

Watch a video tutorial of the LOT4KG methodology:

Survey

The LOT4KG methodology has been validated through a survey. You can explore the survey's raw anonymised data, scripts, and additional findings:


The validation results are summarised below. For each LOT4KG activity and sub-activity, the percentage of survey responses covering it is listed. Each response to the survey is labelled R; anonymised responses are labelled AR. Responses are ordered by the number of sub-activities they cover, from highest to lowest. The survey also collected the list of tools and resources for each sub-activity.


Activity coverage:

  • KG Requirements: 93.75%
  • Implementation: 93.75%
  • Publication: 65.63%
  • Maintenance: 71.88%
  • Change conceptualisation: 77.42%
  • Change evaluation: 54.84%
  • Change encoding: 61.29%
  • Ontology evaluation: 48.39%
  • Data preparation: 87.10%
  • Mapping development: 93.55%
  • Data transformation: 74.19%
  • Constraints development: 58.06%
  • Data validation: 83.87%
  • Documentation: 77.42%
  • Publication (KG): 83.87%
  • Bug detection: 64.52%
  • Detect delta: 16.13%
  • Assess impact: 35.48%
  • Mapping update: 83.87%
  • Data transformation (update): 67.74%
  • Constraints update: 45.16%
  • Data validation (update): 54.84%

Responses, ordered by the number of covered sub-activities from highest to lowest: R6; R16 (European Union Agency for Railways, ERA); R9 (Building Information aGGregation, BIGG); R91 (Dimensions); AR97; R8 (EDIFACT Ontology); R93; R75 (CIDOC-CRM); R3 (OfficeGraph); R77 (Odeorupa); R73; R11 (Mlsea); R60 (Simulation Ontology); AR38; AR84; R61 (Knowledge Hub Ontology); R31 (Marine Regions); R5 (Scihyp); R21 (Cybermapping); R17; R15 (Ehri Portal); R12; AR22; R14 (Polifonia meetups); R35 (Issa agritrop dataset); R20 (SWeMLS-KG); R27; R18; R95 (Katy-kg); R10 (Deliberation knowledge graph); R13.

How to cite

If you use the content of the LOT4KG methodology, please cite: Pernisch, R., Chaves-Fraga, D., Stork, L., Conde-Herreros, D., Poveda-Villalón, M.: LOT4KG: A Joint Methodology for the Ontology and Knowledge Graph Lifecycle. Under review (ISWC 2025).