Data Models
1. Scope and objectives
Data models ensure that data is accurately and consistently interpreted when exchanged within a data space. The data model consists of metadata that provides information about semantics, helping to interpret the actual exchanged data. Such models are relevant in a data space where a data provider offers data products and data consumers want to utilise and exchange these data products. When using the same data model, semantic interoperability becomes possible and data can be exchanged among the data space participants.
Data spaces should consider shared data models, or ‘semantic standards’. These models serve as dictionaries that enable data providers and data consumers to “speak the same language” when exchanging data. Considering that participants have diverse perspectives and requirements about the meaning of data, it is essential to develop, reuse, and govern these shared data models within the data space. This is a continuous balancing act between the need for strict uniformity to keep data consistent and easy to understand, and the need to accommodate the fact that different organizations have different requirements for their data. In a data space, the governance framework should include these agreements to ensure wide consensus regarding the data models used in the data space.
Data models are located in a common repository, known as a vocabulary service. Each data product should refer to a data model in the vocabulary service. This allows both the data provider and the data consumer to consult the repository during data exchange, ensuring accurate exchange and interpretation of the data. However, this raises a challenge for federated data spaces, as each data space develops its own unique data models. The first step in (re)using data models from other data spaces is to find and access them. Therefore, data spaces should be able to exchange their data models in a standardised manner to establish agreements on their usage.
A data model is a structured representation of data elements and relationships used to facilitate semantic interoperability within and across domains. However, data models exist at different abstraction levels. This building block distinguishes between the various types of data models and the meta-standards in which they can be expressed, and provides examples. In addition, it describes how these data models can be implemented, reused, and governed, by whom, and which tools can assist in this process.
2. Capabilities
Data spaces should ensure that all data offerings are linked to a data model that describes their structure and semantics. They should utilise existing semantic standards and, if these models do not suffice, explore the possibility of extending an existing model. Only if this is not viable should a data space governance authority (DSGA) create a new model, reusing concepts from existing standards where possible. In all cases, agreements must be established in the governance framework to ensure proper governance of the data models. Tools and processes are needed for the governance and management of the data models and their utilisation.
The following capabilities are needed in data spaces:
Data model development: Reuse or develop data models to ensure uniformity and interoperability.
Data model governance: Governance and management aspect of data models, including tools and processes to maintain data models and to ensure wide consensus regarding the data model used in the data space.
Data model integration: Each data offering must be linked to a data model that describes its structure and semantics.
Data model abstraction: Transition from semantic to technical interoperability, enabling the exchanged data to be represented in conformance with the data models (see also: Data Exchange building block).
Data models across data spaces: Standardised discovery of data models across data spaces, supporting multiple data spaces in becoming semantically interoperable.
3. Specifications
This section describes how the capabilities of this building block can be implemented within a data space, starting with a description of the following concepts:
Data models: A structured representation of data elements and relationships used to facilitate semantic interoperability within and across domains, including vocabularies, ontologies, application profiles, and schema specifications that can be used to annotate and describe data sets and data services.
Data model management process: The management process for creating, managing, and updating data models within data spaces. This is performed by a data model provider, an entity responsible for providing (creating, publishing and maintaining) the data model. This role is often fulfilled collectively by business communities and delegated to a Standards Development Organisation (SDO), but can also be undertaken by a data space governance authority itself.
Vocabulary service: A technical component providing facilities for publishing, editing, browsing and maintaining data models and related documentation. This service may also support the transition from more conceptual data models to technical data models that can be used in actual data exchange.
3.1 Building a data model for the data space
Data space participants must semantically express their data (product) offerings by using a standardised data model, agreed within the data space. Based on the governance of the data space, data model providers supply these data models. The data models are published and stored in a vocabulary service that enables data models to be discovered throughout a data space. This ensures that each data product offered can be linked to a data model that describes its structure and semantics. As mentioned in the building block about Data, Service and Offerings Descriptions, this can be achieved by using the DCAT standard. Each dataset or data service in DCAT can reference a data model in the vocabulary service using the property dcterms:conformsTo.
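As an illustration, a DCAT dataset description can point to its data model with dcterms:conformsTo. The sketch below uses hypothetical placeholder URIs; a real data space would use the identifiers published by its vocabulary service:

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical data product offered by a provider in the data space
<https://provider.example.org/datasets/orders-2024>
    a dcat:Dataset ;
    dcterms:title "Order data 2024" ;
    # Reference to the data model published in the vocabulary service
    dcterms:conformsTo <https://vocabulary.example.org/models/order-message/v1> .
```

A consumer resolving the dcterms:conformsTo URI against the vocabulary service can then retrieve the model needed to interpret the dataset.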
However, this is a challenge for federated data spaces, as data models need to be discovered across data spaces. By expressing data models in DCAT, they become discoverable across data spaces. Since data models are themselves just data, they can also be treated as data products, which allows them to be cataloged and exchanged using the Data Space Protocol, making them findable and accessible across data spaces.
There are multiple abstraction layers when it comes to data models. While semantic interoperability focuses on the meaning of concepts, technical interoperability is concerned with a specific syntax. These abstraction levels of data models transition from semantic to technical interoperability, with the latter detailing the precise representation that the exchanged data must adhere to. Examples are the data schemas used during data exchange, as specified in the data exchange protocol outlined by the Data Exchange building block.
It's important to note that not every data space needs to navigate through this complexity. Often, existing standards and their governance processes as provided by a data model provider can be reused. Additionally, data space intermediaries can facilitate the transition from semantic to technical interoperability.
This building block distinguishes between the following abstraction layers of a data model, where each layer consists of metadata about the shared data:
Vocabulary: Basic concepts and relationships expressed as terms and definitions within or across domains, typically described in a meta-standard such as Simple Knowledge Organization System (SKOS), a common data model for sharing and linking taxonomies, classification schemes, etc.
Ontology: Knowledge within and across domains by modelling information objects and their relationships, often expressed in open metamodel standards like OWL (Web Ontology Language), RDF (Resource Description Framework), or UML (Unified Modelling Language).
Application Profile: An application profile is a data model for applications that fulfil a particular use case. In addition to shared semantics, it also allows additional restrictions to be imposed, such as recording cardinalities or the use of certain code lists. An application profile can serve as documentation for analysts and developers.
Data Schema: A data-exchange-technology-specific representation of the application profile, including the syntax, structure, data types, and constraints for the data exchange.
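To make the layers concrete, the sketch below expresses one hypothetical concept, "Order", at the vocabulary layer (as a SKOS concept) and at the ontology layer (as an OWL class with a relationship). All URIs are illustrative placeholders, not terms from an actual standard:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Vocabulary layer: a term with a definition only
<https://vocabulary.example.org/terms/Order>
    a skos:Concept ;
    skos:prefLabel "Order"@en ;
    skos:definition "A request from a buyer to a supplier to deliver goods."@en .

# Ontology layer: the same notion modelled as a class with a relationship
<https://vocabulary.example.org/ontology/Order>
    a owl:Class ;
    rdfs:label "Order"@en .

<https://vocabulary.example.org/ontology/placedBy>
    a owl:ObjectProperty ;
    rdfs:domain <https://vocabulary.example.org/ontology/Order> ;
    rdfs:label "placed by"@en .
```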
The above concepts and relationships come together in the following conceptual model and its corresponding terms and definitions, see Figure 1. In general, the data models depicted have different purposes, but they should be aligned with each other so that each representation of the same data model remains coherent.
In addition to the various abstraction levels of data models, it is important to note that each of these data models can also refer to specific datasets, such as reference datasets or code lists. These datasets help establish standards for describing data to be used in a specific field, such as metadata. Examples of reference datasets include code lists and authority tables, such as country codes, which provide a consistent way of describing data about countries.
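Reference datasets are typically published in the same machine-readable form as the data models that refer to them. Below is a minimal sketch of a country code list modelled as a SKOS concept scheme; the URIs are simplified placeholders, not the European Commission's actual country authority table:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://vocabulary.example.org/codelists/country>
    a skos:ConceptScheme ;
    skos:prefLabel "Country code list"@en .

<https://vocabulary.example.org/codelists/country/NLD>
    a skos:Concept ;
    skos:inScheme <https://vocabulary.example.org/codelists/country> ;
    skos:notation "NLD" ;
    skos:prefLabel "Netherlands"@en .
```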
3.2 Linking the data model to technical exchange
Data schema incorporated in data exchange protocol: In a data exchange protocol (e.g., a REST API specification), a data schema defines the structure and format of the data exchanged between clients and servers. This schema outlines the properties, types, and constraints of data fields. Incorporating a schema ensures data consistency, facilitates validation, enhances documentation, and ensures effective exchange between clients and servers. Since each abstraction layer should be available through a vocabulary service, this data schema can be retrieved from the vocabulary service. This marks the transition from semantic interoperability to technical interoperability for the actual data exchange, compliant with the specified data models.
For example, the ‘Order message’ data model, defined within the Smart Connected Supplier Network (SCSN) to exchange order data in a standardised manner, is incorporated into the data exchange protocol implemented by users to exchange data.
In this case, the data exchange protocol within SCSN is a REST API specification that provides information on how an order (compliant with the Order message data model) can be exchanged.
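The role of the data schema in such an exchange can be sketched in a few lines: before sending or accepting an order, both parties validate the payload against the schema retrieved from the vocabulary service. The field names below are hypothetical simplifications, not the actual SCSN Order message, and a real implementation would use a JSON Schema validator rather than this hand-rolled check:

```python
# Minimal sketch of schema-driven validation (hypothetical schema, not SCSN).
ORDER_SCHEMA = {  # simplified stand-in for a schema served by a vocabulary service
    "orderId": str,
    "buyer": str,
    "quantity": int,
}

def validate(payload: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the payload conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

order = {"orderId": "PO-123", "buyer": "ACME", "quantity": 5}
assert validate(order, ORDER_SCHEMA) == []
```

Because provider and consumer validate against the same schema from the vocabulary service, a payload rejected on one side would also be rejected on the other, which is exactly the consistency the data schema is meant to guarantee.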
3.3 Governing data models
Data spaces should make use of existing data models and the corresponding management process defined by the data model provider. Agreements on the use of these models in the data space must be documented in the governance framework. Only if reusing existing models is not viable should a data space create a new model, reusing concepts from existing standards where possible. Only then does a data space need to set up a data model management process itself. This involves setting guidelines for creating and maintaining data models, as well as establishing processes for resolving conflicts or inconsistencies. Two perspectives, supported by a vocabulary service, should then be taken into account:
Development perspective: Focuses on the design of data models as artifacts, as described in the conceptual model of this building block.
Governance perspective: Focuses on the application and lifecycle management (Figure 2) of the data models.
Data models are living documents that evolve over time, passing through various lifecycle phases from development to termination. Governance of semantic standards is crucial to ensure they are technically sound, reliable, and adaptable to the changing needs of users. This includes decisions on initiating development work, approving changes in new versions, and managing a suitable release scheme. This responsibility lies with the data model provider, which in some cases is the data space itself, depending on whether data models are based on existing standards and their governance framework.
4. Interlinkages
Data Exchange: The technology and implementation specific abstraction layer of a data model, i.e. the data schemas, specify how the data is structured, including the structure, data types, and constraints of the exchanged data. These data schemas are used in the data exchange protocols during the data transfer.
Data Exchange & Data Sovereignty and Trust Pillar: The data transfer occurring within the data plane of a connector utilises a data exchange protocol for retrieving or transmitting data. Further details regarding this will be provided in the Data Exchange Building Block.
Data, Services and Offerings Description: Data, services and offerings in a data space are semantically expressed in a data model. In practice, data services and offerings are described using the DCAT standard, which allows a reference to a data model in the vocabulary service via the property dcterms:conformsTo.
Below, the conceptual model of the Data Models building block and its relation to data exchange is displayed. Figure 3 illustrates the links with the other building blocks, where the numbers in the picture correspond to the enumeration above.
Figure 3. Relationships between data models building block and other various building blocks or pillars.
5. Co-creation questions
The co-creation method is meant to guide data spaces and data space initiatives through setting up, developing and operating a data space. To effectively implement this building block, co-creation method questions have been identified to express your data product semantically and enhance semantic interoperability between participants within a data space (Figure 4).
How will the data space manage semantics for the defined data products?
Evaluate the scope of your data products (1): Based on your data product, decide which data elements and concepts need to be semantically expressed. Clearly identifying the required data elements and concepts helps evaluate the suitability of existing models.
What kind of data models are needed and can these be reused or should they be developed?
Determine whether data models can be reused or should be developed (2a): First, evaluate the suitability of existing models. If existing models do not suffice, explore the possibility of extending an existing model. Only if this is not viable, create a new model, reusing concepts from existing standards where possible. The reuse of a data model depends on several factors:
The standardization organizations behind the data models (for example W3C).
Their level of maturity and adoption (the W3C Community is very large).
The integration of the data models with others (some data models already use concepts used in many other data models).
The constraints that a data model imposes (for example specific cardinality constraints).
What kind of meta-standard will be used to express the data models in the data space?
Decide upon the meta-standard for your data model (2b): Data models are expressed in one or more meta-standards. As a best practice, let the data models adhere to open metamodel standards, depending on the domain and application requirements. Well-established metamodels to set up and/or annotate a domain-specific data model include:
RDF Schema and SHACL for RDF (Resource Description Framework).
OWL (Web Ontology Language) also for RDF.
SKOS (Simple Knowledge Organization System).
XML Schema (XSD), Schematron for XML-oriented data models.
CSVW for CSV-oriented tabular data.
XSLT, R2RML, RML, and CSVW for data transformation specification.
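For instance, an application-profile restriction such as a cardinality constraint (see Section 3.1) can be expressed in SHACL. The shape below is a hypothetical sketch with placeholder URIs, not an actual data space profile:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <https://vocabulary.example.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Application-profile rule: every Order must have exactly one orderId, a string
ex:OrderShape
    a sh:NodeShape ;
    sh:targetClass ex:Order ;
    sh:property [
        sh:path ex:orderId ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```

A SHACL validator can then check instance data against this shape before exchange, turning the application profile from documentation into an enforceable rule.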
Decide upon the reference datasets for your data model (2a): Data models might refer to one or more reference datasets. Reference data means data that are used to characterise or relate to other data, such as a code list of country codes. Try to follow commonly used reference datasets as defined by the European Commission: EU controlled vocabularies.
How do we ensure that a data model is used for the actual data exchange?
Decide upon the governance of the data model (3): A data model management process is required for maintaining the data models. This includes engaging key stakeholders who must collaborate throughout this process (e.g., standards development organisations). For existing standards, the management process already set up by the data model provider can be used. This role is often fulfilled collectively by business communities and delegated to a Standards Development Organisation (SDO), depending on whether data models are based on existing standards and their governance framework.
Select and acquire tooling support (4): To support the adoption of a data model and the management process, a data space requires a tool for publishing, editing, browsing and maintaining vocabularies and related documentation. A vocabulary service, as described in this building block and the DSSC Toolbox, is an example of such tooling.
The answers to each of the co-creation questions should land in the rulebook of the data space.
6. Implementation
This building block relates to the following services as described in the services for implementing technical building blocks:
Vocabulary service: The storage and publication of all data models are centralised within a vocabulary service. Each vocabulary service should provide an API through which data models can be retrieved. It should allow access to the different abstraction levels of a data model and enable the exchange of metadata for these models using the DCAT standard. The latter enables the exchange of data models between data spaces.
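Retrieving a data model from such an API could, for example, rely on HTTP content negotiation to select a serialisation. The endpoint, model identifier, and media type below are assumptions for illustration, since this building block does not prescribe a concrete API:

```python
from urllib.request import Request

# Hypothetical vocabulary service endpoint; a real data space publishes its own.
VOCABULARY_SERVICE = "https://vocabulary.example.org/models"

def build_model_request(model_id: str, media_type: str = "text/turtle") -> Request:
    """Prepare a request for a data model, asking for a specific serialisation
    via the HTTP Accept header (content negotiation)."""
    return Request(
        f"{VOCABULARY_SERVICE}/{model_id}",
        headers={"Accept": media_type},
    )

req = build_model_request("order-message/v1")
# The request targets the model URI and asks for a Turtle representation;
# urllib.request.urlopen(req) would perform the actual retrieval.
assert req.full_url == "https://vocabulary.example.org/models/order-message/v1"
assert req.get_header("Accept") == "text/turtle"
```

Offering the same model under several media types (e.g., Turtle for the ontology, JSON Schema for the data schema) is one way a vocabulary service could expose the different abstraction levels through a single API.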
Participant Agent - Data Plane: The data plane implements the actual Data Exchange, with APIs and data models specific to a particular domain. This relates both to the Data Exchange building block and the Data Models building block, where the data schemas are integrated into the data exchange protocol within the data plane.
Catalogue service: This service provides an overview of registered data offerings in the data space. As mentioned in this building block, data products are semantically described by a data model. The catalogue service should, for each data product where applicable, refer to the data model of the data sets and services being offered. This is done via the DCAT specification.
Other Components: Data exchange takes place among various components of the data space, where a clearly defined data model is necessary. For instance, data models are not only used for the exchange of data between participants but also for other elements in a data space, such as expressing the contents of usage policies, describing data offerings, or interpreting provenance and traceability data.
7. Future topics
One of the future topics is semantic interoperability across data spaces, as different data spaces develop their own unique semantics and structures. The challenge lies in the need for a standard that establishes mutual understanding between different data spaces. As data models are stored in vocabulary services within a data space, multiple data spaces imply multiple vocabulary services and a multitude of data models. The existence of different vocabulary service implementations results in data models not being accessible across different data spaces, and thus not being reused in other data spaces. This version specifies that by expressing data models in DCAT, they can be made discoverable across data spaces. This is because data models can also be seen as data products, which allows them to be cataloged and exchanged using the Data Space Protocol, making them findable and accessible across data spaces. In future versions, this building block will emphasise the governance of data models across data spaces: while they are currently findable and accessible, the focus will shift to their usage.
So far, this building block has mainly explained the usual approach to semantic interoperability: the top-down approach, which starts with a standard data model that is somehow imposed on the data provider for adoption. However, this might be challenging because it requires the target user (the data provider in this case) to become familiar with a new standard and possibly new technology, such as semantic technology. In the next version, this building block will explore what happens if data providers are new to semantic technologies and linked data principles, and how the vocabulary service can help improve the semantic annotation of the data one step at a time.
8. Further reading
Centre of Excellence of Data Sharing and Cloud: Paper on Establishing Semantic Interoperability across Data Space. This paper describes how data models can be findable, accessible and usable across data spaces.
IDS RAM-4: Explaining the meaning for data models and their governance process.
Reference datasets: EU reference datasets containing a consistent way to describe data. They are standardized and organized arrangements of words and phrases presented as alphabetical lists of terms or as thesauri and taxonomies with a hierarchical structure of broader and narrower terms.
The ENERSHARE D3.3: European Common Energy Data Space Framework Enabling Data Sharing - Driven Across – and Beyond – Energy Services
9. Glossary
The terms being used in this page and conceptual model are specified below:
| Term | Definition |
|---|---|
| Data Model | A structured representation of data elements and relationships used to facilitate semantic interoperability within and across domains, encompassing vocabularies, ontologies, application profiles and schema specifications for annotating and describing data sets and services. These abstraction levels may not need to be hierarchical; they can exist independently. |
| Data model provider | An entity responsible for creating, publishing, and maintaining data models within data spaces. This entity facilitates the management process of vocabulary creation, management, and updates. |
| Vocabulary service | A technical component providing facilities for publishing, editing, browsing and maintaining vocabularies and related documentation. |
Meta-standard | A standard designed to define or annotate data models within a particular domain or across multiple domains. These meta-standards provide a framework or guidelines for creating and annotating other standards (data models), ensuring consistency, interoperability, and compatibility. |
Vocabulary | A data model that contains basic concepts and relationships expressed as terms and definitions within a domain or across domains, typically described in a meta-standard like SKOS. |
Ontology | A data model that defines knowledge within and across domains by modelling information objects and their relationships, often expressed in open metamodel standards like OWL, RDF, or UML. |
Application Profile | A data model that specifies the usage of information in a particular application or domain, often customised from existing data models (e.g., ontologies) to address specific application needs and domain requirements. |
| Reference datasets | Reference data, such as code lists and authority tables, means data that are used to characterise or relate to other data. Such reference data defines the permissible values to be used in a specific field, for example as metadata. Reference data vocabularies are fundamental building blocks of most information systems. Using common interoperable reference data is essential for achieving interoperability. |
| Data Schema | A data model that defines the structure, data types, and constraints. Such a schema includes the technical details of the data structure for the data exchange, usually expressed in metamodel standards like JSON Schema or XML Schema. |