Data Exchange
- 1. Scope and objectives
- 2. Capabilities
- 3. Specifications
- 3.1 Data Exchange and the Data Spaces Protocol
- 3.2 Choosing a transmission protocol
- 3.3 Functional Specifications
- 3.4 Technical Specifications
- 3.4.1 Efficient transmission of data (mandatory)
- 3.4.2 Complete list of capabilities to operate with the data asset in the data space (mandatory)
- 3.4.3 Allow querying for the different data with complex needs and different data structures
- 3.4.4 Different kinds of data exchange protocols, including streaming
- 3.4.5 Mechanisms for triggered exchange, such as enabling alerts for updates or modifications
- 3.4.6 Information retrieval in federation scenarios, particularly across different data spaces
- 4. Interlinkages
- 5. Co-creation questions
- 6. Implementation
- 7. Future topics
- 8. Further reading
- 9. Glossary
1. Scope and objectives
The scope of this building block is the actual transmission of the data between participants of a data space:
It relates to the other two building blocks in this pillar (Data Models, Provenance & Traceability).
It builds on the capabilities described in the Data Sovereignty & Trust pillar for identification, authentication and authorisation.
Technically, it is about implementing the Data Plane of a Participant agent.
The scope of this building block is to establish agreed-upon mechanisms between participants for the actual transmission of data. To achieve this, data spaces must make a strategic choice of the transmission protocols they support.
Note that ‘transmission’ can encompass many different types of data exchange (data sharing, messaging, streaming, algorithm-to-data, etc.).
The data exchange process involves a Transfer Process (TP), which progresses through a series of states. At a minimum, these are the basic states REQUESTED, STARTED, COMPLETED, SUSPENDED and TERMINATED, as defined by the participants, but their final number and complexity may vary depending on the implementation. The process has to ensure that data exchange is managed systematically, with clear transitions between states based on messages exchanged between the provider and the consumer.
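The Transfer Process described above can be sketched as a small state machine. The following is a minimal, illustrative sketch assuming only the five minimum states named here; the class name, transition table and method names are not part of any specification, and real implementations may define more states and stricter rules.

```python
# Minimal sketch of a Transfer Process (TP) state machine using only the five
# minimum states named in this section. The allowed transitions are an
# illustrative assumption, not a normative definition.
class TransferProcess:
    # Current state -> states reachable from it.
    TRANSITIONS = {
        "REQUESTED": {"STARTED", "TERMINATED"},
        "STARTED": {"SUSPENDED", "COMPLETED", "TERMINATED"},
        "SUSPENDED": {"STARTED", "TERMINATED"},
        "COMPLETED": set(),   # terminal state
        "TERMINATED": set(),  # terminal state
    }

    def __init__(self):
        self.state = "REQUESTED"

    def transition(self, new_state: str) -> None:
        # Reject any transition not allowed by the state machine, so the
        # provider and consumer always agree on the state of the exchange.
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


tp = TransferProcess()
tp.transition("STARTED")
tp.transition("SUSPENDED")
tp.transition("STARTED")
tp.transition("COMPLETED")
```

Each transition would be driven by a corresponding control-plane message; the sketch only captures the state bookkeeping, not the messaging.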
2. Capabilities
Clear guidelines are essential for data exchange protocols to ensure accurate communication and to overcome technical interoperability barriers. Initially this exchange has to be granted inside the data space, but some principles should also be stated for operations between data spaces.
‘Application Programming Interfaces’ (APIs) or other connection mechanisms need to be defined and published in the catalog of data assets according to the description in the Publication and Discovery building block.
The catalog should reference the semantic meaning of the exchanged data so that participants can fully understand it; some of this information can be retrieved via Publication and Discovery, while other parts can be transmitted with the data as described in the Data Models building block.
Data exchange can occur through push or pull transfers. In a push transfer, the provider initiates the data transmission to the consumer, while in a pull transfer, the consumer retrieves data from the provider. Additionally, data can be classified as finite (e.g., a fixed dataset) or non-finite (e.g., continuous streams), which influences the duration and management of the transfer process.
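The two dimensions described above (who initiates the transfer, and whether the data is finite) can be modelled explicitly; the following is a small illustrative sketch, with invented type names, of how an implementation might record them:

```python
# Illustrative model of the two transfer dimensions from this section:
# direction (push vs pull) and finiteness (fixed dataset vs continuous stream).
# The names TransferType and Direction are invented for this sketch.
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    PUSH = "push"  # provider initiates the transmission to the consumer
    PULL = "pull"  # consumer retrieves the data from the provider


@dataclass
class TransferType:
    direction: Direction
    finite: bool  # True for a fixed dataset, False for a continuous stream

    @property
    def open_ended(self) -> bool:
        # Non-finite transfers have no natural end; they stay open until
        # explicitly suspended or terminated.
        return not self.finite


stream = TransferType(Direction.PUSH, finite=False)
```

Whether a transfer is open-ended influences how long the Transfer Process remains in the STARTED state and how it is eventually closed.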
A choice for the associated transmission method needs to be made (e.g. SOAP, event streams (like MQTT), Apache Avro, Thrift, Protocol Buffers, etc.). These transmission methods are generic and usually based on industry standards. Depending on the nature of the transmission, a suitable method can be selected; general-purpose solutions, especially when federating data spaces, include general-purpose APIs like NGSI-LD for finite sources and LDES for non-finite sources.
A preferred group of transmission methods should be identified to enable a quick connection between adhering data spaces. To store that information, it is suggested to extend the DCAT standard for datasets with an accessMethods attribute describing the mechanisms, APIs, etc. used to access the data.
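The suggested extension could look like the following JSON-LD fragment. Note that accessMethods is the proposed attribute from this section, not part of standard DCAT, and all property names and values inside it are purely illustrative:

```python
# Illustrative JSON-LD for a DCAT dataset carrying the suggested (non-standard)
# "accessMethods" extension. The dataset title and the structure of the
# accessMethods entries are invented examples.
import json

dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Air quality observations",
    "accessMethods": [
        {"protocol": "NGSI-LD", "endpointType": "REST"},
        {"protocol": "LDES", "endpointType": "event stream"},
    ],
}

print(json.dumps(dataset, indent=2))
```

Publishing this alongside the catalog entry would let a consumer discover up front which mechanisms are available before negotiating a transfer.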
3. Specifications
3.1 Data Exchange and the Data Spaces Protocol
The implementation of Data Exchange relates to the Data Plane and is linked to the Control Plane. In this section we provide a detailed description of these concepts.
Unlike other elements of the Dataspace Protocol (Control Plane, e.g. Catalog Protocol, Contract Negotiation Protocol), which are very generic by nature, the Data Plane can be domain-specific, with its own data exchange protocol (e.g. described in a particular OpenAPI specification, or using a general-purpose data exchange protocol such as NGSI-LD). In DCAT 3.0 this is expressed by the accessService attribute, which points to a DataService object.
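In DCAT terms, a dataset's distribution can point at the service that exposes the data. The following sketch shows that linkage; the endpoint URL, titles and conformance reference are invented for illustration:

```python
# Sketch of a DCAT dataset whose distribution points, via accessService, to a
# DataService exposing the domain-specific data exchange protocol. All URLs
# and titles are invented examples.
data_service = {
    "@type": "dcat:DataService",
    "dct:title": "NGSI-LD context broker",
    "dcat:endpointURL": "https://broker.example.org/ngsi-ld/v1",
}
distribution = {
    "@type": "dcat:Distribution",
    "dcat:accessService": data_service,  # link from distribution to service
}
dataset = {
    "@type": "dcat:Dataset",
    "dcat:distribution": [distribution],
}
```

A consumer resolving the catalog entry would follow this chain to learn both where the data lives and which exchange protocol the Data Plane speaks.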
A data space should consider which kind of Data Exchange protocol is needed on the Data Plane:
The protocol should suit the purpose of data sharing or the purpose of allowing data access.
It should be linked to the relevant data models.
It should be linked to the control plane (for establishing identification, access policies, etc.).
The data space has to be capable of maintaining a precise inventory of the technical specifications for the different transfer/access protocols used and their different versions. This inventory has to be maintained in the Publication and Discovery building block.
3.2 Choosing a transmission protocol
Guidance on choosing the transmission protocol can be found in the transmission protocol section of the Dataspace Protocol; examples of transmission protocols are listed in the further reading section. The transmission protocol is part of the data exchange protocol (e.g. HTTPS REST is used in an API like NGSI-LD).
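The layering can be made concrete: the transmission protocol (here HTTPS) carries the data exchange protocol (here an NGSI-LD-style REST path). In this sketch the host name and entity type are invented, and the request is only constructed, not sent:

```python
# The transmission protocol (HTTPS) carries the data exchange protocol
# (NGSI-LD-style REST path). The broker host and entity type are invented;
# the request object is built but never sent over the network.
from urllib.request import Request

req = Request(
    "https://broker.example.org/ngsi-ld/v1/entities?type=AirQualityObserved",
    headers={"Accept": "application/ld+json"},
    method="GET",
)

print(req.full_url)
```

Swapping the transmission protocol (e.g. to MQTT for streaming) would leave the data exchange semantics to a different, stream-oriented protocol layer.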
Additional considerations apply when the connection is between different data spaces, either by direct connection or by means of an intermediary entity. To allow the transmission of data between data spaces, a list of available data exchange protocols should be shared across the federation of data spaces.
3.3 Functional Specifications
This building block provides methods that allow data space participant A to exchange data with data space participant B in the same data space or in federated data spaces. Such exchanges may occur in any of the following scenarios:
A gets access to data owned by B.
B gets access to data owned by A.
Both participants get mutual access to their data.
To exchange data, the data space participants must use a common protocol, which includes the syntax and the sequence of the interaction. This protocol is used to define an API. While APIs may vary across domains, standardisation is needed to enable interoperability. Beyond the typical capabilities of these protocols, advanced features such as querying, creation, updating and deletion of data, as well as specialized functions like geoquerying and historical data retrieval, may also be necessary within the scope of that protocol.
A consensus protocol, i.e. a protocol accepted by most participants, proposing a data exchange protocol that is agnostic to the data structure, is crucial. Depending on the scope of the respective data space and the coverage of concrete use cases, the data exchange protocol must support the following functionalities:
Efficient transmission of data
A complete list of capabilities to operate with the data asset, available to granted users at the value creation services or publicly available
The two capabilities above are mandatory, while the following ones are recommended. The list is not exhaustive but a general approach, and specific needs may arise in some domains.
Querying capabilities for diverse data with complex needs and varying data structures (e.g. geoquerying).
Data streaming endpoints to receive real-time continuous streams of data.
Data retrieval endpoints to request large datasets, such as historical data stored in a database.
Mechanisms for triggered exchange, such as enabling alerts for updates or modifications of the data sources.
Information retrieval in federation scenarios, particularly across different data spaces.
3.4 Technical Specifications
3.4.1 Efficient transmission of data (mandatory)
Data exchange starts after an event has taken place or upon a user’s request. The corresponding messages for setting up the exchange have to be sent in the control plane, and the exchange has to start once identification and authorisation (control plane) have enabled the transmission. Furthermore, a consistent quality of service needs to be maintained, e.g. when a connection is lost.
3.4.2 Complete list of capabilities to operate with the data asset in the data space (mandatory)
The governance of the data space will approve the accepted data exchange protocols. Once approved, the protocol description for the data exchange has to be published in the value creation services and made available to the participants in the data space and to other federated data spaces. The versions of the data exchange protocols have to be controlled, and an inventory of such versions has to be available to allow data exchange according to old versions of a protocol if a data asset requires it.
3.4.3 Allow querying for the different data with complex needs and different data structures
To retrieve the data, a query language or equivalent resource is recommended (e.g. NGSI-LD querying). The querying has to be capable of building complex requests independently of the data structure. It can be necessary to integrate the querying API or equivalent mechanism with the control plane, e.g. when different results should be provided depending on the user or policy. Additional capabilities like geoquerying and querying for time periods are also recommended.
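As an example, an NGSI-LD-style query can combine an attribute filter with a geoquery. The sketch below only composes the query string; the entity type, attribute and coordinates are invented, while the parameter names (`type`, `q`, `georel`, `geometry`, `coordinates`) follow the NGSI-LD API:

```python
# Compose an NGSI-LD-style query: filter entities by an attribute value and
# restrict results to those within 2 km of a point. Entity type, attribute
# and coordinates are illustrative; parameter names follow the NGSI-LD API.
from urllib.parse import urlencode

params = {
    "type": "AirQualityObserved",
    "q": "pm25>50",                       # attribute filter
    "georel": "near;maxDistance==2000",   # geoquery: within 2000 m
    "geometry": "Point",
    "coordinates": "[13.40,52.52]",
}
query = "/ngsi-ld/v1/entities?" + urlencode(params)

print(query)
```

Temporal retrieval would use the same pattern against the temporal API, with additional time-range parameters; the point is that complex requests are expressed uniformly, independently of the underlying data structure.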
3.4.4 Different kinds of data exchange protocols, including streaming
The selection of data exchange protocols should be based on the specific usage scenario. For instance, streaming data requires different protocols compared to querying structured records in a database or retrieving large datasets.
For scenarios involving the retrieval of extensive datasets, such as historical data, dedicated data retrieval endpoints should be available. The transmission of large volumes of data may necessitate specialized mechanisms for data integrity verification, error handling, and efficient transport to ensure reliability and consistency.
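One common integrity-verification mechanism for such large transfers is a checksum computed incrementally over the received chunks and compared against one advertised by the provider. A minimal sketch, with simulated chunks:

```python
# Incremental integrity verification for a large, chunked transfer: the whole
# dataset never needs to fit in memory. The chunks below simulate a download.
import hashlib


def sha256_of_chunks(chunks) -> str:
    # Feed each chunk into the hash as it arrives.
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


received = [b"chunk-1", b"chunk-2", b"chunk-3"]
digest = sha256_of_chunks(received)
# The consumer compares `digest` with the checksum published by the provider;
# a mismatch signals a transmission error and triggers error handling
# (e.g. re-requesting the affected chunks).
```

Chunk boundaries do not affect the result, so provider and consumer need only agree on the hash algorithm, not on the transport's framing.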
3.4.5 Mechanisms for triggered exchange, such as enabling alerts for updates or modifications
The data exchange protocol needs mechanisms for setting alerts whenever the data sources are modified, so that participants can use the right data from the sources, and so that contracts granting access for a period can be fulfilled by pushing data without the need for a user’s request.
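In NGSI-LD, for instance, such triggered exchange is realised with a subscription: the provider pushes a notification to the consumer's endpoint whenever a watched attribute changes. The payload below follows the shape of an NGSI-LD subscription, but the entity type, attribute and notification URL are invented:

```python
# Illustrative NGSI-LD subscription payload: the broker notifies the
# consumer's endpoint whenever the watched attribute changes. Entity type,
# attribute and endpoint URL are invented examples.
subscription = {
    "type": "Subscription",
    "entities": [{"type": "AirQualityObserved"}],
    "watchedAttributes": ["pm25"],
    "notification": {
        "endpoint": {
            "uri": "https://consumer.example.org/notify",
            "accept": "application/json",
        }
    },
}
```

The control plane still governs whether the subscription may be created at all; the subscription itself only automates the data-plane push once access has been granted.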
3.4.6 Information retrieval in federation scenarios, particularly across different data spaces
In federation scenarios requiring connections between data spaces, the data spaces’ specific protocols for data exchange need to be available. The governance of the data spaces will define which protocols are accepted for data exchange between the federated data spaces and how to make them available to the different participants. A list of accepted data exchange protocols and their versions should be available at the start of the technical federation of data spaces, independently of whether it is created by a direct connection or by means of an intermediary entity. Eventually, a specific data model for the data exchange protocols could be created to speed up the negotiation of the data transmission between data spaces.
4. Interlinkages
The data exchange process involves both the control plane and the data plane. The control plane manages the coordination and state transitions of the transfer process, while the data plane handles the actual transmission of data. This separation ensures that data exchange is both secure and efficient, with clear roles for each plane.
Data Models: This building block is strongly related to the Data Models building block, which provides the foundation for the technical formats of the exchanged data, such as the Application Programming Interface (API) payload, for example in JavaScript Object Notation (JSON).
Access & usage policies and enforcement: The data exchange building block has a strong relationship with the Access & usage policies and enforcement building block. Every transaction in the data exchange building block should be enabled and accounted for by the Access & usage policies and enforcement building block.
Trust Framework: The data exchange building block is closely intertwined with the Trust Framework building block. The secure trust protocols within the trust framework building block should underpin every transaction in the data exchange building block.
Data, Services, and Offerings Descriptions. This building block relates to the data, services and offerings descriptions because it has to retrieve from its catalog the right version of the data asset to be transferred and the base location for the data asset.
Value creation services: The specification of the data exchange protocol can be located in this building block, together with all the versions used across the data assets.
5. Co-creation questions
For Data Exchange the following co-creation question applies:
Which APIs and protocols are used in the data space?
The Data Space Governance Authority should identify which generic protocols and which domain-specific APIs apply for participants of the data space. Sometimes multiple options can be provided, but in many cases a limited set of protocols/APIs is selected - depending on the use cases of the respective data space.
To what extent do the transfer protocols ensure the complete transmission of data?
The data exchange protocols have to ensure the complete and accurate transmission of data because, among many other elements, there could be financial implications. This has to be aligned with the Provenance & Traceability building block.
6. Implementation
There are several elements to be considered in this building block:
Control Plane: It gathers all permissions from other building blocks before granting the access/transfer, including verifying that the users are trusted and that the contracting arrangements are completed.
Data exchange component: Hides the complexities of the exchange protocol from the rest of the components and provides (as a minimum) the defined messages (Transfer Request Message, Transfer Start Message, Transfer Suspension Message, Transfer Completion Message, Transfer Termination Message). These messages change the status of the transmission as explained in section 1.
Data Plane: The exchange of data in the data plane is implemented by the data exchange component.
The implementation of data exchange involves a state machine that manages the transitions between different states of the transfer process (REQUESTED, STARTED, COMPLETED, SUSPENDED, and TERMINATED as a minimum, depending on the data exchange protocol). This state machine ensures that the transfer process is controlled and that all participants are aware of the current state of the data exchange.
In this building block, once the validations of the control plane have been met, an agreement on the transfer/access mechanism should be reached. It is up to the data governance of the data space to decide whether there is a list of accepted mechanisms that applies globally to all assets in the data space or, on the contrary, whether every asset is offered with its own transfer/access mechanism.
7. Future topics
Experts in the DSSC have identified the following topics for future versions of the blueprint:
Address data exchange between data spaces specifically: how to manage the exchange when a participant belongs to an external data space, how the enforced permissions are transmitted to the data exchange components, how the identity of participants in the external data space is trusted, and how the economic transactions between data spaces are guaranteed to make the data exchange possible.
The user experience of data exchange when it involves human intervention.
In the future this building block has to address the data exchange between data spaces, and the principles of trust described here will have to apply to this configuration as well. For example, but not exhaustively, this will require defining how to trust identities coming from a different data space, how to agree on a common data exchange protocol, and how to apply access permissions defined in a different data space (roles, actions, etc.).
8. Further reading
Open API Specifications (OAS) for Representational State Transfer (RESTful), Hypertext Transfer Protocol (HTTP) APIs.
The NGSI-LD API standard provides a simple yet powerful RESTful API for accessing context/digital twin data. NGSI-LD is an evolution of Next Generation Service Interfaces version 2 (NGSI-v2) that incorporates support for linked data and other powerful features. The latest specifications are published under the European Telecommunications Standards Institute Context Information Management Industry Specifications Group, ETSI CIM ISG.
Linked Data Event Streams (LDES) by Semantic Interoperability Community Europe, SEMIC.
A proposal for the data model for data interchange of assets in a federation of data spaces.
9. Glossary
| Term | Definition |
|---|---|
| Consensus Protocol | The data exchange protocol that is globally accepted in a domain. Explanatory text: in some domains the data exchange protocols are ‘de facto’ standards (e.g. NGSI for smart cities). |
| Federated data spaces | A data space that enables seamless data transactions between the participants of multiple data spaces based on agreed common rules, typically set in a governance framework. |
| Geoquerying | A query involving geographical boundaries. Explanatory text: data querying frequently needs to be restricted to a geographical area. |
| Transfer Process (TP) | A process that manages the lifecycle of data exchange between a provider and a consumer, involving, at a minimum, the states REQUESTED, STARTED, COMPLETED, SUSPENDED and TERMINATED. |
| Pull Transfer | A data transfer initiated by the consumer, where data is retrieved from the provider. |
| Push Transfer | A data transfer initiated by the provider, where data is sent to the consumer. |
| Non-Finite Data | Data that is defined by an infinite set or has no specified end, such as continuous streams. |
| Finite Data | Data that is defined by a finite set, such as a fixed dataset. |