Design: Data Brokering

Jonathan Yu edited this page Feb 28, 2017 · 2 revisions

The purpose of data brokering is to facilitate access to data via the broker rather than accessing individual sources themselves. The idea of a broker is that client applications and users can ask questions or submit queries to it, and it will return answers based on information it has aggregated from multiple sources.

In the context of this work, the aim was to facilitate cross-domain and inter-organisation access to data. Initially, this was investigated for the eReefs project. The work continues under the OzNome initiative, and in particular the OzNome for Land and Water project, where data brokering is being implemented to facilitate access to a range of earth and environmental science data. The general approach, however, is broadly applicable to other use cases.

The DPN ontology supports a lightweight approach to mediation.

Data providers register the existence of their 'node' and related services and datasets. These are described in a minimal fashion as RDF instance documents of the DPN Ontology.
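As a rough illustration, such a registration might look like the following Turtle snippet. The namespace URI and the class and property names (`dpn:Node`, `dpn:hasService`, `dpn:hasDataset`) are placeholders chosen for this sketch, not the actual DPN vocabulary terms:

```turtle
@prefix dpn: <http://example.org/dpn#> .   # placeholder namespace, not the real DPN URI
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/provider#> .

# Hypothetical node registration: one provider node, one service, one dataset.
ex:node-1 a dpn:Node ;
    dct:title "Example Environmental Data Node" ;
    dpn:hasService ex:wfs-1 ;
    dpn:hasDataset ex:ds-1 .

ex:wfs-1 a dpn:Service ;
    dct:title "Water observations WFS" .

ex:ds-1 a dpn:Dataset ;
    dct:title "Stream gauge observations" .
```

The key point is that descriptions stay minimal: enough for a broker to discover the node, its services, and its datasets, without requiring the provider to publish a full metadata record up front.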

Data consumers can then query these via APIs set up by Data Broker implementations, which aggregate DPN descriptions and provide query mechanisms over their content. In some cases, Data Broker implementations may perform value-added harvesting to aggregate additional information from data sources and web services.
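For example, a broker that aggregates DPN descriptions into a triple store might expose a SPARQL endpoint. A query along these lines (again using illustrative placeholder terms rather than the actual DPN vocabulary) would list the datasets the broker knows about together with the nodes that provide them:

```sparql
PREFIX dpn: <http://example.org/dpn#>    # placeholder namespace
PREFIX dct: <http://purl.org/dc/terms/>

# List every dataset the broker has aggregated, with its providing node.
SELECT ?node ?dataset ?title
WHERE {
  ?node a dpn:Node ;
        dpn:hasDataset ?dataset .
  OPTIONAL { ?dataset dct:title ?title }
}
ORDER BY ?node
```

A consumer thus asks one endpoint rather than interrogating each provider individually; the broker answers from its aggregated DPN descriptions.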

Other initiatives that use the brokering approach:

  • GEOSS
  • BCube

Of course, the extent of what is aggregated by the data broker will also affect its ability to offer query services, as well as the maintenance burden and cost the data broker bears on behalf of the data community.

This work contrasts with the warehouse and federated approaches.

Warehouse approach: Everything (including the data) is aggregated into a data warehouse, which then provides query and access. The cost of updating the data and keeping it current is high in this case, and the owner of the warehouse bears it on behalf of the data providers and data consumers.

Federated approach: Each node in the system conforms to a well-described, detailed and common set of standards at every layer - from schema to semantics to business-rules content. The federated approach does not need mediation or data brokering, as each node is well-described and easily queryable provided the location of the service or data source is known (catalogues would solve that). The cost is placed on the data providers themselves to agree on the set of standards and on their query APIs. A universal convention could be achieved, but establishing one has its challenges.