
Querying openEHR data with path based queries

Pablo Pazos Gutiérrez edited this page Dec 3, 2018 · 1 revision

Introduction

For most of their life, the openEHR specifications did not include a specification for querying data. For some years the openEHR community only had a proposal paper for an EHR Query Language (EQL, 2007), focused mostly on the syntactic aspects of queries over openEHR data, which left the complexities of query evaluation, query execution and query result structure out of scope. EQL then became the Archetype Query Language (AQL), and the EQL paper migrated to an AQL wiki page, which helped to define more details about the syntax and to shape the current AQL specifications until 2016. Up to that point, most of the specification work was around the syntax, but it also helped to discuss implementation details involving interpretation and execution. In 2017 we had the first version of the QUERY specification, which includes the AQL specification, now formally part of the openEHR specifications. The latest working version of the AQL specifications can be found here.

This version describes in more detail how each element of the query syntax should be evaluated, and also provides some query execution cues; however, this part depends on the specific underlying database technology. Implementing query execution over a relational database is not the same as implementing it over a document or object database. Also, some operators and conditions in the AQL syntax are very difficult to implement, so most vendor implementations only support a subset of AQL. The query result structures are still missing from the spec, which makes vendor implementations slightly incompatible with each other, since data transformations are needed to correctly process the results of queries executed on different brands of openEHR repositories. Lastly, some operators and filters might be implemented slightly differently by different vendors since, as mentioned, they might depend on the underlying database technology.

A complete query specification

A full openEHR query specification should be composed of:

  1. query syntax (meaning + formal syntax specification, like a processable grammar)
  2. query evaluation algorithm (validation and parsing queries into an in-memory representation)
  3. query execution (mapping in-memory query representation into database transactions, providing hints on how to map to different database technologies including relational models, document models (JSON, XML), object models and graph models)
  4. query result specification (data structure model for query results and different representations of that structure like JSON, XML, CSV)
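
To illustrate the fourth item, the sketch below shows one result model rendered in two of the representations mentioned (JSON and CSV). The `ResultRow` class and its field names are hypothetical, chosen only for this example; they are not taken from any openEHR specification or from the EHRServer.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

# Hypothetical result row: one data value found at a path in one EHR.
@dataclass
class ResultRow:
    ehr_id: str
    path: str
    magnitude: float
    units: str

FIELDS = ["ehr_id", "path", "magnitude", "units"]

def to_json(rows):
    """Serialize the result rows as a JSON array of objects."""
    return json.dumps([asdict(r) for r in rows], indent=2)

def to_csv(rows):
    """Serialize the same rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))
    return buf.getvalue()

if __name__ == "__main__":
    rows = [ResultRow("ehr-1", "/data/events/value", 120.0, "mm[Hg]")]
    print(to_json(rows))
    print(to_csv(rows))
```

The point of the sketch is that the data structure model is defined once and each representation is derived from it, so clients that read JSON and clients that read CSV see the same fields.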

Alternative query methods

Even though AQL is the only querying mechanism that is part of the openEHR specs, it's not a required component. There are many openEHR implementations that don't support AQL. Some might support it in the future, but there are still too many degrees of freedom to make a meaningful implementation, and there is a high cost attached to implementing it. Implementing the syntax, evaluation, execution and result structures is one thing, but tools are also needed to actually create, validate, test, deploy and execute the queries. It would be great to have an open implementation of AQL, but this is not possible since the implementation will be tightly coupled with the repository solution. So if you want AQL, you need to choose a repository that supports it.

But how do other openEHR implementations manage queries?

Well, some implementations just have custom queries, written in the query language supported by the database technology (SQL, XQuery, SPARQL, ...), and map the database result set to an openEHR data model. In the EHRServer we chose to implement a query mechanism that is easy to use, has a relatively simple implementation, and returns a specific data structure as a result. In this paper we'll explain how the path-based queries work, their implementation details, the query result structures, and their limitations considering the underlying database technology.
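
As a rough illustration of the custom-query approach, the sketch below runs a plain SQL query and maps each row of the result set to a DV_QUANTITY-like structure. The `quantity` table, its columns, and the shape of the returned dictionaries (including the `_type` key) are assumptions made for this example only; they do not describe the schema of the EHRServer or of any other product.

```python
import sqlite3

def demo_connection():
    """Build an in-memory database with a hypothetical 'quantity' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quantity "
        "(ehr_id TEXT, magnitude REAL, units TEXT, recorded_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO quantity VALUES (?, ?, ?, ?)",
        [
            ("ehr-1", 5.1, "mmol/L", "2018-11-02T09:00:00"),
            ("ehr-1", 7.9, "mmol/L", "2018-10-01T09:00:00"),
            ("ehr-2", 4.4, "mmol/L", "2018-10-15T09:00:00"),
        ],
    )
    return conn

def fetch_quantities(conn, ehr_id):
    """Run a custom SQL query and map each row to a DV_QUANTITY-like dict."""
    rows = conn.execute(
        "SELECT magnitude, units, recorded_at FROM quantity "
        "WHERE ehr_id = ? ORDER BY recorded_at",
        (ehr_id,),
    ).fetchall()
    return [
        {
            "time": recorded_at,
            "value": {"_type": "DV_QUANTITY", "magnitude": magnitude, "units": units},
        }
        for (magnitude, units, recorded_at) in rows
    ]

if __name__ == "__main__":
    for item in fetch_quantities(demo_connection(), "ehr-1"):
        print(item)
```

Every query written this way needs its own mapping code, which is one reason an implementation may prefer a single query mechanism with a fixed result structure.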

The path-based openEHR query language

Scope

The path-based queries were not designed for general purpose queries; they were designed considering the most frequent queries for openEHR data, and considering Clinical Decision Support requirements.

We consider that general purpose data can be retrieved with path-based queries in combination with a little code to process, sort and further filter the results.

In this context, 'general purpose' means getting any data structure at any level of the openEHR information model, or any combination of data structures. By 'not designed for general purpose' we mean, for instance, that if you need to query for an OBSERVATION structure and want results of type OBSERVATION, with path-based queries you can get the complete COMPOSITION, using criteria over data contained in the OBSERVATION, and with a little code you can extract the OBSERVATIONs from the COMPOSITIONs in the query result. We consider that retrieving just part of a clinical document explicitly removes the context of that information, so we chose to return full documents. On the other hand, when time series are needed, path-based queries just return combinations of openEHR data types (DV_TEXT, DV_QUANTITY, DV_BOOLEAN, ...). This use case is designed to show data as charts or tables in user interfaces and reports. It can also be used for analysis, for instance to check historical lab test result values that are out of range.
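
The "little code" mentioned above could look like the following minimal sketch. It assumes that COMPOSITIONs arrive as nested dictionaries and lists in which each object carries its reference model type under a `_type` key, and that a time series is a list of dictionaries with a numeric `magnitude`. Both shapes, the function names and the sample values are assumptions for illustration, not the actual EHRServer result format.

```python
def find_nodes(node, rm_type):
    """Recursively yield every dict whose "_type" equals rm_type."""
    if isinstance(node, dict):
        if node.get("_type") == rm_type:
            yield node
        for child in node.values():
            yield from find_nodes(child, rm_type)
    elif isinstance(node, list):
        for item in node:
            yield from find_nodes(item, rm_type)

def out_of_range(series, low, high):
    """Return the points whose magnitude falls outside [low, high]."""
    return [p for p in series if not (low <= p["magnitude"] <= high)]

if __name__ == "__main__":
    # Simplified, hypothetical COMPOSITION as returned by a document query.
    composition = {
        "_type": "COMPOSITION",
        "content": [
            {
                "_type": "SECTION",
                "items": [{"_type": "OBSERVATION", "name": "blood pressure"}],
            },
            {"_type": "OBSERVATION", "name": "pulse"},
        ],
    }
    for obs in find_nodes(composition, "OBSERVATION"):
        print(obs["name"])

    # Hypothetical time series of lab results and an illustrative range.
    series = [
        {"time": "2018-09-01", "magnitude": 5.1, "units": "mmol/L"},
        {"time": "2018-10-01", "magnitude": 7.9, "units": "mmol/L"},
        {"time": "2018-11-01", "magnitude": 3.2, "units": "mmol/L"},
    ]
    print(out_of_range(series, low=3.5, high=5.5))
```

Because the query returns the full COMPOSITION, the extracted OBSERVATIONs can always be traced back to the document they came from, so the clinical context is not lost.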

TBD....