Grasp is bridge software that provides a GraphQL endpoint wrapping SPARQL endpoints.
Grasp works as follows:
- Grasp receives a GraphQL query from a client.
- Grasp translates it into a SPARQL query (or queries).
- Grasp sends the query to a SPARQL endpoint (or endpoints).
- The SPARQL endpoints return the results to Grasp.
- Grasp reforms the results to fit the given GraphQL query.
- Grasp sends the results back to the client.
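The flow above can be summarized as a small TypeScript sketch. Everything in it is hypothetical. The `Bridge` interface, its method names, and the in-memory stub standing in for a SPARQL endpoint are assumptions for illustration, not Grasp's actual internals.

```typescript
// Hypothetical sketch of the request flow; not Grasp's actual internals.
type Triple = { s: string; p: string; o: string };

interface Bridge {
  translate(args: Record<string, string>): string;            // GraphQL arguments -> SPARQL text
  query(endpoint: string, sparql: string): Promise<Triple[]>; // send to a SPARQL endpoint, get results
  reform(triples: Triple[]): unknown;                          // SPARQL results -> GraphQL-shaped data
}

async function resolve(
  bridge: Bridge,
  endpoint: string,
  args: Record<string, string>,
): Promise<unknown> {
  const sparql = bridge.translate(args);                // translate the GraphQL query
  const triples = await bridge.query(endpoint, sparql); // send it and receive the results
  return bridge.reform(triples);                        // reform them; the caller returns this to the client
}

// Usage with an in-memory stub instead of a real SPARQL endpoint.
const stub: Bridge = {
  translate: (args) => `CONSTRUCT { ?iri :iri ?iri . } WHERE { VALUES ?iri { <${args.iri}> } }`,
  query: async (_endpoint, _sparql) => [
    { s: "http://example.com/a", p: "iri", o: "http://example.com/a" },
  ],
  reform: (triples) => {
    const result: Record<string, string> = {};
    for (const t of triples) result[t.p] = t.o;
    return result;
  },
};

resolve(stub, "https://example.com/sparql", { iri: "http://example.com/a" })
  .then((result) => console.log(result)); // { iri: 'http://example.com/a' }
```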
To use Grasp, we need to define a GraphQL schema with some Grasp-specific notations, which are carefully designed to stay fully compatible with the GraphQL specification. More specifically, we define a SPARQL endpoint URL and a SPARQL query template per concept, that is, per type in GraphQL terms. We also use GraphQL directives for metadata (described later).
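Let's look at a simple example. You will need:
- Node.js 14 or later
Then clone the repository, install the dependencies, and start Grasp with the example resources: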
$ git clone https://github.com/dbcls/grasp.git
$ cd grasp
$ npm install
$ RESOURCES_DIR=./examples npm run watch
This loads the example resource definitions from ./examples.
Visit http://localhost:4000. You will see GraphQL Playground.
Note: You can change the port to listen on with the PORT environment variable. Other settings are described in the Configuration section.
Write the following GraphQL query in the query editor (left pane):
query {
  dataset(iri: "http://purl.jp/bio/03/dbcatalog/nbdc00012") {
    iri
    title_en
    references {
      iri
      title
    }
  }
}
Then press Ctrl+Enter. The query is issued and the results are shown in the right pane.
Now we've queried Integbio Database Catalog/RDF with GraphQL.
Alternatively, you can run Grasp with Docker:
$ docker run --rm -it -p 4000:4000 -v /full/path/to/resources:/app/resources ghcr.io/dbcls/grasp:<tag>
Then access localhost:4000. See available image tags at dbcls/grasp.
Let's see how it works. The GraphQL query was translated into SPARQL queries and sent to a SPARQL endpoint. The SPARQL results were then returned to Grasp, and finally they were reformed into the GraphQL result.
Grasp performs this translation according to a GraphQL schema (type definition), a SPARQL endpoint URL, and a SPARQL query, all provided by the Grasp admin (the person who sets up Grasp). We refer to this set as a resource in Grasp. Let us dig into the definition.
You can find the resource definition in examples/dataset.graphql.
The SPARQL endpoint URL and the SPARQL query are written in a special form in the GraphQL comment of the type. The SPARQL endpoint is specified after the --- endpoint --- line, and the SPARQL query is placed after the --- sparql --- line.
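To make this layout concrete, here is a hypothetical TypeScript sketch that splits such a comment into its sections. It is not Grasp's parser. The function name `parseSections`, the example endpoint URL, and the exact header pattern it accepts are all assumptions.

```typescript
// Hypothetical sketch, not Grasp's parser: split a resource comment into
// sections introduced by header lines such as "--- endpoint ---".
function parseSections(comment: string): Map<string, string> {
  const sections = new Map<string, string>();
  let current: string | null = null; // name of the section being read
  let buffer: string[] = [];
  const flush = () => {
    // Lines before the first header are discarded.
    if (current !== null) sections.set(current, buffer.join("\n").trim());
  };
  for (const line of comment.split("\n")) {
    const header = /^---\s*(\w+)\s*---$/.exec(line.trim());
    if (header) {
      flush();
      current = header[1];
      buffer = [];
    } else {
      buffer.push(line);
    }
  }
  flush();
  return sections;
}

// Hypothetical comment content; the endpoint URL is a placeholder.
const comment = `
--- endpoint ---
https://example.com/sparql

--- sparql ---
CONSTRUCT { ?iri :iri ?iri . } WHERE { ?iri a ?type . }
`;

const sections = parseSections(comment);
console.log(sections.get("endpoint")); // https://example.com/sparql
console.log(sections.get("sparql"));
```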
The query returns an RDF graph using the CONSTRUCT query form. The graph consists of triples, each made up of the IRI identifying the object, a predicate corresponding to a field name of the object, and the field's value.
See the first part of the SPARQL query:
PREFIX : <https://github.com/dbcls/grasp/ns/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX db: <http://purl.jp/bio/03/dbcatalog/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?iri :iri ?iri .
  ?iri :title_ja ?title_ja .
  ?iri :title_en ?title_en .
  # ...
} WHERE {
  # ...
  {{#if iri}}VALUES ?iri { {{join " " (as-iriref iri)}} }{{/if}}
}
Here, we can see that the Dataset object has the fields iri, title_ja and title_en, where iri works as an identifier.
We use predicates in the special namespace (https://github.com/dbcls/grasp/ns/) to specify the field names.
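The following TypeScript sketch shows how such a graph can be turned into one object per subject, with field names derived by stripping the Grasp namespace from each predicate. It is an assumption-based illustration, not Grasp's code. The `Triple` shape and the function name `groupBySubject` are hypothetical. Values are kept in arrays because a field may have several values, a case discussed later.

```typescript
// Hypothetical sketch, not Grasp's code: group CONSTRUCT triples by subject
// and derive field names from predicates in the Grasp namespace.
const GRASP_NS = "https://github.com/dbcls/grasp/ns/";

type Triple = { subject: string; predicate: string; object: string };
type Resource = Record<string, string[]>;

function groupBySubject(triples: Triple[]): Map<string, Resource> {
  const resources = new Map<string, Resource>();
  for (const { subject, predicate, object } of triples) {
    // Only predicates in the Grasp namespace name a field; skip anything else.
    if (!predicate.startsWith(GRASP_NS)) continue;
    const field = predicate.slice(GRASP_NS.length); // e.g. "title_en"
    const resource: Resource = resources.get(subject) ?? {};
    // A field may have several values, so they are collected in an array.
    const values = resource[field] ?? [];
    values.push(object);
    resource[field] = values;
    resources.set(subject, resource);
  }
  return resources;
}

const dataset = "http://purl.jp/bio/03/dbcatalog/nbdc00012";
const grouped = groupBySubject([
  { subject: dataset, predicate: GRASP_NS + "iri", object: dataset },
  { subject: dataset, predicate: GRASP_NS + "title_en", object: "Atlas of Genetics and Cytogenetics in Oncology and Haematology" },
  { subject: dataset, predicate: "http://purl.org/dc/terms/title", object: "not a Grasp field" },
]);
console.log(grouped.get(dataset));
```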
The last part,
{{#if iri}}VALUES ?iri { {{join " " (as-iriref iri)}} }{{/if}}
may look complicated. Let us explain.
This part is needed because Grasp requires the SPARQL query to return the triples whose subject is one of the values of iri (iri may contain multiple values).
The SPARQL query is actually a Handlebars template. This part can be roughly read as: "If iri is given, render a VALUES clause that selects bindings by iri (which may contain multiple IRIs): VALUES ?iri {<http://example.com/...> <http://example.com/...>}".
if is a built-in Handlebars helper. If its argument (in this case iri) is falsy (that is, not passed to the query), the enclosed block isn't rendered.
join is a helper defined by Grasp that concatenates the elements of the second argument, using the first argument as the delimiter.
as-iriref is a helper that wraps each element of its argument with < and >.
Taken together, this part selects triples by iri if iri is given. For more about Grasp-defined helpers, see the later section.
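To make the rendering concrete, here is a plain TypeScript sketch of what this template part produces. The helpers are written out by hand instead of being registered with Handlebars. They are hypothetical re-implementations based only on the descriptions above, and the sketch assumes iri arrives as an array of strings.

```typescript
// Hypothetical re-implementations for illustration only; Grasp registers its
// own versions of these helpers with Handlebars.
const asIriref = (values: string[]): string[] => values.map((v) => `<${v}>`);
const join = (delimiter: string, values: string[]): string => values.join(delimiter);

// Hand-written equivalent of:
// {{#if iri}}VALUES ?iri { {{join " " (as-iriref iri)}} }{{/if}}
function renderValuesClause(iri?: string[]): string {
  // Handlebars' if treats undefined and empty arrays as falsy: render nothing.
  if (!iri || iri.length === 0) return "";
  return `VALUES ?iri { ${join(" ", asIriref(iri))} }`;
}

console.log(renderValuesClause(["http://example.com/a", "http://example.com/b"]));
// VALUES ?iri { <http://example.com/a> <http://example.com/b> }
```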
After the comment block comes the Dataset GraphQL object type, which corresponds to the SPARQL query above:
type Dataset {
  iri: String!
  title_ja: String!
  title_en: String
  # ...
}
Note: ! means that the field is non-nullable. See https://graphql.org/learn/schema/#lists-and-non-null for details.
Now we've defined the Dataset object type. In addition, we need to add a field to the Query type so that a Dataset can be fetched with a GraphQL query. Here is the corresponding part of examples/index.graphql:
type Query {
  # ...
  dataset(iri: String!): Dataset
  # ...
}
Here, we've defined the dataset field to fetch a Dataset.
SPARQL results may have multiple values for one field. In the previous example, for the fields iri, title_ja and title_en, Grasp returns only one value, arbitrarily picked from the values found.
If you want all values in the GraphQL result, use a list type in the GraphQL schema:
type SequenceStatisticsReport {
  # ...
  contributor: [String!]!
  # ...
}
With this definition, you get all values of the contributor field as a list of Strings.
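The difference between scalar and list fields can be sketched as two small resolver functions. This is an assumption-based illustration, not Grasp's resolver code. It assumes the values found for each field were already collected into arrays. The function names and the report data are hypothetical.

```typescript
// Hypothetical sketch, not Grasp's resolvers: values found for each field
// are assumed to have been collected into arrays beforehand.
type FieldValues = Record<string, string[]>;

// Scalar field (e.g. title_en: String): a single value. Taking the first one
// is effectively arbitrary, because an RDF graph has no inherent order.
function resolveScalar(values: FieldValues, field: string): string | null {
  const found = values[field] ?? [];
  return found.length > 0 ? found[0] : null;
}

// List field (e.g. contributor: [String!]!): all values.
function resolveList(values: FieldValues, field: string): string[] {
  return values[field] ?? [];
}

// Hypothetical values for a SequenceStatisticsReport.
const report: FieldValues = { contributor: ["Contributor A", "Contributor B"] };
console.log(resolveList(report, "contributor"));   // [ 'Contributor A', 'Contributor B' ]
console.log(resolveScalar(report, "contributor")); // Contributor A
```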
You might notice that the Dataset object in the GraphQL result has references:
{
  "data": {
    "dataset": {
      "iri": "http://purl.jp/bio/03/dbcatalog/nbdc00012",
      "title_en": "Atlas of Genetics and Cytogenetics in Oncology and Haematology",
      "references": [
        {
          "iri": "http://rdf.ncbi.nlm.nih.gov/pubmed/12520000",
          "title": "Atlas of Genetics and Cytogenetics in Oncology and Haematology, year 2003."
        },
        {
          "iri": "http://rdf.ncbi.nlm.nih.gov/pubmed/11125120",
          "title": "Atlas of Genetics and Cytogenetics in Oncology and Haematology, updated."
        },
        ...
      ]
    }
  }
}
In examples/dataset.graphql, the references field is defined as follows:
type Dataset {
  # ...
  references: [Pubmed!]!
  # ...
}
In this case, Grasp issues two SPARQL queries to complete a GraphQL query that fetches a Dataset including its references field. The first query fetches the Dataset, with references holding only their IRIs. The second query fetches the references using those IRIs, and is processed according to your Pubmed resource definition (examples/pubmed.graphql). Grasp combines the results of these queries into the final GraphQL response.
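The two-step resolution can be sketched in TypeScript with in-memory stand-ins for both queries. This is a hypothetical illustration. The maps replace real SPARQL calls, the IRIs and titles are made up, and the function names are not Grasp's. It does show the idea that the first result supplies only IRIs and the second step fetches them together.

```typescript
// Hypothetical in-memory stand-ins for the two SPARQL queries; the IRIs and
// titles are made up for illustration.
type Pubmed = { iri: string; title: string };

// First query (Dataset resource): the dataset with only the IRIs of its references.
const referenceIrisByDataset: Record<string, string[]> = {
  "http://example.com/dataset/1": ["http://example.com/pubmed/1", "http://example.com/pubmed/2"],
};

// Second query (Pubmed resource): all requested IRIs are fetched at once,
// as if they were rendered into a single VALUES clause.
const pubmedByIri: Record<string, Pubmed> = {
  "http://example.com/pubmed/1": { iri: "http://example.com/pubmed/1", title: "Paper one" },
  "http://example.com/pubmed/2": { iri: "http://example.com/pubmed/2", title: "Paper two" },
};

function fetchPubmedBatch(iris: string[]): Pubmed[] {
  const found: Pubmed[] = [];
  for (const iri of iris) {
    const pubmed = pubmedByIri[iri];
    if (pubmed !== undefined) found.push(pubmed);
  }
  return found;
}

// Combine both results into the shape requested by the GraphQL query.
function resolveDataset(iri: string): { iri: string; references: Pubmed[] } {
  const referenceIris = referenceIrisByDataset[iri] ?? []; // first query
  const references = fetchPubmedBatch(referenceIris);      // second query
  return { iri, references };
}

console.log(resolveDataset("http://example.com/dataset/1").references.length); // 2
```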
We cannot handle relations to a blank node in the way described above, because a blank node can't be referred to by an IRI. To handle such relations, we introduce embedded resources.
Consider the following case. In examples/dataset.graphql, the Dataset type has a publisher field of type Publisher:
type Dataset {
  # ...
  publisher: Publisher
  # ...
}
The Publisher resource is defined as follows:
type Publisher @embedded {
  name_ja: String
  name_en: String
  page: String
}
Note that Publisher doesn't have the special comment containing an endpoint definition and a SPARQL query, and that we marked Publisher as @embedded. When resolving a resource marked with @embedded, Grasp doesn't fetch it with a separate query. Instead, it fills the fields from the query of its parent resource.
The following query fetches a Dataset. In this query, the values of publisher are fetched along with it:
CONSTRUCT {
  ?iri :iri ?iri .
  ?iri :title_ja ?title_ja .
  # ...
  ?iri :publisher ?publisher .
  ?publisher :name_ja ?publisher_name_ja .
  ?publisher :name_en ?publisher_name_en .
  ?publisher :page ?publisher_page .
  # ...
}
# ...
WHERE
{
  # ...
  OPTIONAL { ?iri dcterms:publisher ?publisher . }
  OPTIONAL { ?publisher rdfs:label ?publisher_name_ja . FILTER (lang(?publisher_name_ja) = "ja") }
  OPTIONAL { ?publisher rdfs:label ?publisher_name_en . FILTER (lang(?publisher_name_en) = "en") }
  OPTIONAL { ?publisher foaf:page ?publisher_page . }
  # ...
}
This query returns triples representing a Dataset and triples representing the Publisher of the Dataset at the same time. Grasp distinguishes them by their subjects (in the graph generated by the CONSTRUCT query form) and builds GraphQL objects accordingly.
Note that the query needs to return a graph containing triples whose 1) subject points to the embedded resource and 2) predicate reflects the field name. Here, ?publisher is bound to the blank node representing the Publisher.
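Filling an embedded field from the parent's graph can be sketched as follows. This is a hypothetical TypeScript illustration, not Grasp's code. For brevity, predicates appear as bare field names with the Grasp namespace already stripped. The blank node label "_:b0" and the publisher data are made up.

```typescript
// Hypothetical sketch, not Grasp's code. Predicates are given as field names
// with the Grasp namespace already stripped.
type Triple = { subject: string; field: string; object: string };

// Collect one value per field for the given subject.
function fieldsOf(triples: Triple[], subject: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const t of triples) {
    if (t.subject === subject && !(t.field in fields)) fields[t.field] = t.object;
  }
  return fields;
}

function buildDataset(triples: Triple[], iri: string): Record<string, unknown> {
  const dataset: Record<string, unknown> = fieldsOf(triples, iri);
  const publisherNode = dataset["publisher"];
  if (typeof publisherNode === "string") {
    // The blank node has no IRI to query by, so the embedded Publisher is
    // filled from triples in the same graph whose subject is that node.
    dataset["publisher"] = fieldsOf(triples, publisherNode);
  }
  return dataset;
}

// Hypothetical graph; "_:b0" stands for the blank node bound to ?publisher.
const graph: Triple[] = [
  { subject: "http://example.com/dataset/1", field: "iri", object: "http://example.com/dataset/1" },
  { subject: "http://example.com/dataset/1", field: "publisher", object: "_:b0" },
  { subject: "_:b0", field: "name_en", object: "Example Publisher" },
  { subject: "_:b0", field: "page", object: "http://example.com/" },
];
console.log(buildDataset(graph, "http://example.com/dataset/1"));
```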
You can add your own definitions in the directory specified with RESOURCES_DIR (default: ./resources).
Resource definition file names must start with [0-9a-zA-Z] and end with .graphql. Other files in the directory are ignored.
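The naming rule can be expressed as a regular expression. Here is a small TypeScript sketch that applies it to a list of file names. In practice the names would come from reading RESOURCES_DIR (for example, with Node's fs.readdirSync). That step is left out so the sketch stays self-contained. The function name and the sample file names are hypothetical.

```typescript
// Sketch of the naming rule above: start with [0-9a-zA-Z], end with .graphql.
const RESOURCE_FILE = /^[0-9a-zA-Z].*\.graphql$/;

function resourceFiles(fileNames: string[]): string[] {
  return fileNames.filter((name) => RESOURCE_FILE.test(name));
}

console.log(resourceFiles(["dataset.graphql", "index.graphql", "_draft.graphql", ".hidden.graphql", "README.md"]));
// [ 'dataset.graphql', 'index.graphql' ]
```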
You need to restart Grasp to reload the definitions. You can use npm run watch to restart the server automatically.
Suppose you want to query a Dataset by an id (like dcterms:identifier). Add the following field definition to the Query type:
type Query {
  # ...
  datasetById(id: String): Dataset
}
Suppose we issue the following GraphQL query:
query {
  datasetById(id: "NBDC00012") {
    ...
  }
}
The SPARQL template is written in Handlebars. In this case, we can obtain "NBDC00012" with the {{id}} notation in the SPARQL query.
You can also use Handlebars' built-in helpers such as if and each. See https://handlebarsjs.com/guide/ for details.
Suppose you are defining a GraphQL query that takes multiple values as a parameter. The field in Query would look like this:
type Query {
  datasetsByIds(ids: [String!]): [Dataset!]!
}
Here, you want to issue a SPARQL query containing:
VALUES ?id { "NBDC00012" "NBDC00013" }
In this case, instead of doing simple interpolation (as previously mentioned), you can use the join and as-string helpers:
VALUES ?id { {{join " " (as-string ids)}} }
Here, as-string wraps each element of the given array with " (double quotes), and join combines them using the first argument as the delimiter.
Consider another example: generating an IN clause from a parameter, like this:
FILTER (?iri IN (<http://...>, <http://...>))
You can write the template using helpers as follows:
Note that we've specified , as the delimiter for join.
as-iriref works almost the same as as-string, except that it wraps the elements with < and >.
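Here is a plain TypeScript sketch of what the two templates in this section render to, with the helpers written out by hand. These are hypothetical re-implementations based only on the descriptions above. Grasp's own helpers may differ, for example in how they escape quotes inside values, which this sketch does not do. The parameter name iris for the IN example is also hypothetical, and the sketch uses ", " as the delimiter to match the example output above.

```typescript
// Hypothetical re-implementations for illustration; no escaping is done here.
const asString = (values: string[]): string[] => values.map((v) => `"${v}"`);
const asIriref = (values: string[]): string[] => values.map((v) => `<${v}>`);
const join = (delimiter: string, values: string[]): string => values.join(delimiter);

// VALUES clause built from the ids parameter: elements quoted, space-delimited.
const ids = ["NBDC00012", "NBDC00013"];
const valuesClause = `VALUES ?id { ${join(" ", asString(ids))} }`;

// IN clause built from a list of IRIs ("iris" is a hypothetical parameter name):
// elements wrapped in < and >, comma-delimited.
const iris = ["http://example.com/a", "http://example.com/b"];
const filterClause = `FILTER (?iri IN (${join(", ", asIriref(iris))}))`;

console.log(valuesClause); // VALUES ?id { "NBDC00012" "NBDC00013" }
console.log(filterClause); // FILTER (?iri IN (<http://example.com/a>, <http://example.com/b>))
```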
Grasp can be configured with the following environment variables.
- PORT (default: 4000): Port to listen on.
- ROOT_PATH (default: /): If you want to run Grasp on a path other than / (say, /foo), set ROOT_PATH to that path.
- MAX_BATCH_SIZE (default: Infinity): Grasp issues queries in batches to reduce the number of queries. This may produce queries too large for some SPARQL endpoints to process. You can set MAX_BATCH_SIZE to limit the number of items fetched per query and avoid this problem.
- RESOURCES_DIR (default: resources): Load resources from the specified directory.
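The effect of MAX_BATCH_SIZE can be pictured as splitting the items to fetch into groups of at most that size, with each group sent as its own query. The TypeScript sketch below is an assumption about the idea, not Grasp's batching code. The function name `chunk` and the sample IRIs are hypothetical.

```typescript
// Hypothetical sketch, not Grasp's batching code: split items into groups of
// at most maxBatchSize, each of which would become one SPARQL query.
function chunk<T>(items: T[], maxBatchSize: number): T[][] {
  // Infinity (the default) or any other non-finite value: a single batch.
  if (!Number.isFinite(maxBatchSize)) return items.length > 0 ? [items.slice()] : [];
  const size = Math.floor(maxBatchSize);
  if (size < 1) throw new RangeError("maxBatchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical IRIs to fetch.
const iris = ["http://example.com/1", "http://example.com/2", "http://example.com/3"];
console.log(chunk(iris, 2).length);        // 2
console.log(chunk(iris, Infinity).length); // 1
```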