Massive graph-structured data collections are ubiquitous in contemporary data management scenarios such as social networks, Semantic Web, and data-driven exploration in life science such as chemical compound databases. Efficient implementations of graph database engines are the key ingredients to be able to extract and analyse data/patterns in that particular form. However, the lack of structural schema compared to relational databases makes it less straightorward to come up with good excution pipelines.
To address this problem, we propose a simple but complete graph data base engine, Quicksilver.
In this project we deal with data encoded in the RDF format :
Source s |
Edge Label l |
Target t |
---|---|---|
John | LIKES | Java |
Jennifer | IS_FRIEND_WITH | John |
Jennifer | WORKS_FOR | Neo4j |
Jennifer | LIKES | Graphs |
Diana | LIKES | Graphs |
John | IS_FRIEND_WITH | Sally |
Jennifer | IS_FRIEND_WITH | Melissa |
... | ... | ... |
This engine is simple in the sense that it is only able to deal with integer data types and supports exclusively regular path queries. A query is always of the form
(s,expr,t)
where s
and t
belong to the set of nodes in the graph plus the special symbol *
representing an unbounded source or target. We require that not both source and target are unbounded in a query.
The regular language expr
has the following rules :
l>
: Simple path. If it exists an edge with labell
that does from the source vertexts
to the target vertext
then(s,t)
is in the language defined byl>
l<
: Reversed path. Same as before but withs
andt
reversedexpr1/expr2
: Concatenation. There exists a path linking expression 1 and 2, is other words if the tuple(s1,t1)
is in expression 1 and(s2,t2)
is in expression 2,(s1,t2)
will be in the language of the resulting concatenation expression if and only ift1=s2
l+
: Kleene Star. The transitive closure of labell
, a.k.a. the set of all possible edges(s,t)
such that(s,l>,t)
is a valid RDF tuple.
Using this language, we can, for instance construct the following queries
n | Query | Output |
---|---|---|
1 | (John,LIKES>,Java) |
(John,Java) |
2 | (*,LIKES>,Graphs) |
(Jennifer,Graphs), (Diana,Graphs) |
3 | (*,IS_FRIEND_WITH>LIKES/,Java) |
(Jennifer,Java) |
4 | (Jennifer,IS_FRIEND_WITH+,*) |
(Jennifer,John),(Jennifer,Sally),(Jennifer,Melissa) |
5 | (Jennifer,IS_FRIEND_WITH+/LIKES>,*) |
(Jennifer,Java) |
One might think that the integer restriction hardly constrains the set of possible applications but we should notice that it is always possible to encode strings or other datatypes in terms of integers. In that sense, we can transform the previous RDF table as
Source s |
Edge Label l |
Target t |
---|---|---|
0 | 0 | 3 |
1 | 1 | 1 |
1 | 2 | 4 |
1 | 0 | 5 |
2 | 0 | 5 |
... | ... | ... |
Furthermore, this project is complete since a fully functional, Smart, Quick and Frugal, graph data base engine was built from scratch and contains all the ingredients that one may find a standard database engine:
- Special Index data structures such as a K-path index
- Query optimization through cardinality estimation (path synopses) and query planning
- Special Query Evaluation operators : hash, merge and index joins as well as efficient transitive closure implementations.
Giordano Lucas. Quicksilver: A simple graph database engine. Database Technology, BSc Course. Department of Mathematics and Computer Science, Eindhoven University of Technology. 2019. PDF report
Lucas Giordano | Karina Mankevic | Wessel van Liero |