Commit 8f6b00c

docs: Add docs for JSONL
1 parent 40ab664 commit 8f6b00c

File tree

5 files changed

+237
-7
lines changed


pages/data-migration.mdx

Lines changed: 4 additions & 1 deletion
@@ -15,7 +15,7 @@ instance. Whether your data is structured in files, relational databases, or
 other graph databases, Memgraph provides the flexibility to integrate and
 analyze your data efficiently.

-Memgraph supports file system imports like Parquet and CSV files, offering efficient and
+Memgraph supports file system imports like Parquet, CSV and JSONL files, offering efficient and
 structured data ingestion. **However, if you want to migrate directly from
 another data source, you can use the [`migrate`
 module](/advanced-algorithms/available-algorithms/migrate)** from Memgraph MAGE
@@ -52,6 +52,9 @@ semi-structured data to be efficiently loaded, using the [`json_util`
 module](/advanced-algorithms/available-algorithms/json_util) and [`import_util`
 module](/advanced-algorithms/available-algorithms/import_util).

+Memgraph also supports JSONL files, in which every line is formatted as a separate JSON document. Such JSONL
+files can be efficiently imported from the local storage system using the [LOAD JSONL clause](/querying/clauses/load-jsonl).
+
 Check out the [JSON import guide](/data-migration/json).

 ### Cypherl file

pages/data-migration/json.mdx

Lines changed: 228 additions & 2 deletions
@@ -1,10 +1,236 @@
---
-title: Import data from JSON files
-description: Integrate JSON effortlessly with Memgraph. Detailed documentation guiding you every step of the way towards graph use cases.
+title: Import data from JSON(L) files
+description: Integrate JSON(L) effortlessly with Memgraph. Detailed documentation guiding you every step of the way towards graph use cases.
---

import { Callout, Steps } from 'nextra/components'

# Import data from JSONL files
A JSONL file is a file in which every line is a separate JSON document. Each line is parsed as a node
or a relationship, and each key in the JSON document is used as a property of that node or relationship. Data from JSONL files
on the local disk can be imported using the `LOAD JSONL` clause.

## `LOAD JSONL` Cypher clause

The `LOAD JSONL` clause uses the [simdjson library](https://github.com/simdjson/simdjson) to parse JSON documents as
fast as possible.

### `LOAD JSONL` clause syntax

The syntax of the `LOAD JSONL` clause is:

```cypher
LOAD JSONL FROM <jsonl-location> AS <variable-name>
```
- `<jsonl-location>` is a string representing the path from which the JSONL file should be loaded. There are no restrictions on where in
  your file system the file can be located, as long as the path is valid (i.e.,
  the file exists). If you are using Docker to run Memgraph, you will need to
  [copy the files from your local directory into the
  Docker](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container)
  container where Memgraph can access them. <br/>
- `<variable-name>` is a symbolic name representing the variable to which the
  contents of the parsed row will be bound, enabling access to the row
  contents later in the query. The variable doesn't have to be used in any
  subsequent clause.
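For example, to quickly inspect what each parsed row looks like before writing an import query (the file path here is hypothetical), you can simply return the bound variable:

```cypher
LOAD JSONL FROM "/data/people.jsonl" AS row
RETURN row;
```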
### `LOAD JSONL` clause specificities

When using the `LOAD JSONL` clause, please keep in mind:

- The JSONL parser parses values into their appropriate types, so you should get the same property type in Memgraph as in the JSONL file. Memgraph supports the following
  JSON types:
    - `string`: The property in Memgraph will be of type string.
    - `uint64_t`: The property in Memgraph will be cast to `int64_t` because the Cypher standard doesn't support `uint64_t`.
    - `int64_t`: The property in Memgraph will be saved as `int64_t`.
    - `double`: The property in Memgraph will be saved as a floating-point number.
    - `boolean`: The property in Memgraph will be saved as a bool.
    - `array`: The property in Memgraph will be saved as a list.
    - `object`: The property in Memgraph will be saved as a map.
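  As an illustration of this mapping, a hypothetical JSONL document such as:

  ```jsonl
  {"id": 1, "score": 4.5, "active": true, "tags": ["a", "b"], "address": {"city": "London"}}
  ```

  would yield an integer `id`, a floating-point `score`, a boolean `active`, a list `tags` and a map `address` property on the created node or relationship.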
- **The `LOAD JSONL` clause is not a standalone clause**, meaning a valid query must contain at least one more clause, for example:

  ```cypher
  LOAD JSONL FROM "./people.jsonl" AS row CREATE (p:Person) SET p += row;
  ```

  In this regard, the following query will throw an exception:

  ```cypher
  LOAD JSONL FROM "./file.jsonl" AS row;
  ```

  **Adding a `MATCH` or `MERGE` clause before `LOAD JSONL`** allows you to match certain entities in the graph before running `LOAD JSONL`, optimizing the process, as
  matched entities do not need to be searched for every row in the JSONL file.

  However, a `MATCH` or `MERGE` clause can be used prior to the `LOAD JSONL` clause only
  if it returns a single row. Returning multiple rows before calling the
  `LOAD JSONL` clause will cause a Memgraph runtime error.

- **The `LOAD JSONL` clause can be used at most once per query**, so queries like
  the one below will throw an exception:

  ```cypher
  LOAD JSONL FROM "/x.jsonl" AS x
  LOAD JSONL FROM "/y.jsonl" AS y
  CREATE (n:A {p1 : x, p2 : y});
  ```
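As a sketch of the single-row `MATCH`/`MERGE` pattern described above (the file path, labels and property names are hypothetical), the merged node is found once and then reused for every row of the file:

```cypher
MERGE (c:City {name: "London"})
LOAD JSONL FROM "/data/people.jsonl" AS row
CREATE (p:Person {id: row.id, name: row.name})
CREATE (p)-[:LIVES_IN]->(c);
```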
### Increase import speed

The `LOAD JSONL` clause will create relationships much faster, and consequently
speed up data import, if you [create indexes](/fundamentals/indexes) on nodes or
node properties once you import them:

```cypher
CREATE INDEX ON :Node(id);
```

If the `LOAD JSONL` clause is merging data instead of creating it, create indexes
before running the `LOAD JSONL` clause.

The construct `USING PERIODIC COMMIT <BATCH_SIZE>` also improves the import speed because
it optimizes memory allocation patterns. In our benchmarks, periodic commit
speeds up the execution by 25% to 35%:

```cypher
USING PERIODIC COMMIT 1024 LOAD JSONL FROM "/x.jsonl" AS x
CREATE (n:A) SET n += x;
```
You can also speed up the import if you switch Memgraph to the [**analytical storage
mode**](/fundamentals/storage-memory-usage#storage-modes). In the analytical
storage mode, there are no ACID guarantees besides manually created snapshots.
After the import, you can switch the storage mode back to
transactional and re-enable the ACID guarantees.

You can switch between modes within the session using the following query:

```cypher
STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
```

If you use the `IN_MEMORY_ANALYTICAL` mode and have nodes and relationships stored in
separate JSONL files, you can run multiple concurrent `LOAD JSONL` queries to import data even faster.
To achieve the best import performance, split your node and relationship
files into smaller files and run multiple `LOAD JSONL` queries in parallel.
The key is to first run all the `LOAD JSONL` queries that create nodes and, after that, run
all the `LOAD JSONL` queries that create relationships.
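Putting these tips together, one possible import session might look like the following sketch (the file names are hypothetical, and each statement is run separately):

```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
CREATE INDEX ON :Person(id);
LOAD JSONL FROM "/data/people_nodes.jsonl" AS row
CREATE (p:Person) SET p += row;
LOAD JSONL FROM "/data/people_relationships.jsonl" AS row
MATCH (a:Person {id: row.first_person}), (b:Person {id: row.second_person})
CREATE (a)-[:IS_FRIENDS_WITH {met_in: row.met_in}]->(b);
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```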
### Import multiple JSONL files with distinct graph objects

In this example, the data is split across four files; each file contains nodes
of a single label or relationships of a single type.

<Steps>

{<h3 className="custom-header">JSONL files</h3>}
- [`people_nodes.jsonl`](s3://download.memgraph.com/asset/docs/people_nodes.jsonl) is used to create nodes labeled `:Person`.<br/> The file contains the following data:
  ```jsonl
  {"id": 100, "name": "Daniel", "age": 30, "city": "London"}
  {"id": 101, "name": "Alex", "age": 15, "city": "Paris"}
  {"id": 102, "name": "Sarah", "age": 17, "city": "London"}
  {"id": 103, "name": "Mia", "age": 25, "city": "Zagreb"}
  {"id": 104, "name": "Lucy", "age": 21, "city": "Paris"}
  ```
- [`restaurants_nodes.jsonl`](s3://download.memgraph.com/asset/docs/restaurants_nodes.jsonl) is used to create nodes labeled `:Restaurant`.<br/> The file contains the following data:
  ```jsonl
  {"id": 200, "name": "Mc Donalds", "menu": "Fries;BigMac;McChicken;Apple Pie"}
  {"id": 201, "name": "KFC", "menu": "Fried Chicken;Fries;Chicken Bucket"}
  {"id": 202, "name": "Subway", "menu": "Ham Sandwich;Turkey Sandwich;Foot-long"}
  {"id": 203, "name": "Dominos", "menu": "Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust"}
  ```
- [`people_relationships.jsonl`](s3://download.memgraph.com/asset/docs/people_relationships.jsonl) is used to connect people with the `:IS_FRIENDS_WITH` relationship.<br/> The file contains the following data:
  ```jsonl
  {"first_person": 100, "second_person": 102, "met_in": 2014}
  {"first_person": 103, "second_person": 101, "met_in": 2021}
  {"first_person": 102, "second_person": 103, "met_in": 2005}
  {"first_person": 101, "second_person": 104, "met_in": 2005}
  {"first_person": 104, "second_person": 100, "met_in": 2018}
  {"first_person": 101, "second_person": 102, "met_in": 2017}
  {"first_person": 100, "second_person": 103, "met_in": 2001}
  ```
- [`restaurants_relationships.jsonl`](s3://download.memgraph.com/asset/docs/restaurants_relationships.jsonl) is used to connect people with restaurants using the `:ATE_AT` relationship.<br/> The file contains the following data:
  ```jsonl
  {"PERSON_ID": 100, "REST_ID": 200, "liked": true}
  {"PERSON_ID": 103, "REST_ID": 201, "liked": false}
  {"PERSON_ID": 104, "REST_ID": 200, "liked": true}
  {"PERSON_ID": 101, "REST_ID": 202, "liked": false}
  {"PERSON_ID": 101, "REST_ID": 203, "liked": false}
  {"PERSON_ID": 101, "REST_ID": 200, "liked": true}
  {"PERSON_ID": 102, "REST_ID": 201, "liked": true}
  ```
{<h3 className="custom-header">Import nodes</h3>}

Each row will be parsed as a map, and the
fields can be accessed using the property lookup syntax (e.g., `id: row.id`). The files should be downloaded and then accessed from the local disk.

The following query will load the file row by row and create a new node
for each row, with properties based on the parsed row values:

```cypher
LOAD JSONL FROM "people_nodes.jsonl" AS row
CREATE (n:Person {id: row.id, name: row.name, age: row.age, city: row.city});
```
In the same manner, the following query will create a new node for each restaurant:

```cypher
LOAD JSONL FROM "restaurants_nodes.jsonl" AS row
CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu});
```

{<h3 className="custom-header">Create indexes</h3>}

Creating an [index](/fundamentals/indexes) on a property used to connect nodes
with relationships, in this case, the `id` property of the `:Person` nodes,
will speed up the import of relationships, especially with large datasets:

```cypher
CREATE INDEX ON :Person(id);
```
{<h3 className="custom-header">Import relationships</h3>}

The following query will create relationships between the people nodes:

```cypher
LOAD JSONL FROM "people_relationships.jsonl" AS row
MATCH (p1:Person {id: row.first_person})
MATCH (p2:Person {id: row.second_person})
CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2)
SET f.met_in = row.met_in;
```

The following query will create relationships between people and the restaurants where they ate. Because the JSONL parser already produces boolean values, `row.liked` can be used directly:

```cypher
LOAD JSONL FROM "restaurants_relationships.jsonl" AS row
MATCH (p1:Person {id: row.PERSON_ID})
MATCH (re:Restaurant {id: row.REST_ID})
CREATE (p1)-[ate:ATE_AT]->(re)
SET ate.liked = row.liked;
```
{<h3 className="custom-header">Final result</h3>}

Run the following query to see how the imported data looks as a graph:

```cypher
MATCH p=()-[]-() RETURN p;
```

![](/pages/data-migration/csv/load_csv_restaurants_relationships.png)

</Steps>
# Import data from JSON files

A JSON file is a file that stores simple data structures and objects in

pages/database-management/authentication-and-authorization/role-based-access-control.mdx

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ of the following commands:
 | Privilege to enforce [constraints](/fundamentals/constraints). | `CONSTRAINT` |
 | Privilege to [dump the database](/configuration/data-durability-and-backup#database-dump). | `DUMP` |
 | Privilege to use [replication](/clustering/replication) queries. | `REPLICATION` |
-| Privilege to access files in queries, for example, when using `LOAD CSV` and `LOAD PARQUET` clauses. | `READ_FILE` |
+| Privilege to access files in queries, for example, when using the `LOAD CSV`, `LOAD JSONL` and `LOAD PARQUET` clauses. | `READ_FILE` |
 | Privilege to manage [durability files](/configuration/data-durability-and-backup#database-dump). | `DURABILITY` |
 | Privilege to try and [free memory](/fundamentals/storage-memory-usage#deallocating-memory). | `FREE_MEMORY` |
 | Privilege to use [trigger queries](/fundamentals/triggers). | `TRIGGER` |

pages/help-center/faq.mdx

Lines changed: 3 additions & 3 deletions
@@ -216,7 +216,7 @@ Currently, the fastest way to import data is from a Parquet file with a [LOAD PARQUET
 clause](/data-migration/parquet). Check out the [best practices for importing
 data](/data-migration/best-practices).

-[Other import methods](/data-migration) include importing data from CSV, JSON and CYPHERL files,
+[Other import methods](/data-migration) include importing data from CSV, JSON, JSONL and CYPHERL files,
 migrating from relational databases, or connecting to a data stream.

 ### How to import data from MySQL or PostgreSQL?
@@ -227,10 +227,10 @@ You can migrate from [MySQL](/data-migration/migrate-from-rdbms) or

 ### What file formats does Memgraph support for import?

 You can import data from [CSV](/data-migration/csv), [PARQUET](/data-migration/parquet),
-[JSON](/data-migration/json) or [CYPHERL](/data-migration/cypherl) files.
+[JSON and JSONL](/data-migration/json) or [CYPHERL](/data-migration/cypherl) files.

 CSV files can be imported in on-premise instances using the [LOAD CSV
-clause](/data-migration/csv), PARQUET files can be imported using the [LOAD PARQUET](/data-migration/parquet) clause, and JSON files can be imported using the
+clause](/data-migration/csv), PARQUET files can be imported using the [LOAD PARQUET](/data-migration/parquet) clause, and JSON(L) files can be imported using the
 [json_util](/advanced-algorithms/available-algorithms/json_util) module from the
 MAGE library. On a Cloud instance, data from CSV and JSON files can be imported only
 from a remote address.

pages/querying/query-plan.mdx

Lines changed: 1 addition & 0 deletions
@@ -241,6 +241,7 @@ The following table lists all the operators currently supported by Memgraph:
 | `IndexedJoin` | Performs an indexed join of the input from its two input branches. |
 | `Limit` | Limits certain rows from the pull chain. |
 | `LoadCsv` | Loads a CSV file in order to import files into the database. |
+| `LoadJsonl` | Loads a JSONL file in order to import files into the database. |
 | `LoadParquet` | Loads a Parquet file in order to import files into the database. |
 | `Merge` | Applies merge on the input it received. |
 | `Once` | Forms the beginning of an operator chain with "only once" semantics. The operator will return false on subsequent pulls. |
