|
1 | 1 | --- |
2 | | -title: Import data from JSON files |
3 | | -description: Integrate JSON effortlessly with Memgraph. Detailed documentation guiding you every step of the way towards graph use cases. |
| 2 | +title: Import data from JSON(L) files |
| 3 | +description: Integrate JSON(L) effortlessly with Memgraph. Detailed documentation guiding you every step of the way towards graph use cases. |
4 | 4 | --- |
5 | 5 |
|
6 | 6 | import { Callout } from 'nextra/components' |
7 | 7 |
|
| 8 | +# Import data from JSONL files |
| 9 | + |
| 10 | +A JSONL file is a file in which every line is a separate JSON document. Each line is parsed as a node
| 11 | +or an edge, and each key in the JSON document becomes a property of that node or edge. Data from JSONL files
| 12 | +on the local disk can be imported using the `LOAD JSONL` clause.
| 13 | + |
| 14 | +## `LOAD JSONL` Cypher clause |
| 15 | + |
| 16 | +The `LOAD JSONL` clause uses the [simdjson library](https://github.com/simdjson/simdjson) to parse JSON documents as
| 17 | +fast as possible. |
| 18 | + |
| 19 | +### `LOAD JSONL` clause syntax |
| 20 | + |
| 21 | +The syntax of the `LOAD JSONL` clause is: |
| 22 | + |
| 23 | +```cypher |
| 24 | +LOAD JSONL FROM <jsonl-location> AS <variable-name> |
| 25 | +``` |
| 26 | + |
| 27 | +- `<jsonl-location>` is a string representing the path from which the JSONL file should be loaded. There are no restrictions on where in
| 28 | + your file system the file can be located, as long as the path is valid (i.e., |
| 29 | + the file exists). If you are using Docker to run Memgraph, you will need to |
| 30 | + [copy the files from your local directory into the
| 31 | + Docker container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container)
| 32 | + where Memgraph can access them. <br/>
| 33 | +- `<variable-name>` is a symbolic name for the variable to which the
| 34 | + contents of the parsed row will be bound, enabling access to the row
| 35 | + contents later in the query. The variable doesn't have to be used in any |
| 36 | + subsequent clause. |
| 37 | + |
| 38 | +### `LOAD JSONL` clause specificities |
| 39 | + |
| 40 | +When using the `LOAD JSONL` clause please keep in mind: |
| 41 | + |
| 42 | +- The JSONL parser preserves value types, so a property in Memgraph should have the same type as the value in the JSONL file. Memgraph supports the following
| 43 | +JSON types:
| 44 | + - `string`: The property in Memgraph will be of type string.
| 45 | + - `uint64_t`: The property in Memgraph will be cast to int64_t because the Cypher standard doesn't support uint64_t.
| 46 | + - `int64_t`: The property in Memgraph will be saved as int64_t.
| 47 | + - `double`: The property in Memgraph will be saved as a floating-point number.
| 48 | + - `boolean`: The property in Memgraph will be saved as bool.
| 49 | + - `array`: The property in Memgraph will be saved as a list.
| 50 | + - `object`: The property in Memgraph will be saved as a map.
| 51 | + |
| 52 | +- **The `LOAD JSONL` clause is not a standalone clause**, meaning a valid query must contain at least one more clause, for example: |
| 53 | + |
| 54 | +```cypher |
| 55 | +LOAD JSONL FROM "./people.jsonl" AS row CREATE (p:Person) SET p += row; |
| 56 | +``` |
| 57 | + |
| 58 | +In this regard, the following query will throw an exception: |
| 59 | + |
| 60 | +```cypher |
| 61 | +LOAD JSONL FROM "./file.jsonl" AS row; |
| 62 | +``` |
| 63 | + |
| 64 | +**Adding a `MATCH` or `MERGE` clause before `LOAD JSONL`** allows you to match certain entities in the graph before running `LOAD JSONL`, which optimizes the process because
| 65 | +matched entities do not need to be searched for on every row of the JSONL file.
| 66 | +
| 67 | +However, the `MATCH` or `MERGE` clause can be used prior to the `LOAD JSONL` clause only
| 68 | +if the clause returns only one row. Returning multiple rows before calling the
| 69 | +`LOAD JSONL` clause will cause a Memgraph runtime error.
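
As a minimal sketch of the single-row rule described above, the query below matches one node and then attaches every imported row to it. The `:City` label, the `:LIVES_IN` relationship type, and the file name are hypothetical and used only for illustration; the sketch assumes the graph holds exactly one `:City` node named "London".

```cypher
// Assumption: exactly one :City node has name "London", so MATCH returns one row.
MATCH (c:City {name: "London"})
// Hypothetical file; each line is bound to `row` as a map.
LOAD JSONL FROM "./people.jsonl" AS row
// `c` is matched once and reused for every row instead of being looked up per row.
CREATE (p:Person)-[:LIVES_IN]->(c)
SET p += row;
```

If the `MATCH` produced more than one row, this query would be expected to fail with the runtime error mentioned above.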
| 70 | + |
| 71 | +- **The `LOAD JSONL` clause can be used at most once per query**, so queries like |
| 72 | +the one below will throw an exception: |
| 73 | + |
| 74 | +```cypher |
| 75 | +LOAD JSONL FROM "/x.jsonl" AS x |
| 76 | +LOAD JSONL FROM "/y.jsonl" AS y |
| 77 | +CREATE (n:A {p1 : x, p2 : y}); |
| 78 | +``` |
| 79 | + |
| 80 | +### Increase import speed |
| 81 | + |
| 82 | + |
| 83 | +The `LOAD JSONL` clause will create relationships much faster and consequently |
| 84 | +speed up data import if you [create indexes](/fundamentals/indexes) on nodes or |
| 85 | +node properties once you import them: |
| 86 | + |
| 87 | +```cypher |
| 88 | + CREATE INDEX ON :Node(id); |
| 89 | +``` |
| 90 | + |
| 91 | +If the `LOAD JSONL` clause is merging data instead of creating it, create indexes |
| 92 | +before running the `LOAD JSONL` clause. |
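
A minimal sketch of the merge case, assuming nodes are identified by an `id` property: the index is created first so that each `MERGE` lookup can use it. The `:Person` label and the file name are illustrative assumptions, not part of the clause itself.

```cypher
// Create the index before the import so MERGE can look up :Person(id) efficiently.
CREATE INDEX ON :Person(id);

// Hypothetical file name. MERGE reuses an existing node with the same id
// and creates a new one only when no match is found.
LOAD JSONL FROM "./people.jsonl" AS row
MERGE (p:Person {id: row.id})
SET p += row;
```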
| 93 | + |
| 94 | +The construct `USING PERIODIC COMMIT <BATCH_SIZE>` also improves import speed because
| 95 | +it optimizes memory allocation patterns. In our benchmarks, periodic commit
| 96 | +speeds up the execution by 25% to 35%.
| 97 | + |
| 98 | +```cypher |
| 99 | + USING PERIODIC COMMIT 1024 LOAD JSONL FROM "/x.jsonl" AS x
| 100 | + CREATE (n:A {p1 : x});
| 101 | +``` |
| 102 | + |
| 103 | + |
| 104 | +You can also speed up the import if you switch Memgraph to [**analytical storage |
| 105 | +mode**](/fundamentals/storage-memory-usage#storage-modes). In the analytical |
| 106 | +storage mode there are no ACID guarantees besides manually created snapshots. |
| 107 | +After import you can switch the storage mode back to |
| 108 | +transactional and enable ACID guarantees. |
| 109 | + |
| 110 | +You can switch between modes within the session using the following query: |
| 111 | + |
| 112 | +```cypher |
| 113 | +STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL}; |
| 114 | +``` |
| 115 | + |
| 116 | +If you use `IN_MEMORY_ANALYTICAL` mode and have nodes and relationships stored in
| 117 | + separate JSONL files, you can run multiple concurrent `LOAD JSONL` queries to import data even faster.
| 118 | +To achieve the best import performance, split your node and relationship
| 119 | +files into smaller files and run multiple `LOAD JSONL` queries in parallel.
| 120 | +The key is to first run all `LOAD JSONL` queries that create nodes. Once they have finished, run
| 121 | +all `LOAD JSONL` queries that create relationships.
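
The two-phase ordering described above could look roughly like the sketch below. The file names and the split into two parts are hypothetical; each `LOAD JSONL` query within a phase is meant to run in its own client session so the queries execute concurrently.

```cypher
// Run once before the import.
STORAGE MODE IN_MEMORY_ANALYTICAL;

// Phase 1: node files. Run each query in a separate session, in parallel.
// Session A (hypothetical file name):
LOAD JSONL FROM "/import/people_part_1.jsonl" AS row CREATE (p:Person) SET p += row;
// Session B (hypothetical file name):
LOAD JSONL FROM "/import/people_part_2.jsonl" AS row CREATE (p:Person) SET p += row;

// Phase 2: relationship files. Start only after every phase 1 query has finished,
// otherwise the MATCH clauses may not find nodes that are still being created.
LOAD JSONL FROM "/import/friendships_part_1.jsonl" AS row
MATCH (a:Person {id: row.first_person})
MATCH (b:Person {id: row.second_person})
CREATE (a)-[:IS_FRIENDS_WITH]->(b);

// Switch back once the import is complete.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```

Creating an index on `:Person(id)` between the two phases would likely help the `MATCH` lookups in phase 2, in line with the indexing advice earlier in this section.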
| 122 | + |
| 123 | + |
| 124 | +### Import multiple JSONL files with distinct graph objects |
| 125 | + |
| 126 | +In this example, the data is split across four files. Each file contains nodes
| 127 | +of a single label or relationships of a single type. |
| 128 | + |
| 129 | + |
| 130 | +<Steps> |
| 131 | + |
| 132 | + {<h3 className="custom-header">JSONL files</h3>} |
| 133 | + |
| 134 | + - [`people_nodes.jsonl`](s3://download.memgraph.com/asset/docs/people_nodes.jsonl) is used to create nodes labeled `:Person`.<br/> The file contains the following data: |
| 135 | + ```jsonl |
| 136 | + {"id": 100, "name": "Daniel", "age": 30, "city": "London"} |
| 137 | + {"id": 101, "name": "Alex", "age": 15, "city": "Paris"} |
| 138 | + {"id": 102, "name": "Sarah", "age": 17, "city": "London"} |
| 139 | + {"id": 103, "name": "Mia", "age": 25, "city": "Zagreb"} |
| 140 | + {"id": 104, "name": "Lucy", "age": 21, "city": "Paris"} |
| 141 | + ``` |
| 142 | +- [`restaurants_nodes.jsonl`](s3://download.memgraph.com/asset/docs/restaurants_nodes.jsonl) is used to create nodes labeled `:Restaurant`.<br/> The file contains the following data:
| 143 | + ```jsonl |
| 144 | + {"id": 200, "name": "Mc Donalds", "menu": "Fries;BigMac;McChicken;Apple Pie"} |
| 145 | + {"id": 201, "name": "KFC", "menu": "Fried Chicken;Fries;Chicken Bucket"} |
| 146 | + {"id": 202, "name": "Subway", "menu": "Ham Sandwich;Turkey Sandwich;Foot-long"} |
| 147 | + {"id": 203, "name": "Dominos", "menu": "Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust"} |
| 148 | + ``` |
| 149 | + |
| 150 | +- [`people_relationships.jsonl`](s3://download.memgraph.com/asset/docs/people_relationships.jsonl) is used to connect people with the `:IS_FRIENDS_WITH` relationship.<br/> The file contains the following data: |
| 151 | + ```jsonl |
| 152 | + {"first_person": 100, "second_person": 102, "met_in": 2014} |
| 153 | + {"first_person": 103, "second_person": 101, "met_in": 2021} |
| 154 | + {"first_person": 102, "second_person": 103, "met_in": 2005} |
| 155 | + {"first_person": 101, "second_person": 104, "met_in": 2005} |
| 156 | + {"first_person": 104, "second_person": 100, "met_in": 2018} |
| 157 | + {"first_person": 101, "second_person": 102, "met_in": 2017} |
| 158 | + {"first_person": 100, "second_person": 103, "met_in": 2001} |
| 159 | + ``` |
| 160 | +- [`restaurants_relationships.jsonl`](s3://download.memgraph.com/asset/docs/restaurants_relationships.jsonl) is used to connect people with restaurants using the `:ATE_AT` relationship.<br/> The file contains the following data: |
| 161 | + ```jsonl |
| 162 | + {"PERSON_ID": 100, "REST_ID": 200, "liked": true} |
| 163 | + {"PERSON_ID": 103, "REST_ID": 201, "liked": false} |
| 164 | + {"PERSON_ID": 104, "REST_ID": 200, "liked": true} |
| 165 | + {"PERSON_ID": 101, "REST_ID": 202, "liked": false} |
| 166 | + {"PERSON_ID": 101, "REST_ID": 203, "liked": false} |
| 167 | + {"PERSON_ID": 101, "REST_ID": 200, "liked": true} |
| 168 | + {"PERSON_ID": 102, "REST_ID": 201, "liked": true} |
| 169 | + ``` |
| 170 | + |
| 171 | + {<h3 className="custom-header">Import nodes</h3>} |
| 172 | + |
| 173 | + Each row will be parsed as a map, and the |
| 174 | + fields can be accessed using the property lookup syntax (e.g. `id: row.id`). Files should be downloaded and then accessed from the local disk. |
| 175 | + |
| 176 | + The following query will load row by row from the file, and create a new node |
| 177 | + for each row with properties based on the parsed row values: |
| 178 | + |
| 179 | + ```cypher |
| 180 | + LOAD JSONL FROM "people_nodes.jsonl" AS row |
| 181 | + CREATE (n:Person {id: row.id, name: row.name, age: row.age, city: row.city}); |
| 182 | + ``` |
| 183 | + |
| 184 | + In the same manner, the following query will create a new node for each restaurant: |
| 185 | + |
| 186 | + ```cypher |
| 187 | + LOAD JSONL FROM "restaurants_nodes.jsonl" AS row |
| 188 | + CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu}); |
| 189 | + ``` |
| 190 | + |
| 191 | + {<h3 className="custom-header">Create indexes</h3>} |
| 192 | + |
| 193 | + Creating an [index](/fundamentals/indexes) on a property used to connect nodes |
| 194 | + with relationships, in this case, the `id` property of the `:Person` nodes, |
| 195 | + will speed up the import of relationships, especially with large datasets: |
| 196 | + |
| 197 | + ```cypher |
| 198 | + CREATE INDEX ON :Person(id); |
| 199 | + ``` |
| 200 | + |
| 201 | + {<h3 className="custom-header">Import relationships</h3>} |
| 202 | + The following query will create relationships between the people nodes: |
| 203 | + |
| 204 | + ```cypher |
| 205 | + LOAD JSONL FROM "people_relationships.jsonl" AS row |
| 206 | + MATCH (p1:Person {id: row.first_person}) |
| 207 | + MATCH (p2:Person {id: row.second_person}) |
| 208 | + CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2) |
| 209 | + SET f.met_in = row.met_in; |
| 210 | + ``` |
| 211 | + |
| 212 | + The following query will create relationships between people and restaurants where they ate: |
| 213 | + |
| 214 | + ```cypher |
| 215 | + LOAD JSONL FROM "restaurants_relationships.jsonl" AS row |
| 216 | + MATCH (p1:Person {id: row.PERSON_ID}) |
| 217 | + MATCH (re:Restaurant {id: row.REST_ID}) |
| 218 | + CREATE (p1)-[ate:ATE_AT]->(re) |
| 219 | + SET ate.liked = ToBoolean(row.liked); |
| 220 | + ``` |
| 221 | + |
| 222 | + {<h3 className="custom-header">Final result</h3>} |
| 223 | + Run the following query to see how the imported data looks as a graph: |
| 224 | + |
| 225 | + ```cypher
| 226 | + MATCH p=()-[]-() RETURN p; |
| 227 | + ``` |
| 228 | + |
| 229 | +  |
| 230 | + |
| 231 | +</Steps> |
| 232 | + |
| 233 | + |
8 | 234 | # Import data from JSON files |
9 | 235 |
|
10 | 236 | A JSON file is a file that stores simple data structures and objects in |
|