
Commit d2fe5fb

add docs for read_{edges,nodes}

1 parent 05f8d3d commit d2fe5fb

File tree

1 file changed: README.md (54 additions, 4 deletions)
@@ -247,19 +247,69 @@ applications.
### 1. Streaming Node Properties

Node properties from a graph projection can be streamed from
GDS/AuraDS with optional filters based on node label. The stream is
exposed as a Python generator and can be consumed lazily, but **it is
important to fully consume the stream**: older versions of GDS do not
enforce a server-side timeout, so if the client dies or fails to
consume the full stream, server-side threads can be left blocked.

```python
nodes = client.read_nodes(["louvain", "pageRank"], labels=["User"])

# if you know the resulting dataset is small, you can eagerly consume
# it into a Python list
result = list(nodes)

# `result` is now a list of PyArrow RecordBatch objects
print(result)
```

Inspecting the schema, as in the example above, reveals that only the
node ids and the requested properties are provided. _The labels
themselves are not included and must be inferred from your `labels`
filter!_ (This is a limitation of the GDS stored procedures for
streaming node properties.)

The output of the `print()` call above will look like the following
(assuming a single item in the list):

```python
[pyarrow.RecordBatch
nodeId: int64 not null
louvain: int64 not null
pageRank: int64 not null]
```
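For larger result sets, it may be preferable to process the stream
batch by batch instead of materializing one big list, while still
draining the generator completely. Below is a minimal sketch (not
part of the library API) that converts each `RecordBatch` to a Pandas
DataFrame and tags the rows with the label used in the filter, since
the labels themselves are not part of the stream; the `label` column
name is purely illustrative.

```python
import pandas as pd

# a minimal sketch: consume the node stream lazily, one RecordBatch at
# a time, so the generator is always drained in full
frames = []
for batch in client.read_nodes(["louvain", "pageRank"], labels=["User"]):
    df = batch.to_pandas()
    # the stream does not carry labels, so record the filter we used
    # ("label" is an illustrative column name, not part of the stream)
    df["label"] = "User"
    frames.append(df)

users = pd.concat(frames, ignore_index=True)
print(users.head())
```

Either way, the important part is that the generator runs to
completion so no server-side threads are left hanging.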
### 2. Streaming Relationships

Relationship properties can be streamed similarly to node properties;
however, relationships also support an alternative mode that streams
just the "topology" of the graph when no property or relationship
type filters are provided.

> Note: the relationship types are dictionary encoded, so if you
> access the raw PyArrow buffer values you will need to decode them
> yourself. Converting to Python dicts/lists or Pandas DataFrames
> decodes them back into strings for you.

As with node streams, the stream is provided as a Python generator
and must be fully consumed by the client program.

```python
# dump just the topology without properties
topology = client.read_edges()
# `topology` is a generator producing PyArrow RecordBatch objects

# if you want properties and/or to target particular relationship types...
edges = client.read_edges(properties=["score"], relationship_types=["SIMILAR"])
```
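To make the decoding note above concrete, here is a small sketch
(again, not part of the library API) that drains the `edges` stream
from the example above, combines the batches into a single PyArrow
Table, and converts the rows to Python values; that conversion is
what turns the dictionary-encoded relationship types back into plain
strings.

```python
import pyarrow as pa

# a minimal sketch: drain the relationship stream in full and combine
# the batches into one in-memory Table (assumes at least one batch)
table = pa.Table.from_batches(list(edges))
print(table.schema)

# row-wise Python dicts; dictionary-encoded columns such as the
# relationship type come back as plain strings here
rows = table.to_pylist()
print(rows[:5])
```

If the batches are too large to hold in memory at once, the same
batch-by-batch pattern shown for the node stream applies here as
well.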
### Streaming Caveats

There are a few known caveats to be aware of when creating and
consuming Arrow-based streams from Neo4j GDS:

- You must consume the stream in its entirety to avoid blocking
  server-side threads.
- While recent versions of GDS include a server-side timeout, older
  versions will consume one or many threads until the stream is
  exhausted.
