@@ -247,19 +247,69 @@ applications.
### 1. Streaming Node Properties
- Streaming node properties is available based on label filters.
+ Node properties from a graph projection can be streamed from
+ GDS/AuraDS, optionally filtered by node label. The stream is
+ exposed as a Python generator and can be consumed lazily, but **it is
+ important to fully consume the stream**, as older versions of GDS do
+ not have a server-side timeout. (If the client dies or fails to
+ consume the full stream, server threads may be deadlocked.)
- ...
+ ```python
+ nodes = client.read_nodes(["louvain", "pageRank"], labels=["User"])
+
+ # if you know the resulting dataset is small, you can eagerly consume into a
+ # Python list
+ result = list(nodes)
+
+ # the `result` is now a list of PyArrow RecordBatch objects
+ print(result)
+ ```
+
+ Inspecting the schema, as printed by the example above, reveals that
+ only the node ids and the requested properties are provided. _The labels
+ are not provided and must be inferred from your labels filter!_
+ (This is a GDS limitation in the stored procedures for streaming node
+ properties.)
+
+ The output of the `print()` call above will look like the
+ following (assuming a single item in the list):
+
+ ```python
+ [pyarrow.RecordBatch
+ nodeId: int64 not null
+ louvain: int64 not null
+ pageRank: int64 not null]
+ ```
### 2. Streaming Relationships
- ...
+ Relationship properties can be streamed similarly to node properties.
+ However, relationship streams also support an alternative mode that
+ streams just the "topology" of the graph when no property or
+ relationship type filters are provided.
+
+ > Note: the relationship types are dictionary encoded, so if you access
+ > the raw PyArrow buffer values, you will need to decode them. Conversion
+ > to Python dicts/lists or Pandas DataFrames will decode them back
+ > into strings for you.
+
+ As with node streams, the stream is provided as a Python generator
+ and requires full consumption by the client program.
+
+ ```python
+ # dump just the topology without properties
+ topology = client.read_edges()
+ # `topology` is a generator producing PyArrow RecordBatch objects
+
+ # if you want properties and/or to target particular relationship types...
+ edges = client.read_edges(properties=["score"], relationship_types=["SIMILAR"])
+ ```
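
Since both node and relationship streams need to be read to the end, it can help to wrap consumption so the stream is drained even when your own processing fails. This is a sketch only: `consume_fully` and `handle_batch` are hypothetical names, not part of the client, and the stream below is a plain iterator of placeholder strings standing in for RecordBatch objects.

```python
from collections import deque

def consume_fully(stream, handle_batch):
    """Apply handle_batch to each item; always exhaust the stream afterwards."""
    try:
        for batch in stream:
            handle_batch(batch)
    finally:
        # Drain whatever is left, even if handle_batch raised, so the
        # producer side is never left waiting on an unread stream.
        deque(stream, maxlen=0)

# Stand-in stream of placeholder items instead of real RecordBatches.
seen = []
stream = iter(["batch-0", "batch-1", "batch-2"])
consume_fully(stream, seen.append)
```

Draining a large stream after an error can take a while, and this does nothing if the client process itself dies, so it complements a server-side timeout rather than replacing one.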
### Streaming Caveats
There are a few known caveats to be aware of when creating and consuming Arrow-based streams from Neo4j GDS:
- - You should consume the stream in its entirety to avoid blocking
+ - You must consume the stream in its entirety to avoid blocking
server-side threads.
- While recent versions of GDS will include a timeout, older
versions will consume one or many threads until the stream is