Skip to content

Latest commit

 

History

History
142 lines (106 loc) · 5.91 KB

README.gds.md

File metadata and controls

142 lines (106 loc) · 5.91 KB

Graph Data Science (GDS)

This file gives some advice on how to use GDS with the dataset.

NB to use GDS you will need to be either using Sandbox OR Desktop environments, Aura Free will not allow access.

We will be projecting a graph into the GDS Graph Catalog using Native Projection

If you want to ensure you have no existing projections you can run the following Cypher to clear your Graph Catalog:

CALL gds.graph.list() YIELD graphName AS toDrop
CALL gds.graph.drop(toDrop) YIELD graphName
RETURN "Dropped " + graphName;

We will project a graph named 'OperationalPoints' into the Graph Catalog. We will take the OperationalPoint Nodes and the SECTION Relationships to form a monopartite graph:

CALL gds.graph.project(
    'OperationalPoints',
    'OperationalPoint',
    {SECTION: {orientation: 'UNDIRECTED'}},
    {
        relationshipProperties: ['sectionlength', 'traveltime']
    }
);

Path Finding

We can calculate the shortest path between two stations - for example, Malmö Central to Stockholm Central - using our traveltime relatonship weights and the Dijkstra Source-Target Shortest Path algorithm from the GDS library. Note that bad data in our dataset (such as null or zero relationship weights) can cause strange results when calculating weighted shortest paths.

MATCH     
    (:OperationalPointName {name:'Stockholms central'})<-[:NAMED]-(stockholm:OperationalPoint),
    (:OperationalPointName {name:'Malmö central'})<-[:NAMED]-(malmo:OperationalPoint)
CALL gds.shortestPath.dijkstra.stream('OperationalPoints', {
    sourceNode: malmo,
    targetNode: stockholm,
    relationshipWeightProperty: 'traveltime'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN *;

Do we get the same result if we use the sectionlength relationship property as our weight instead of traveltime when computing the shortest path?

MATCH     
    (:OperationalPointName {name:'Stockholms central'})<-[:NAMED]-(stockholm:OperationalPoint),
    (:OperationalPointName {name:'Malmö central'})<-[:NAMED]-(malmo:OperationalPoint)
CALL gds.shortestPath.dijkstra.stream('OperationalPoints', {
    sourceNode: malmo,
    targetNode: stockholm,
    relationshipWeightProperty: 'sectionlength'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN *;

Community Detection

Now we use the Weakly Connected Components algorithm in 'stream' mode to review OperationalPoints that are not well connected to the network:

CALL gds.wcc.stream('OperationalPoints') YIELD nodeId, componentId
WITH collect(gds.util.asNode(nodeId).id) AS nodes, componentId
RETURN nodes, componentId 
ORDER BY size(nodes) ASC LIMIT 50;

We can also write the Weakly Connected Components componentId properties to the database so we can query and visualise them later:

CALL gds.wcc.write('OperationalPoints', {writeProperty: 'componentId'});

We should index our new Weakly Connected Components componentId property, so that we can query with it in a performant way:

CREATE INDEX index_OperationalPoint_componentid IF NOT EXISTS FOR (opn:OperationalPoint) ON (opn.componentId);

Let's find a specific OperationalPoint and view the other members of its community. You should see that it belongs to an isolated group of OperationalPoints.

MATCH (op:OperationalPoint {id: 'UKN4288'})
WITH op.componentId as component
MATCH path = (:OperationalPoint {componentId: component})-[:SECTION]->()
RETURN path

Centrality

Using the Degree Centrality algorithm we can identify important nodes in the graph based on how many SECTION relationships they have. Nodes with a high Degree Centrality score represent OperationalPoints which are important transfer points in our network.

CALL gds.degree.stream('OperationalPoints')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS id, score
ORDER BY score DESC LIMIT 50;

We should write the Degree Centrality degreeScore properties to the database so we can query and visualise them later:

CALL gds.degree.write('OperationalPoints', {writeProperty: 'degreeScore'})

Using the Betweenness Centrality algorithm we can identify important nodes in the graph by another metric - those nodes which sit on the shortest path between the most other nodes.

These nodes represent OperationalPoints which many journeys are likely to pass through, and may act as 'bridge' nodes between different parts of the network.

CALL gds.betweenness.stream('OperationalPoints')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS id, score
ORDER BY score DESC LIMIT 50;

We should also write the Betweenness Centrality scores back to the database so we can query and visualise them later:

CALL gds.betweenness.write('OperationalPoints', {writeProperty: 'betweennessScore'})

Let's also index our new Degree Centrality degreeScoreand Betweenness Centrality betweennessScore properties, so that we can query using them in a performant way:

CREATE INDEX index_OperationalPoint_degreeScore IF NOT EXISTS FOR (opn:OperationalPoint) ON (opn.degreeScore);
CREATE INDEX index_OperationalPoint_betweennessScore IF NOT EXISTS FOR (opn:OperationalPoint) ON (opn.betweennessScore);

Tidy Up

Finally, it's best practice to remove your graph projections from memory when you're finished with them:

CALL gds.graph.drop('OperationalPoints')