66 changes: 66 additions & 0 deletions memgraph-agentic-graphrag/SKILL.md
@@ -0,0 +1,66 @@
---
name: memgraph-agentic-graphrag
description: Answers questions over any knowledge graph stored in Memgraph using agentic GraphRAG techniques (text2Cypher; vector/text search also called local graph search; query-focused summarization).
---

## Deterministic Analytical Questions - Text2Cypher

When a user asks a question that requires precise information from the graph—for
example, counts, aggregations, or analytical queries where exact data is needed—use
a **text2Cypher** approach:

1. **Generate a Cypher query** from the natural language question, based on the
graph schema (run the `SHOW SCHEMA INFO;` query to get the schema).
2. **Execute the query** using the Memgraph MCP tool `run_query` when the
workspace is connected to Memgraph.
3. **Return the result** to the user with a clear summary of the findings.

Example: if asked "How many nodes of type X exist?", generate and run:
```cypher
MATCH (n:X) RETURN count(n) AS count;
```
Then report back the result to the user.
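
The query-generation step can be sketched in Python for the simplest case, a node count. This is only an illustration: `build_count_query` is a hypothetical helper, not part of Memgraph or its MCP server, and the identifier check is an assumed safeguard against pasting untrusted text into a query.

```python
import re

# Hypothetical helper for illustration; not part of Memgraph or its MCP server.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_count_query(label: str) -> str:
    """Return a Cypher query that counts the nodes carrying the given label."""
    # Labels cannot be passed as query parameters, so validate before interpolating.
    if not _IDENTIFIER.fullmatch(label):
        raise ValueError(f"unsafe label: {label!r}")
    return f"MATCH (n:{label}) RETURN count(n) AS count;"


query = build_count_query("Person")
print(query)
```

The resulting string is what would be handed to `run_query`; the label `Person` is a placeholder for a label taken from the schema.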

## Similar entities and semantic search - Local Graph Search

When a user requests to find entities similar to a given one, or to determine
which nodes are related to a concept, employ vector or text search combined with
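graph traversals to discover the most relevant results. For example, assuming a
vector index named `vs_index` on chunk nodes linked to entities via `HAS_CHUNK`:

```cypher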
CALL embeddings.text(['<user query or entity description>']) YIELD embeddings, success
CALL vector_search.search('vs_index', 5, embeddings[0]) YIELD distance, node, similarity
WITH node AS chunk, similarity
MATCH (entity:<Label>)-[:HAS_CHUNK]->(chunk)
RETURN entity.title, similarity ORDER BY similarity DESC LIMIT 5;
```

Replace `<Label>` with the actual node label from the schema, and adjust `vs_index` and `HAS_CHUNK` if your index or relationship type is named differently.

See the [text search reference](references/text_search.md) and the
[vector search reference](references/vector_search.md).

## Broad questions - Query-focused Summarization

When a user asks a broad question, consider using query-focused summarization
over pre-computed community summaries (map-reduce over the graph).

```cypher
WITH "{{the_user_question}}" AS USER_QUESTION
// 1. Identify relevant communities (Thresholding)
MATCH (c:Community)
WHERE c.nodes_count > 5 // Optional: only use significant communities
// 2. Map Step: Generate a partial answer for EACH community
WITH USER_QUESTION, c,
llm.complete("How does the following community summary: " + c.summary + " relate to: " + USER_QUESTION) AS partial_answer
WHERE partial_answer IS NOT NULL AND partial_answer <> ""
WITH USER_QUESTION, collect(partial_answer) AS map_results
// 3. Reduce Step: Synthesize the final consolidated answer
WITH "The following are partial answers from different thematic clusters of " +
" a dataset. Synthesize them into a single, cohesive response to the " +
"original query: " + USER_QUESTION +
"\n\nPartial Answers:\n" +
reduce(s = "", res IN map_results | s + "- " + res + "\n") AS reduce_prompt
RETURN llm.complete(reduce_prompt) AS final_answer;
```
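
The same map-reduce flow can be sketched outside the database. The Python below is a minimal illustration, assuming a caller-supplied `complete` function that stands in for the LLM call; `summarize` and `echo_llm` are hypothetical names, and the prompts simply mirror the ones in the query above.

```python
from typing import Callable, List


def summarize(question: str, summaries: List[str], complete: Callable[[str], str]) -> str:
    """Map-reduce over community summaries; `complete` is an assumed LLM callable."""
    # Map step: one partial answer per community summary.
    partials = []
    for summary in summaries:
        prompt = f"How does the following community summary: {summary} relate to: {question}"
        answer = complete(prompt)
        if answer:  # skip empty or missing partial answers
            partials.append(answer)

    # Reduce step: combine the partial answers into one final prompt.
    reduce_prompt = (
        "The following are partial answers from different thematic clusters of "
        "a dataset. Synthesize them into a single, cohesive response to the "
        "original query: " + question + "\n\nPartial Answers:\n"
        + "".join(f"- {p}\n" for p in partials)
    )
    return complete(reduce_prompt)


calls = []


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model: records each prompt and returns a numbered answer.
    calls.append(prompt)
    return f"answer {len(calls)}"


final = summarize("What are the main themes?", ["Summary A", "Summary B"], echo_llm)
print(final)
```

With two summaries the stand-in model is called three times: twice in the map step and once in the reduce step.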

If the graph is missing required preprocessing, see [preprocessing](references/preprocessing.md).
101 changes: 101 additions & 0 deletions memgraph-agentic-graphrag/references/preprocessing.md
@@ -0,0 +1,101 @@
## Retrieve Schema

The schema is useful in almost all GraphRAG cases. To get it, run the
`SHOW SCHEMA INFO;` query (for example, through the Memgraph MCP tool `run_query`, when available).

## Text Search Preprocessing

To use text search for finding nodes whose properties contain specific text, you
first need to create a text index on the target label and (optionally) specify
which properties to index.

### Create a text index

Index **all** text-indexable properties on a label:

```cypher
CREATE TEXT INDEX entitySearch ON :<Label>;
```

Or index only **specific** properties:

```cypher
CREATE TEXT INDEX entitySearch ON :<Label>(title, description);
```

Replace `<Label>` with the actual node label from the schema (e.g., `Document`, `Product`, `Article`).

## Vector Search Preprocessing

To use vector search for finding similar content in your graph, you first need to:

1. **Create a vector index** on the target node label (e.g., `Chunk`) and specify the embedding property.
2. **Generate and store embeddings** for each node you want searchable.

Here's how you can do both steps in Memgraph:

```cypher
// Step 1: Create the vector index for your embeddings
CREATE VECTOR INDEX vs_index ON :Chunk(embedding) WITH CONFIG {"dimension": <embedding_dimension>, "capacity": <expected_node_count>};

// Step 2: Compute sentence embeddings for all Chunk nodes
MATCH (c:Chunk)
WITH collect(c) AS chunks
CALL embeddings.node_sentence(
chunks,
{excluded_properties: ["<id_property>", "<metadata_property>"]}
) YIELD success
RETURN success;
```

- The first command sets up a vector index called `vs_index` for the `embedding` property of `Chunk` nodes.
- The second part computes embeddings for every `Chunk` node, excluding properties that should not contribute to the semantic representation (e.g. IDs, internal metadata).

Run these commands before attempting any vector search queries on the graph.

## Communities Preprocessing

Replace `<Label>` and `<RELATIONSHIP>` with the actual node label and relationship type from the schema.

```cypher
MATCH p=(n1:<Label>)-[r:<RELATIONSHIP>]->(n2:<Label>) WITH p
WITH project(p) AS subgraph
CALL community_detection.get(subgraph) YIELD node, community_id
SET node.community_id = community_id;
```

## Ranking (PageRank) Preprocessing

Replace `<Label>` and `<RELATIONSHIP>` with the actual node label and relationship type from the schema.

```cypher
MATCH p=(n1:<Label>)-[r:<RELATIONSHIP>]->(n2:<Label>) WITH p
WITH project(p) AS subgraph
CALL pagerank.get(subgraph, 100, 0.85, 1e-5)
YIELD node, rank
SET node.rank = rank;
```
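
To make the three numeric arguments concrete, here is a plain power-iteration PageRank in Python. It is a sketch, not Memgraph's implementation, and it assumes the arguments `100`, `0.85`, and `1e-5` mean maximum iterations, damping factor, and stop epsilon, in that order. The edge list is made up for illustration.

```python
def pagerank(edges, max_iterations=100, damping=0.85, stop_epsilon=1e-5):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(max_iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < stop_epsilon:  # converged: ranks barely moved this round
            break
    return rank


ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
print(ranks)
```

In this toy graph `c` receives links from both `a` and `b`, so it ends up with the highest rank, while `b`, which nothing links to, ends up with the lowest.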

## Query-focused Summarization Preprocessing

This step generates a natural-language summary for each detected community and
stores it as a `Community` node. Adapt the prompt to fit your domain.

```cypher
WITH "You are summarizing a community (cluster) of nodes from a knowledge graph." +
" Below are the titles and descriptions of nodes belonging to one community." +
" Write a concise summary in 2-4 paragraphs that:" +
" - Captures the main themes and recurring topics." +
" - Describes what this cluster of nodes represents." +
" - Does not list individual nodes; synthesize into a coherent narrative." +
" Once the analysis is done, propose a single label that best describes the community. ---"
AS COMMUNITY_SUMMARY_PROMPT_TEMPLATE
MATCH (n) WHERE n.community_id IS NOT NULL
WITH n.community_id AS c_id, count(n) AS c_count, collect(n) AS c_members,
llm.complete(reduce(s=COMMUNITY_SUMMARY_PROMPT_TEMPLATE, m IN collect(n) | s + m.title + " " + m.description + "; ")) AS c_summary
MERGE (community:Community {id: c_id})
SET community.nodes_count = c_count, community.summary = c_summary
WITH community, c_members
UNWIND c_members AS c_member
MERGE (c_member)-[:BELONGS_TO]->(community)
RETURN community.id AS community_id, community.summary AS community_summary;
```
50 changes: 50 additions & 0 deletions memgraph-agentic-graphrag/references/text_search.md
@@ -0,0 +1,50 @@
# Text Search

### Query the text index

Search a specific property (replace `entitySearch` with your index name and `<value>` with the search term):

```cypher
CALL text_search.search('entitySearch', 'data.title:<value>')
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC LIMIT 10;
```

Search across **all** indexed properties (no property prefix needed):

```cypher
CALL text_search.search_all('entitySearch', '<value>')
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC LIMIT 10;
```

Use **boolean expressions** to combine conditions:

```cypher
CALL text_search.search('entitySearch', '(data.title:<term1> OR data.title:<term2>) AND data.description:<term3>')
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC LIMIT 10;
```
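
When the search string is assembled programmatically, small helpers keep the `data.` prefix and the boolean grouping consistent. The Python below is a sketch with hypothetical helper names (`field_term`, `any_of`, `all_of`). It assumes single-word terms and does no escaping of special query characters.

```python
def field_term(prop: str, term: str) -> str:
    """Build a `data.<property>:<term>` clause for the text index."""
    return f"data.{prop}:{term}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


search_query = all_of(
    any_of(field_term("title", "graph"), field_term("title", "vector")),
    field_term("description", "search"),
)
print(search_query)
```

The resulting string has the same shape as the boolean expression above and would be passed as the second argument of `text_search.search`.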

Use **regex search** to match patterns across all properties:

```cypher
CALL text_search.regex_search('entitySearch', '<pattern>.*')
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC LIMIT 10;
```

### Notes

- Text indices are powered by the [Tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine.
- Only properties of type `String`, `Integer`, `Float`, or `Boolean` are indexed.
- Changes made within the same transaction are **not** visible to the index — commit first.
- When referencing property names in search queries, always use the `data.` prefix (e.g., `data.title`).
- To drop an index: `DROP TEXT INDEX entitySearch;`

Run the `CREATE TEXT INDEX` command before attempting any text search queries on
the graph.
82 changes: 82 additions & 0 deletions memgraph-agentic-graphrag/references/vector_search.md
@@ -0,0 +1,82 @@
# Vector Search

### Query the vector index

Search for the nearest neighbors of a query vector:

```cypher
CALL vector_search.search('vs_index', 5, [1.0, 2.0, 3.0])
YIELD node, distance, similarity
RETURN node, similarity
ORDER BY similarity DESC;
```

Combine vector search with **graph traversals** to enrich results (replace `<Label>` and `<RELATIONSHIP>` with values from your schema):

```cypher
CALL vector_search.search('vs_index', 5, [1.0, 2.0, 3.0])
YIELD node, similarity
MATCH (entity:<Label>)-[:<RELATIONSHIP>]->(node)
RETURN entity.title AS title, similarity
ORDER BY similarity DESC;
```

Use the **embeddings module** to generate a query vector from text on the fly:

```cypher
CALL embeddings.text(['<natural language query>']) YIELD embeddings, success
CALL vector_search.search('vs_index', 5, embeddings[0]) YIELD node, similarity
MATCH (entity:<Label>)-[:<RELATIONSHIP>]->(node)
RETURN entity.title AS title, similarity
ORDER BY similarity DESC;
```

Search a **vector index on edges**:

```cypher
CALL vector_search.search_edges('edge_vs_index', 5, [1.0, 2.0, 3.0])
YIELD edges, distance, similarity
RETURN edges, similarity
ORDER BY similarity DESC;
```

Compute **cosine similarity** between two vectors without an index:

```cypher
RETURN vector_search.cosine_similarity([1.0, 2.0], [1.0, 3.0]) AS similarity;
```
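
For reference, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The Python sketch below computes it by hand for the same two vectors; it illustrates the formula only and says nothing about how Memgraph implements the function.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity([1.0, 2.0], [1.0, 3.0])
print(similarity)  # 7 / sqrt(50), roughly 0.98995
```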

Inspect the current state of all vector indices:

```cypher
CALL vector_search.show_index_info() YIELD * RETURN *;
```

### Similarity metrics

| Metric | Description |
|------------|---------------------------------------------------|
| l2sq | Squared Euclidean distance (default) |
| cos | Cosine similarity |
| ip | Inner product (dot product) |
| haversine | Haversine distance (suitable for geographic data) |
| pearson | Pearson correlation coefficient |
| divergence | A divergence-based metric |
| hamming | Hamming distance |
| tanimoto | Tanimoto coefficient |
| sorensen | Sorensen-Dice coefficient |
| jaccard | Jaccard index |
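
The choice of metric can change which neighbor ranks first. The Python sketch below compares two of the metrics from the table on made-up vectors: with `l2sq` a smaller value means closer, while with `ip` a larger value means closer. The function names and data are illustrative only.

```python
def l2sq(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product of the two vectors."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 2.0]
candidates = {"near": [1.0, 3.0], "far": [4.0, 6.0]}

# l2sq: ascending order, smallest distance first.
by_l2sq = sorted(candidates, key=lambda name: l2sq(query, candidates[name]))
# ip: descending order, largest product first.
by_ip = sorted(candidates, key=lambda name: inner_product(query, candidates[name]), reverse=True)

print(by_l2sq, by_ip)
```

Here `l2sq` ranks `near` first, while the inner product favors the longer vector `far`, which is why the metric should match how the embeddings were trained.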

### Notes

- Vector indices are powered by [USearch](https://github.com/unum-cloud/usearch).
- Memgraph uses `READ_UNCOMMITTED` isolation specifically for vector indices; all other ACID guarantees remain intact.
- `dimension` and `capacity` are **mandatory** when creating an index.
- The default metric is `l2sq` and the default scalar kind is `f32`.
- Dropping a single-store vector index rewrites all vectors back into the property store — this can be slow and memory-intensive on large datasets.
- To drop an index: `DROP VECTOR INDEX vs_index;`

Run the `CREATE VECTOR INDEX` command before attempting any vector search queries
on the graph.

For more details visit https://memgraph.com/docs/querying/vector-search.