memgraph · gitbuda · Mar 1, 2026 · Mar 1, 2026
diff --git a/memgraph-agentic-graphrag/SKILL.md b/memgraph-agentic-graphrag/SKILL.md
@@ -0,0 +1,66 @@
+---
+name: memgraph-agentic-graphrag
+description: Answers questions over any knowledge graph stored in Memgraph using agentic GraphRAG techniques (text2Cypher; vector/text search also called local graph search; query-focused summarization).
+---
+
+## Deterministic Analytical Questions - Text2Cypher
+
+When a user asks a question that requires precise information from the graph—for
+example, counts, aggregations, or analytical queries where exact data is needed—use
+a **text2Cypher** approach:
+
+1. **Generate a Cypher query** from the natural language question, based on the
+   graph schema (use the `SHOW SCHEMA INFO;` query to get the schema).
+2. **Execute the query** using the Memgraph MCP tool `run_query` when the
+   workspace is connected to Memgraph.
+3. **Return the result** to the user with a clear summary of the findings.
+
+Example: if asked "How many nodes of type X exist?", generate and run:
+```cypher
+MATCH (n:X) RETURN count(n) AS count;
+```
+Then report the result back to the user.
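+
+Aggregation questions follow the same pattern. A minimal sketch, assuming a hypothetical `Article` label with a `category` property (illustrative names, not taken from any real schema):
+
+```cypher
+// Hypothetical label and property; replace them with names from SHOW SCHEMA INFO.
+MATCH (a:Article)
+RETURN a.category AS category, count(a) AS article_count
+ORDER BY article_count DESC
+LIMIT 10;
+```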
+
+## Similar entities and semantic search - Local Graph Search
+
+When a user asks for entities similar to a given one, or wants to know which
+nodes relate to a concept, use vector or text search combined with graph
+traversals to find the most relevant results.
+```cypher
+CALL embeddings.text(['<user query or entity description>']) YIELD embeddings, success
+CALL vector_search.search('vs_index', 5, embeddings[0]) YIELD distance, node, similarity
+WITH node AS chunk, similarity
+MATCH (entity:<Label>)-[:HAS_CHUNK]->(chunk)
+RETURN entity.title, similarity ORDER BY similarity DESC LIMIT 5;
+```
+
+Replace `<Label>` with the actual node label from the schema.
+
+See [text search reference](references/text_search.md).
+See [vector search reference](references/vector_search.md).
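+
+Text search can be combined with traversals in the same way. A rough sketch, assuming a text index named `entitySearch` exists (as in the text search reference) and that matched nodes have a `title` property; the one-hop expansion is only an illustration:
+
+```cypher
+// Find nodes by text, then expand one hop to collect related entities.
+CALL text_search.search_all('entitySearch', '<user query term>')
+YIELD node, score
+MATCH (node)-[r]-(neighbor)
+RETURN node.title AS title, score, type(r) AS relationship, neighbor
+ORDER BY score DESC LIMIT 10;
+```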
+
+## Broad questions - Query-focused Summarization
+
+When a user asks a broad question, consider using query-focused summarization
+over pre-computed community summaries (map-reduce over the graph).
+
+```cypher
+WITH "{{the_user_question}}" AS USER_QUESTION
+// 1. Identify relevant communities (Thresholding)
+MATCH (c:Community)
+    WHERE c.nodes_count > 5  // Optional: only use significant communities
+// 2. Map Step: Generate a partial answer for EACH community
+WITH USER_QUESTION, c,
+     llm.complete("How does the following community summary: " + c.summary + " relate to: " + USER_QUESTION) AS partial_answer
+  WHERE partial_answer IS NOT NULL AND partial_answer <> ""
+WITH USER_QUESTION, collect(partial_answer) AS map_results
+// 3. Reduce Step: Synthesize the final consolidated answer
+WITH "The following are partial answers from different thematic clusters of " +
+     "a dataset. Synthesize them into a single, cohesive response to the " +
+     "original query: " + USER_QUESTION +
+      "\n\nPartial Answers:\n" +
+      reduce(s = "", res IN map_results | s + "- " + res + "\n") AS reduce_prompt
+RETURN llm.complete(reduce_prompt) AS final_answer;
+```
+
+If the graph is missing required preprocessing, see [preprocessing](references/preprocessing.md).
diff --git a/memgraph-agentic-graphrag/references/preprocessing.md b/memgraph-agentic-graphrag/references/preprocessing.md
@@ -0,0 +1,101 @@
+## Retrieve Schema
+
+The schema is useful in almost all GraphRAG cases. To get it, run the
+`SHOW SCHEMA INFO;` query (an MCP tool may be available for that).
+
+## Text Search Preprocessing
+
+To use text search for finding nodes whose properties contain specific text, you
+first need to create a text index on the target label and (optionally) specify
+which properties to index.
+
+### Create a text index
+
+Index **all** text-indexable properties on a label:
+
+```cypher
+CREATE TEXT INDEX entitySearch ON :<Label>;
+```
+
+Or index only **specific** properties:
+
+```cypher
+CREATE TEXT INDEX entitySearch ON :<Label>(title, description);
+```
+
+Replace `<Label>` with the actual node label from the schema (e.g., `Document`, `Product`, `Article`).
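+
+To confirm the index was created, listing the existing indices may help. A small sketch; the exact output columns can vary between Memgraph versions:
+
+```cypher
+// Lists the indices defined in the database; the text index should appear here.
+SHOW INDEX INFO;
+```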
+
+## Vector Search Preprocessing
+
+To use vector search for finding similar content in your graph, you first need to:
+
+1. **Create a vector index** on the target node label (e.g., `Chunk`) and specify the embedding property.
+2. **Generate and store embeddings** for each node you want searchable.
+
+Here's how you can do both steps in Memgraph:
+
+```cypher
+// Step 1: Create the vector index for your embeddings
+CREATE VECTOR INDEX vs_index ON :Chunk(embedding) WITH CONFIG {"dimension": <embedding_dimension>, "capacity": <expected_node_count>};
+
+// Step 2: Compute sentence embeddings for all Chunk nodes
+MATCH (c:Chunk)
+WITH collect(c) AS chunks
+CALL embeddings.node_sentence(
+  chunks,
+  {excluded_properties: ["<id_property>", "<metadata_property>"]}
+) YIELD success
+RETURN success;
+```
+
+- The first command sets up a vector index called `vs_index` for the `embedding` property of `Chunk` nodes.
+- The second part computes embeddings for every `Chunk` node, excluding properties that should not contribute to the semantic representation (e.g. IDs, internal metadata).
+
+Run these commands before attempting any vector search queries on the graph.
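+
+A quick sanity check after preprocessing, assuming the embeddings are stored in the `embedding` property that the index above targets:
+
+```cypher
+// count(c.embedding) counts only nodes where the property is not null.
+MATCH (c:Chunk)
+RETURN count(c) AS chunks, count(c.embedding) AS chunks_with_embedding;
+```
+
+If the two numbers differ, some `Chunk` nodes are still missing embeddings.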
+
+## Communities Preprocessing
+
+Replace `<Label>` and `<RELATIONSHIP>` with the actual node label and relationship type from the schema.
+
+```cypher
+MATCH p=(n1:<Label>)-[r:<RELATIONSHIP>]->(n2:<Label>) WITH p
+WITH project(p) AS subgraph
+CALL community_detection.get(subgraph) YIELD node, community_id
+SET node.community_id = community_id;
+```
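+
+To inspect the outcome, the community sizes can be listed. A sketch that relies only on the `community_id` property set above:
+
+```cypher
+// Largest communities first.
+MATCH (n:<Label>)
+WHERE n.community_id IS NOT NULL
+RETURN n.community_id AS community_id, count(n) AS size
+ORDER BY size DESC;
+```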
+
+## Ranking (PageRank) Preprocessing
+
+Replace `<Label>` and `<RELATIONSHIP>` with the actual node label and relationship type from the schema.
+
+```cypher
+MATCH p=(n1:<Label>)-[r:<RELATIONSHIP>]->(n2:<Label>) WITH p
+WITH project(p) AS subgraph
+CALL pagerank.get(subgraph, 100, 0.85, 1e-5)
+YIELD node, rank
+SET node.rank = rank;
+```
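+
+Once ranks are stored, they can be used for ordering. A sketch, assuming nodes have a `title` property as in the other examples:
+
+```cypher
+// Ten highest-ranked nodes.
+MATCH (n:<Label>)
+WHERE n.rank IS NOT NULL
+RETURN n.title AS title, n.rank AS rank
+ORDER BY rank DESC LIMIT 10;
+```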
+
+## Query-focused Summarization Preprocessing
+
+This step generates a natural-language summary for each detected community and
+stores it as a `Community` node. Adapt the prompt to fit your domain.
+
+```cypher
+WITH "You are summarizing a community (cluster) of nodes from a knowledge graph." +
+     " Below are the titles and descriptions of nodes belonging to one community." +
+     " Write a concise summary in 2-4 paragraphs that:" +
+     " - Captures the main themes and recurring topics." +
+     " - Describes what this cluster of nodes represents." +
+     " - Does not list individual nodes; synthesize into a coherent narrative." +
+     " Once the analysis is done, propose a single label that best describes the community. ---"
+     AS COMMUNITY_SUMMARY_PROMPT_TEMPLATE
+MATCH (n) WHERE n.community_id IS NOT NULL
+WITH n.community_id AS c_id, count(n) AS c_count, collect(n) AS c_members,
+     llm.complete(reduce(s=COMMUNITY_SUMMARY_PROMPT_TEMPLATE, m IN collect(n) | s + m.title + " " + m.description + "; ")) AS c_summary
+MERGE (community:Community {id: c_id, nodes_count: c_count})
+SET community.summary = c_summary
+WITH community, c_members
+UNWIND c_members AS c_member
+MERGE (c_member)-[:BELONGS_TO]->(community)
+RETURN community.id AS community_id, community.summary AS community_summary;
+```
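+
+To review what was stored, a sketch that reads back the `Community` nodes and counts their `BELONGS_TO` members:
+
+```cypher
+// Each Community with its stored size, actual member count, and summary.
+MATCH (c:Community)
+OPTIONAL MATCH (m)-[:BELONGS_TO]->(c)
+RETURN c.id AS community_id, c.nodes_count AS nodes_count, count(m) AS members, c.summary AS summary
+ORDER BY nodes_count DESC;
+```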
diff --git a/memgraph-agentic-graphrag/references/text_search.md b/memgraph-agentic-graphrag/references/text_search.md
@@ -0,0 +1,50 @@
+# Text Search
+
+### Query the text index
+
+Search a specific property (replace `entitySearch` with your index name and `<value>` with the search term):
+
+```cypher
+CALL text_search.search('entitySearch', 'data.title:<value>')
+YIELD node, score
+RETURN node.title AS title, score
+ORDER BY score DESC LIMIT 10;
+```
+
+Search across **all** indexed properties (no property prefix needed):
+
+```cypher
+CALL text_search.search_all('entitySearch', '<value>')
+YIELD node, score
+RETURN node.title AS title, score
+ORDER BY score DESC LIMIT 10;
+```
+
+Use **boolean expressions** to combine conditions:
+
+```cypher
+CALL text_search.search('entitySearch', '(data.title:<term1> OR data.title:<term2>) AND data.description:<term3>')
+YIELD node, score
+RETURN node.title AS title, score
+ORDER BY score DESC LIMIT 10;
+```
+
+Use **regex search** to match patterns across all properties:
+
+```cypher
+CALL text_search.regex_search('entitySearch', '<pattern>.*')
+YIELD node, score
+RETURN node.title AS title, score
+ORDER BY score DESC LIMIT 10;
+```
+
+### Notes
+
+- Text indices are powered by the [Tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine.
+- Only properties of type `String`, `Integer`, `Float`, or `Boolean` are indexed.
+- Changes made within the same transaction are **not** visible to the index — commit first.
+- When referencing property names in search queries, always use the `data.` prefix (e.g., `data.title`).
+- To drop an index: `DROP TEXT INDEX entitySearch;`
+
+Run the `CREATE TEXT INDEX` command before attempting any text search queries on
+the graph.
diff --git a/memgraph-agentic-graphrag/references/vector_search.md b/memgraph-agentic-graphrag/references/vector_search.md
@@ -0,0 +1,82 @@
+# Vector Search
+
+### Query the vector index
+
+Search for the nearest neighbors of a query vector:
+
+```cypher
+CALL vector_search.search('vs_index', 5, [1.0, 2.0, 3.0])
+YIELD node, distance, similarity
+RETURN node, similarity
+ORDER BY similarity DESC;
+```
+
+Combine vector search with **graph traversals** to enrich results (replace `<Label>` and `<RELATIONSHIP>` with values from your schema):
+
+```cypher
+CALL vector_search.search('vs_index', 5, [1.0, 2.0, 3.0])
+YIELD node, similarity
+MATCH (entity:<Label>)-[:<RELATIONSHIP>]->(node)
+RETURN entity.title AS title, similarity
+ORDER BY similarity DESC;
+```
+
+Use the **embeddings module** to generate a query vector from text on the fly:
+
+```cypher
+CALL embeddings.text(['<natural language query>']) YIELD embeddings, success
+CALL vector_search.search('vs_index', 5, embeddings[0]) YIELD node, similarity
+MATCH (entity:<Label>)-[:<RELATIONSHIP>]->(node)
+RETURN entity.title AS title, similarity
+ORDER BY similarity DESC;
+```
+
+Search a **vector index on edges**:
+
+```cypher
+CALL vector_search.search_edges('edge_vs_index', 5, [1.0, 2.0, 3.0])
+YIELD edge, distance, similarity
+RETURN edge, similarity
+ORDER BY similarity DESC;
+```
+
+Compute **cosine similarity** between two vectors without an index:
+
+```cypher
+RETURN vector_search.cosine_similarity([1.0, 2.0], [1.0, 3.0]) AS similarity;
+```
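+
+For intuition, the same value can be computed by hand with plain Cypher arithmetic. For the vectors above, the dot product is 1·1 + 2·3 = 7 and the norms are √5 and √10, so the result is 7 / √50, roughly 0.99. This is a worked illustration of the standard cosine formula, not a description of how the function is implemented:
+
+```cypher
+// dot(a, b) / (|a| * |b|) for a = [1.0, 2.0], b = [1.0, 3.0]
+RETURN (1.0 * 1.0 + 2.0 * 3.0) / (sqrt(1.0 * 1.0 + 2.0 * 2.0) * sqrt(1.0 * 1.0 + 3.0 * 3.0)) AS similarity;
+```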
+
+Inspect the current state of all vector indices:
+
+```cypher
+CALL vector_search.show_index_info() YIELD * RETURN *;
+```
+
+### Similarity metrics
+
+| Metric     | Description                                       |
+|------------|---------------------------------------------------|
+| l2sq       | Squared Euclidean distance (default)              |
+| cos        | Cosine similarity                                 |
+| ip         | Inner product (dot product)                       |
+| haversine  | Haversine distance (suitable for geographic data) |
+| pearson    | Pearson correlation coefficient                   |
+| divergence | A divergence-based metric                         |
+| hamming    | Hamming distance                                  |
+| tanimoto   | Tanimoto coefficient                              |
+| sorensen   | Sorensen-Dice coefficient                         |
+| jaccard    | Jaccard index                                     |
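+
+The metric is chosen when the index is created. A sketch, assuming the `metric` key in the index configuration; the dimension and capacity values are placeholders, not recommendations:
+
+```cypher
+// Placeholder values: 384 dimensions, room for 10000 vectors, cosine metric.
+CREATE VECTOR INDEX vs_index ON :Chunk(embedding) WITH CONFIG {"dimension": 384, "capacity": 10000, "metric": "cos"};
+```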
+
+### Notes
+
+- Vector indices are powered by [USearch](https://github.com/unum-cloud/usearch).
+- Memgraph uses `READ_UNCOMMITTED` isolation specifically for vector indices; all other ACID guarantees remain intact.
+- `dimension` and `capacity` are **mandatory** when creating an index.
+- The default metric is `l2sq` and the default scalar kind is `f32`.
+- Dropping a single-store vector index rewrites all vectors back into the property store — this can be slow and memory-intensive on large datasets.
+- To drop an index: `DROP VECTOR INDEX vs_index;`
+
+Run the `CREATE VECTOR INDEX` command before attempting any vector search queries
+on the graph.
+
+For more details visit https://memgraph.com/docs/querying/vector-search.