
Commit fee7be9: Update README.md
1 parent dafcd1c


benchmarks/README.md

Lines changed: 12 additions & 7 deletions
@@ -1,6 +1,6 @@
## Benchmarks
We validate the benchmark results provided in [HippoRAG](https://arxiv.org/abs/2405.14831) and compare against other methods:
- NaiveRAG (vector dbs) using the OpenAI embedder `text-embedding-3-small` (see the sketch below)
- [LightRAG](https://github.com/HKUDS/LightRAG)
- [GraphRAG](https://github.com/gusye1234/nano-graphrag) (we use the implementation provided by `nano-graphrag`, based on the original [Microsoft GraphRAG](https://github.com/microsoft/graphrag))
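For reference, here is a minimal sketch of the embedding step the NaiveRAG baseline relies on, assuming the official `openai` Python SDK; `embed_chunks` is an illustrative helper, not code from this repository:

```python
# Illustrative only: embedding chunks with text-embedding-3-small via the
# official OpenAI SDK. `embed_chunks` is a hypothetical helper, not code
# from this repository.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed text chunks for insertion into a vector db."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunks,
    )
    return [item.embedding for item in response.data]


vectors = embed_chunks(["Example chunk to index."])
print(len(vectors[0]))  # text-embedding-3-small returns 1536-dim vectors
```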

@@ -12,12 +12,14 @@
| | VectorDB| 0.49| 0.32|
| | LightRAG| 0.47| 0.32|
| | GraphRAG| 0.75| 0.68|
| |**Circlemind**| **0.96**| **0.95**|
| 101||||
| | VectorDB| 0.42| 0.23|
| | LightRAG| 0.45| 0.28|
| | GraphRAG| 0.73| 0.64|
| |**Circlemind**| **0.93**| **0.90**|

**Circlemind is up to 4x more accurate than VectorDB RAG.**

**HotpotQA**
| # Queries | Method | All queries % |
@@ -26,27 +28,30 @@
| | VectorDB| 0.78|
| | LightRAG| 0.55|
| | GraphRAG| -*|
| |**Circlemind**| **0.84**|

*: crashes after half an hour of processing

Below are the insertion times for the 2wikimultihopqa benchmark (~800 chunks):
| Method | Time (minutes) |
|:--------:|-----------------:|
| VectorDB| ~0.3|
| LightRAG| ~25|
| GraphRAG| ~40|
|**Circlemind**| ~1.5|

**Circlemind is 27x faster than GraphRAG while also being over 40% more accurate in retrieval.**
### Run it yourself
The scripts in this directory will generate and evaluate the 2wikimultihopqa datasets on subsets of 51 and 101 queries, following the same methodology as the HippoRAG paper. In particular, we evaluate the retrieval capabilities of each method, measuring the percentage of queries for which all the required evidence was retrieved (a sketch of this metric follows below). We preloaded the results, so it is enough to run `evaluate_dbs.xx` to get the numbers. You can also run `create_dbs.xx` to regenerate the databases for the different methods.
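To make that metric concrete, here is a minimal sketch of how the percentage can be computed; `all_evidence_recall` and the data layout are assumptions for illustration, not the actual `evaluate_dbs` code:

```python
# Hypothetical illustration of the retrieval metric: a query counts as
# solved only if every piece of required evidence was retrieved.
def all_evidence_recall(queries: list[dict]) -> float:
    """`queries` items map 'evidence' and 'retrieved' to lists of passage ids."""
    solved = sum(1 for q in queries if set(q["evidence"]) <= set(q["retrieved"]))
    return solved / len(queries)


# One of the two queries below recovers all of its evidence, so the score is 0.50.
queries = [
    {"evidence": ["p1", "p2"], "retrieved": ["p1", "p2", "p9"]},
    {"evidence": ["p3", "p4"], "retrieved": ["p3", "p7"]},
]
print(f"All queries %: {all_evidence_recall(queries):.2f}")
```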
A couple of NOTES:
- you will need to set an OPENAI_API_KEY (a quick check is sketched after this list);
- LightRAG and GraphRAG can take over an hour to process, and they can be expensive;
- when pip installing LightRAG, not all dependencies are added; to run it we simply deleted all the imports of each missing dependency (since we use OpenAI, they are not necessary);
- we also benchmarked on the HotpotQA dataset (we will soon release the code for that as well).
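Since a missing key is the most common failure mode, here is a tiny guard of the kind you might add before running anything; this is a hypothetical snippet, not part of the benchmark scripts:

```python
# Hypothetical guard, not part of the repository: fail fast if the key is
# missing, since every method in these benchmarks calls OpenAI.
import os

if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the benchmarks.")
```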
The output will look similar to the following (the exact numbers could vary based on your graph configuration):
```
Evaluation of the performance of different RAG methods on 2wikimultihopqa (51 queries)
```
