Commit ee86f70

docs: add quickstart and workflow illustration
1 parent f3ab623 commit ee86f70

File tree

2 files changed

+42
-5
lines changed

README.md

Lines changed: 42 additions & 5 deletions
@@ -16,10 +16,46 @@
1616

1717
</p>
1818

19-
Ask questions about biosynthetic gene clusters in your genome dataset via LLMs using Retrieval-Augmented Generation.
19+
Use Large Language Models (LLMs) to ask questions about the biosynthetic gene clusters (BGCs) in a genome dataset generated by [`BGCFlow`](https://github.com/NBChub/bgcflow).
2020

21-
* Free software: MIT
22-
* Documentation: <https://nbchub.github.io/chatBGC/>
21+
This Python package uses vector-based Retrieval-Augmented Generation (RAG) to translate natural-language (English) questions into SQL queries. It is trained to query information from `antiSMASH` and the other genome mining tools included in the [`BGCFlow`](https://github.com/NBChub/bgcflow) pipelines.
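
As a rough illustration of the retrieval step in vector-based RAG, the sketch below ranks a few schema notes against a question and assembles a prompt for an LLM. It is a toy under stated assumptions, not chatBGC's implementation: the table and column names in `SCHEMA_NOTES` are hypothetical, and a bag-of-words count vector stands in for a learned embedding model and vector database.

```python
import math
import re
from collections import Counter

# Hypothetical schema notes (assumption): a real setup would index the
# actual schema and documentation of the BGCFlow DuckDB database.
SCHEMA_NOTES = [
    "Table regions: one row per BGC region with columns region_id, genome_id, product",
    "Table genomes: one row per genome with columns genome_id, organism, source",
    "Table cdss: one row per coding sequence with columns cds_id, region_id, translation",
]


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Combine the retrieved context with the question into an LLM prompt."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Schema context:\n{context}\n\nWrite a SQL query that answers: {question}"


question = "How many BGC regions does each genome have?"
print(build_prompt(question, SCHEMA_NOTES))
```

The LLM would then receive only the most relevant schema context alongside the question, which is what lets it produce SQL against tables it was never explicitly shown in full.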
22+
23+
![RAG](chatbgc/assets/3_RAG.png)
24+
25+
## Quickstart
26+
To get started with `chatBGC` quickly, you need an [OpenAI API Key](https://platform.openai.com/api-keys) and the DuckDB database from your `BGCFlow` run (generated using `bgcflow build database`). See the [BGCFlow Wiki](https://github.com/NBChub/bgcflow/wiki/04-Building-and-Serving-OLAP-Database) for details on creating the database.
27+
28+
```bash
29+
# Set up the API key
30+
OPENAI_API_KEY="<change this to your API Key>"
31+
32+
# Create new folder to set up duckdb and vector database
33+
mkdir chatbgc
34+
cd chatbgc
35+
36+
# Copy the database built using BGCFlow to this directory
37+
BGCFLOW_DIR="<change this to your BGCFlow directory>"
38+
PROJECT_NAME="<change this to your BGCFlow project name>"
39+
ANTISMASH_VERSION="7.1.0" # change this to the antiSMASH version used in your BGCFlow run. Only version 7.1.0 or above is supported
40+
cp -n "$BGCFLOW_DIR/data/processed/$PROJECT_NAME/dbt/antiSMASH_$ANTISMASH_VERSION/dbt_bgcflow.duckdb" dbt_bgcflow.duckdb
41+
42+
# Create python environment and install ChatBGC
43+
python3 -m venv chatbgc_env
44+
source chatbgc_env/bin/activate
45+
python3 -m pip install --upgrade pip
46+
pip install git+https://github.com/NBChub/chatBGC.git
47+
48+
# Set up environment variables / secrets
49+
touch .env
50+
echo "export OPENAI_API_KEY=$OPENAI_API_KEY" > .env
51+
source .env
52+
53+
# Train ChatBGC (only needs to be done once)
54+
chatbgc train --llm_type openai_chat --model gpt-4o dbt_bgcflow.duckdb
55+
56+
# Run ChatBGC
57+
chatbgc run --llm_type openai_chat --model gpt-4o dbt_bgcflow.duckdb
58+
```
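
Before running `chatbgc train`, it can help to confirm that the two prerequisites from the steps above are in place. The following is a minimal sketch, not part of chatBGC: `preflight` is a hypothetical helper that only checks that `OPENAI_API_KEY` is set and that the copied database file exists and is non-empty.

```python
import os
from pathlib import Path


def preflight(db_path, env=None):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper (assumption): it does not validate the API key or
    open the database, it only checks that both are present.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    path = Path(db_path)
    if not path.is_file():
        problems.append(f"database file not found: {path}")
    elif path.stat().st_size == 0:
        problems.append(f"database file is empty: {path}")
    return problems


# Check the file name used in the quickstart above.
for problem in preflight("dbt_bgcflow.duckdb"):
    print("WARNING:", problem)
```

Run it from the `chatbgc` directory after `source .env`; no output means both checks passed.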
2359

2460
## Configuration
2561

@@ -106,8 +142,9 @@ chatbgc run <path_to_duckdb>
106142
chatbgc run --llm_type openai_chat <path_to_duckdb>
107143
```
108144
109-
## Development guide (TO DO)
110-
- Create duckdb schema
145+
## Notes
146+
* Free software: MIT
147+
* Documentation: <https://nbchub.github.io/chatBGC/>
111148
112149
## Credits
113150

chatbgc/assets/3_RAG.png

547 KB