|
16 | 16 |
|
17 | 17 | </p>
|
18 | 18 |
|
19 |
| -Ask questions about biosynthetic gene clusters in your genome dataset via LLMs using Retrieval-Augmented Generation. |
| 19 | +Ask questions about BGCs in your genome dataset generated by [`BGCFlow`](https://github.com/NBChub/bgcflow) using Large Language Models (LLMs). |
20 | 20 |
|
21 |
| -* Free software: MIT |
22 |
| -* Documentation: <https://nbchub.github.io/chatBGC/> |
| 21 | +This Python package uses vector-based Retrieval-Augmented Generation (RAG) to translate natural language (English) into SQL queries and is trained to query information from `antiSMASH` and other genome mining tools included in the [`BGCFlow`](https://github.com/NBChub/bgcflow) pipelines. |
| 22 | + |
| 23 | + |
| 24 | + |
| 25 | +## Quickstart |
| 26 | +To get started quickly with `chatBGC`, you will need an [OpenAI API Key](https://platform.openai.com/api-keys) and the DuckDB database of your `BGCFlow` run (generated using `bgcflow build database`). See the [BGCFlow Wiki](https://github.com/NBChub/bgcflow/wiki/04-Building-and-Serving-OLAP-Database) for more details on creating the database. |
| 27 | + |
| 28 | +```bash |
| 29 | +# Set up API key |
| 30 | +OPENAI_API_KEY="<change this to your API Key>" |
| 31 | + |
| 32 | +# Create new folder to set up duckdb and vector database |
| 33 | +mkdir chatbgc |
| 34 | +cd chatbgc |
| 35 | + |
| 36 | +# Copy the database built using BGCFlow to this directory |
| 37 | +BGCFLOW_DIR="<change this to your BGCFlow directory>" |
| 38 | +PROJECT_NAME="<change this to your BGCFlow project name>" |
| 39 | +ANTISMASH_VERSION="7.1.0" # change this to the correct antiSMASH version used in your BGCFlow run. Only supports version 7.1.0 or above |
| 40 | +cp -n "$BGCFLOW_DIR/data/processed/$PROJECT_NAME/dbt/antiSMASH_$ANTISMASH_VERSION/dbt_bgcflow.duckdb" dbt_bgcflow.duckdb |
| 41 | + |
| 42 | +# Create python environment and install ChatBGC |
| 43 | +python3 -m venv chatbgc_env |
| 44 | +source chatbgc_env/bin/activate |
| 45 | +python3 -m pip install --upgrade pip |
| 46 | +pip install git+https://github.com/NBChub/chatBGC.git |
| 47 | + |
| 48 | +# Set up environment variables / secrets |
| 49 | +touch .env |
| 50 | +echo "export OPENAI_API_KEY=$OPENAI_API_KEY" > .env |
| 51 | +source .env |
| 52 | + |
| 53 | +# Train ChatBGC (only needs to be done once) |
| 54 | +chatbgc train --llm_type openai_chat --model gpt-4o dbt_bgcflow.duckdb |
| 55 | + |
| 56 | +# Run ChatBGC |
| 57 | +chatbgc run --llm_type openai_chat --model gpt-4o dbt_bgcflow.duckdb |
| 58 | +``` |
23 | 59 |
|
24 | 60 | ## Configuration
|
25 | 61 |
|
@@ -106,8 +142,9 @@ chatbgc run <path_to_duckdb>
|
106 | 142 | chatbgc run --llm_type openai_chat <path_to_duckdb>
|
107 | 143 | ```
|
108 | 144 |
|
109 |
| -## Development guide (TO DO) |
110 |
| -- Create duckdb schema |
| 145 | +## Notes |
| 146 | +* Free software: MIT |
| 147 | +* Documentation: <https://nbchub.github.io/chatBGC/> |
111 | 148 |
|
112 | 149 | ## Credits
|
113 | 150 |
|
|