This is a sample application written in Python that demonstrates how use LlamaIndex to build RAG (Retrieval-Augmented Generation) applications. By using the LlamaIndex SQLJoinQueryEngine, the application can query a PostgreSQL-compatible YugabyteDB database from natural language. It can then infer whether to query a secondary vector index to fetch documents. In this case, the secondary index contains the Wikipedia pages of S&P 500 companies.
- Install Python3
- Install Docker
Download the application and provide settings specific to your deployment:
-
Clone the repository.
git clone https://github.com/YugabyteDB-Samples/yugabytedb-llamaindex-sp500-search.git
-
Install the application dependencies.
Dependencies can be installed in a virtual environment, or globally on your machine.
-
Option 1 (recommended): Install Dependencies from requirements.txt in virtual environment
python3 -m venv yb-llamaindex-env source yb-llamaindex-env/bin/activate pip install -r requirements.txt # NOTE: Users with M1 Mac machines should use requirements-m1.txt instead: # pip install -r requirements-m1.txt
-
Option 2: Install Dependencies Globally
pip install llama-index pip install psycopg2 # NOTE: Users with M1 Mac machines should install the psycopg2 binary instead: # pip install psycopg2-binary pip install python-dotenv
-
-
Create an OpenAI API Key and store it's value in a secure location. This will be used to connect the application to the LLM to generate SQL queries, infer results and generate the proper response.
-
Configure the application environment variables in
{project_directory/.env}
.
YugabyteDB is a PostgreSQL-compatible distributed database.
Start a 3-node YugabyteDB cluster in Docker (or feel free to use another deployment option):
# NOTE: if the ~/yb_docker_data already exists on your machine, delete and re-create it
mkdir ~/yb_docker_data
docker network create custom-network
docker run -d --name yugabytedb-node1 --net custom-network \
-p 15433:15433 -p 7001:7000 -p 9001:9000 -p 5433:5433 \
-v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:2.20.1.0-b97 \
bin/yugabyted start \
--base_dir=/home/yugabyte/yb_data --background=false
docker run -d --name yugabytedb-node2 --net custom-network \
-p 15434:15433 -p 7002:7000 -p 9002:9000 -p 5434:5433 \
-v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:2.20.1.0-b97 \
bin/yugabyted start --join=yugabytedb-node1 \
--base_dir=/home/yugabyte/yb_data --background=false
docker run -d --name yugabytedb-node3 --net custom-network \
-p 15435:15433 -p 7003:7000 -p 9003:9000 -p 5435:5433 \
-v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:2.20.1.0-b97 \
bin/yugabyted start --join=yugabytedb-node1 \
--base_dir=/home/yugabyte/yb_data --background=false
The database connectivity settings are provided in the {project_dir}/.env
file and do not need to be changed if you started the cluster with the preceding command.
Navigate to the YugabyteDB UI to confirm that the database is up and running, at http://127.0.0.1:15433.
This application requires a database table with financial information for companies in the S&P 500. This schema includes a companies
table. It also creates a read-only user role to prevent any destructive actions while querying the database directly from LlamaIndex.
-
Copy the schema to the first node's Docker container.
docker cp {project_dir}/sql/schema.sql yugabytedb-node1:/home
-
Copy the seed data file to the Docker container.
docker cp {project_dir}/sql/data.sql yugabytedb-node1:/home
-
Execute the SQL files against the database.
docker exec -it yugabytedb-node1 bin/ysqlsh -h yugabytedb-node1 -f /home/schema.sql docker exec -it yugabytedb-node1 bin/ysqlsh -h yugabytedb-node1 -f /home/data.sql
This command-line application takes an input in natural language and returns a response from LlamaIndex.
- Start the server.
python3 index.py
What is your question?
- Provide a relevant question. For instance:
What is your question?
Provide a detailed company history for the company with the highest marketcap.
Querying SQL database: The first choice seems more relevant as it mentions translating a natural language query into a SQL query over a table containing companies' stats. This could potentially include the company with the highest marketcap and provide a detailed history. The second choice is more about answering semantic questions, which doesn't necessarily imply detailed company history.
SQL query: SELECT *
FROM companies
WHERE marketcap = (SELECT MAX(marketcap) FROM companies)
SQL response: The company with the highest marketcap is Microsoft Corporation. It was founded on April 4, 1975, and is headquartered at One Microsoft Way, Redmond, Washington, United States. Microsoft is a technology company that specializes in software infrastructure. It has a marketcap of $1,043,526,401,920 and employs 221,000 people. The company's contact number is 425-882-8080.
Transformed query given SQL response: Can you provide more details about the key products and services offered by Microsoft Corporation?
query engine response: Microsoft Corporation offers a wide range of products and services. Some of its key products include operating systems such as Windows, which is used by millions of individuals and businesses worldwide. Microsoft Office Suite is another popular product, which includes applications like Word, Excel, PowerPoint, and Outlook for productivity and communication purposes. The company also offers cloud-based services through its Azure platform, providing infrastructure, analytics, and other solutions for businesses. Additionally, Microsoft develops and sells hardware devices like the Xbox gaming console and Surface tablets. It also provides enterprise software solutions, developer tools, and various other products and services to cater to the needs of different industries and customers.
Final response: The company with the highest market cap is Microsoft Corporation. It was founded on April 4, 1975, and is headquartered at One Microsoft Way, Redmond, Washington, United States. Microsoft is a technology company that specializes in software infrastructure. It has a market cap of $1,043,526,401,920 and employs 221,000 people. The company's contact number is 425-882-8080.
Microsoft Corporation offers a wide range of products and services. Some of its key products include operating systems such as Windows, which is used by millions of individuals and businesses worldwide. Microsoft Office Suite is another popular product, which includes applications like Word, Excel, PowerPoint, and Outlook for productivity and communication purposes. The company also offers cloud-based services through its Azure platform, providing infrastructure, analytics, and other solutions for businesses. Additionally, Microsoft develops and sells hardware devices like the Xbox gaming console and Surface tablets. It also provides enterprise software solutions, developer tools, and various other products and services to cater to the needs of different industries and customers.