This is a tool designed to help users query and analyze advertising campaign data. It leverages advanced AI models to interpret user queries, identify relevant dataset columns, and generate SQL queries to extract meaningful insights from advertising data. This project integrates various technologies, including Google Generative AI and ClickHouse, to provide a seamless data analysis experience.
- Query Interpretation: Understands and enhances user queries related to advertising data.
- Dynamic SQL Generation: Automatically generates SQL queries based on user input and dataset schema.
- Data Extraction: Retrieves relevant data from a ClickHouse database.
- Answer Framing: Presents results in a user-friendly format.
- Google Generative AI: For query enhancement and text generation.
- ClickHouse: For querying and managing the advertising campaign data.
- Pandas: For data manipulation and analysis.
- Python-dotenv: For managing environment variables.
-
Clone the Repository:
git clone https://github.com/bmkjn/Chat-with-Columnar-Database.git cd Chat-with-Columnar-Database
-
Install Dependencies:
Create a virtual environment and install required packages
-
Environment Variables:
Create a
.env
file in the root directory with the following variables:GOOGLE_API_KEY=your_google_api_key LANGCHAIN_API_KEY=your_langchain_api_key
-
Database Configuration:
Ensure ClickHouse is running locally and accessible. Update the database configuration if necessary.
-
Load Dataset:
Ensure your dataset is available at
MASTER_FILES/Master_Final.csv
. -
Run the Script:
Execute the ChatwithDatabase.ipynb to start the data analysis process:
-
Query Examples:
- What languages is EENADU Newspaper available in?
- What is the Market for the top spent Channel?
The script will process these queries, generate SQL code, and provide results based on the dataset.
Here's an example of how the system processes a query:
-
User Query: "What is the Market for the top spent Channel?"
-
Generated SQL Code:
SELECT Market, Channel, SUM(Amount) AS TotalAmount FROM campaigns GROUP BY Market, Channel ORDER BY TotalAmount DESC LIMIT 1;
-
Query Result:
[["North", null, 40648499795.36029]]
-
Framed Answer:
The top-performing combination of market and channel in terms of total amount is the "North" market with no specific channel specified. This combination has generated a total amount of $40,648,499,795.36.