
Natural Language to BigQuery SQL - Implementation Guide

This guide explains how to set up the Stage 1 proof-of-concept for the Natural Language to BigQuery SQL extension for Vertex AI.

Prerequisites

  • Google Cloud Platform account with billing enabled
  • gcloud CLI tool installed and configured
  • Python 3.9+ with pip
  • BigQuery dataset with sample data
  • Permissions to:
    • Create/deploy Cloud Functions
    • Create/manage BigQuery datasets and tables
    • Use Vertex AI APIs
    • Create Cloud Storage buckets
    • Manage service accounts and IAM permissions

Step 1: Clone the Repository

git clone https://github.com/yourusername/nl-to-bigquery-extension.git
cd nl-to-bigquery-extension

Step 2: Configure Your Environment

  1. Copy the example environment file:
cp .env.example .env
  2. Edit the .env file with your specific configuration:
# Google Cloud settings
GCP_PROJECT_ID=your-project-id
BIGQUERY_DATASET_ID=your_dataset_id

# Schema metadata
# Format: table_name,column_name,data_type,description
SCHEMA_METADATA=sales,date,DATE,The date of the sale;sales,product_id,STRING,Product identifier;sales,customer_id,STRING,Customer identifier;sales,quantity,INTEGER,Number of units sold;sales,revenue,FLOAT,Total revenue from the sale

# Cloud Function settings
FUNCTION_NAME=nl-to-sql
FUNCTION_REGION=us-central1
FUNCTION_MEMORY=256MB
FUNCTION_TIMEOUT=60s

# Extension settings
EXTENSION_NAME=nl-to-bigquery
EXTENSION_DISPLAY_NAME="Natural Language to BigQuery"
EXTENSION_DESCRIPTION="Translate natural language questions to BigQuery SQL and get results"
  3. Update the schema metadata to match your actual BigQuery tables (a sketch showing how this format can be parsed follows below).
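
A minimal sketch of how this string could be parsed, assuming the format described above (entries separated by ";", fields separated by ","); the repository's own code presumably handles this internally, so the helper name below is purely illustrative:

import os

# Illustrative parser for the SCHEMA_METADATA format:
# table_name,column_name,data_type,description entries joined with ";".
# Each entry is split at most three times so descriptions may contain commas.
def parse_schema_metadata(raw: str) -> dict:
    schema = {}
    for entry in raw.split(";"):
        table, column, data_type, description = entry.split(",", 3)
        schema.setdefault(table.strip(), []).append({
            "column": column.strip(),
            "type": data_type.strip(),
            "description": description.strip(),
        })
    return schema

schema = parse_schema_metadata(os.environ["SCHEMA_METADATA"])
print(schema["sales"][0])  # {'column': 'date', 'type': 'DATE', ...}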

Step 3: Set Up Python Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Step 4: Deploy the Cloud Function

chmod +x scripts/deploy_function.sh
./scripts/deploy_function.sh

This script will:

  • Create a service account for the Cloud Function
  • Grant necessary permissions to query BigQuery and use Vertex AI
  • Deploy the function to your GCP project
  • Save the function URL to a file for the next step

Step 5: Register the Extension with Vertex AI

chmod +x scripts/register_extension.sh
./scripts/register_extension.sh

This script will:

  • Create a Cloud Storage bucket for the extension artifacts
  • Update the OpenAPI specification with your function URL
  • Upload the OpenAPI spec to the bucket (see the upload sketch after this list)
  • Register the extension with Vertex AI
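
The registration script does the heavy lifting here, but for reference, the upload step can also be reproduced with the google-cloud-storage client. The bucket and object names below are placeholders, not the names the script actually uses:

from google.cloud import storage

# Placeholder names; scripts/register_extension.sh creates the real bucket
# and chooses the real object path.
BUCKET_NAME = "your-project-id-nl-to-bigquery-extension"
SPEC_PATH = "openapi/nl_to_sql.yaml"

client = storage.Client(project="your-project-id")
bucket = client.bucket(BUCKET_NAME)

# Upload the OpenAPI spec (already patched with your function URL) so that
# Vertex AI can read it from gs://BUCKET_NAME/SPEC_PATH during registration.
bucket.blob(SPEC_PATH).upload_from_filename("openapi.yaml")
print(f"Uploaded spec to gs://{BUCKET_NAME}/{SPEC_PATH}")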

Step 6: Test the Extension

  1. Open the Vertex AI UI in the Google Cloud Console
  2. Navigate to Extensions
  3. Find your "Natural Language to BigQuery" extension
  4. Test it with example questions like the ones below (a sketch for calling the Cloud Function directly appears after this list):
    • "What were the total sales last month?"
    • "Show me the top 5 customers by revenue"
    • "How many products were sold in each category?"

Understanding the Implementation

LangChain-based SQL Parser

The core functionality uses LangChain with Vertex AI to translate natural language to SQL; a simplified sketch follows the list below:

  • LangChainSQLParser class creates an in-memory SQLite database with your schema
  • It uses this schema to help the language model generate appropriate SQL
  • The parser leverages the ChatVertexAI model to generate SQL queries
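
The snippet below is not the repository's LangChainSQLParser class, just an independent sketch of the same idea, assuming the langchain, langchain-community, and langchain-google-vertexai packages and a Gemini model name that may differ from the one the project uses:

from sqlalchemy import create_engine, text
from langchain_community.utilities import SQLDatabase
from langchain_google_vertexai import ChatVertexAI
from langchain.chains import create_sql_query_chain

# In-memory SQLite database mirroring the BigQuery schema; it holds no data
# and exists only so the model can see the table structure.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE sales (
            date DATE,          -- The date of the sale
            product_id TEXT,    -- Product identifier
            customer_id TEXT,   -- Customer identifier
            quantity INTEGER,   -- Number of units sold
            revenue REAL        -- Total revenue from the sale
        )
    """))
db = SQLDatabase(engine)

# Gemini served through Vertex AI; the model name is an assumption.
llm = ChatVertexAI(model_name="gemini-1.5-pro", temperature=0)

# Chain that turns a natural language question into a SQL string.
chain = create_sql_query_chain(llm, db)
sql = chain.invoke({"question": "Show me the top 5 customers by revenue"})
print(sql)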

BigQuery Connector

The BigQuery connector handles the following (a minimal sketch appears after this list):

  • Executing the generated SQL queries against your BigQuery dataset
  • Formatting the results for the API response
  • Providing schema information and table listings
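
A minimal sketch of the connector's core job using the google-cloud-bigquery client; the project, dataset, and table names are the placeholders from the .env example above:

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

def run_query(sql: str) -> list[dict]:
    # Execute the generated SQL, wait for the result set, and convert rows
    # into plain dicts so they can be serialised in the API response.
    rows = client.query(sql).result()
    return [dict(row) for row in rows]

results = run_query(
    "SELECT customer_id, SUM(revenue) AS total_revenue "
    "FROM `your-project-id.your_dataset_id.sales` "
    "GROUP BY customer_id ORDER BY total_revenue DESC LIMIT 5"
)
print(results)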

API Handler

The API handler orchestrates the process (sketched after this list):

  1. Receives natural language questions via HTTP requests
  2. Passes them to the LangChain parser to generate SQL
  3. Executes the SQL using the BigQuery connector
  4. Returns the results and metadata to the caller
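
A sketch of that flow as a Functions Framework HTTP handler. The entry-point name, the "question" request field, and the response shape are assumptions; the generate_sql and run_query placeholders stand in for the LangChain parser and BigQuery connector described above:

import functions_framework
from flask import jsonify

# Placeholders for the repository's parser and connector modules.
def generate_sql(question: str) -> str:
    raise NotImplementedError("LangChain NL -> SQL generation goes here")

def run_query(sql: str) -> list[dict]:
    raise NotImplementedError("BigQuery execution goes here")

@functions_framework.http
def nl_to_sql(request):
    """HTTP entry point: natural language question in, SQL and results out."""
    body = request.get_json(silent=True) or {}
    question = body.get("question", "")
    if not question:
        return jsonify({"error": "Missing 'question' field"}), 400

    sql = generate_sql(question)   # LangChain parser (step 2)
    results = run_query(sql)       # BigQuery connector (step 3)
    return jsonify({"question": question, "sql": sql, "results": results})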

Next Steps

After testing and verifying this proof-of-concept:

  1. Improve the SQL generation with fine-tuning or more sophisticated prompts
  2. Add support for multiple datasets and complex joins
  3. Implement better error handling and feedback mechanisms
  4. Add visualization capabilities for query results
  5. Build a user interface for interacting with the extension

Troubleshooting

Common Issues

  1. Missing permissions: Ensure your service account has the proper roles (BigQuery Data Viewer, BigQuery Job User, Vertex AI User)

  2. Schema errors: Verify your schema metadata matches your actual BigQuery tables

  3. Extension registration fails: Check the bucket permissions and OpenAPI specification format

  4. LangChain errors: Make sure Vertex AI APIs are enabled in your project

Debugging

  • Check Cloud Function logs: gcloud functions logs read --gen2 $FUNCTION_NAME
  • Test the parser locally with the sample questions
  • Manually invoke the API endpoint with curl or Postman to check responses
