This guide explains how to set up the Stage 1 proof-of-concept for the Natural Language to BigQuery SQL extension for Vertex AI.
You will need:
- Google Cloud Platform account with billing enabled
- `gcloud` CLI installed and configured
- Python 3.9+ with pip
- BigQuery dataset with sample data
- Permissions to:
- Create/deploy Cloud Functions
- Create/manage BigQuery datasets and tables
- Use Vertex AI APIs
- Create Cloud Storage buckets
- Manage service accounts and IAM permissions
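If any of the required APIs are not enabled yet, you can turn them on up front (these are the standard service names; adjust to your project):

```bash
gcloud services enable \
  cloudfunctions.googleapis.com \
  bigquery.googleapis.com \
  aiplatform.googleapis.com \
  storage.googleapis.com
```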
Clone the repository:

```bash
git clone https://github.com/yourusername/nl-to-bigquery-extension.git
cd nl-to-bigquery-extension
```
- Copy the example environment file:

  ```bash
  cp .env.example .env
  ```

- Edit the `.env` file with your specific configuration:
  ```bash
  # Google Cloud settings
  GCP_PROJECT_ID=your-project-id
  BIGQUERY_DATASET_ID=your_dataset_id

  # Schema metadata
  # Format: table_name,column_name,data_type,description
  SCHEMA_METADATA=sales,date,DATE,The date of the sale;sales,product_id,STRING,Product identifier;sales,customer_id,STRING,Customer identifier;sales,quantity,INTEGER,Number of units sold;sales,revenue,FLOAT,Total revenue from the sale

  # Cloud Function settings
  FUNCTION_NAME=nl-to-sql
  FUNCTION_REGION=us-central1
  FUNCTION_MEMORY=256MB
  FUNCTION_TIMEOUT=60s

  # Extension settings
  EXTENSION_NAME=nl-to-bigquery
  EXTENSION_DISPLAY_NAME="Natural Language to BigQuery"
  EXTENSION_DESCRIPTION="Translate natural language questions to BigQuery SQL and get results"
  ```
- Update the schema metadata to match your actual BigQuery tables; the sketch below shows one way the `SCHEMA_METADATA` string can be consumed.
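As a rough illustration of parsing the semicolon-delimited format shown above (this is a sketch, not the repository's actual loader):

```python
import os

def parse_schema_metadata(raw: str) -> dict[str, list[dict]]:
    """Group column records from SCHEMA_METADATA by table name."""
    tables: dict[str, list[dict]] = {}
    for entry in raw.split(";"):
        # maxsplit=3 keeps any commas inside the description intact
        table, column, data_type, description = entry.split(",", 3)
        tables.setdefault(table, []).append(
            {"column": column, "type": data_type, "description": description}
        )
    return tables

schema = parse_schema_metadata(os.environ["SCHEMA_METADATA"])
# schema["sales"] -> [{"column": "date", "type": "DATE", ...}, ...]
```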
Create a virtual environment and install the dependencies:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Deploy the Cloud Function:

```bash
chmod +x scripts/deploy_function.sh
./scripts/deploy_function.sh
```
This script will:
- Create a service account for the Cloud Function
- Grant necessary permissions to query BigQuery and use Vertex AI
- Deploy the function to your GCP project
- Save the function URL to a file for the next step
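For reference, the deployment boils down to a `gcloud functions deploy` call along these lines (the flags mirror the `.env` settings; `$SERVICE_ACCOUNT_EMAIL` is a placeholder for the account the script creates, and the script's exact options may differ):

```bash
gcloud functions deploy "$FUNCTION_NAME" \
  --gen2 \
  --region="$FUNCTION_REGION" \
  --runtime=python311 \
  --trigger-http \
  --memory="$FUNCTION_MEMORY" \
  --timeout="$FUNCTION_TIMEOUT" \
  --service-account="$SERVICE_ACCOUNT_EMAIL"
```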
Register the extension:

```bash
chmod +x scripts/register_extension.sh
./scripts/register_extension.sh
```
This script will:
- Create a Cloud Storage bucket for the extension artifacts
- Update the OpenAPI specification with your function URL
- Upload the OpenAPI spec to the bucket
- Register the extension with Vertex AI
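The storage steps are roughly equivalent to the following (the bucket name and `openapi.yaml` filename are placeholders; the script may use different ones):

```bash
gcloud storage buckets create "gs://${GCP_PROJECT_ID}-extension-artifacts" \
  --location="$FUNCTION_REGION"
gcloud storage cp openapi.yaml "gs://${GCP_PROJECT_ID}-extension-artifacts/"
```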
To test the extension:
- Open the Vertex AI UI in the Google Cloud Console
- Navigate to Extensions
- Find your "Natural Language to BigQuery" extension
- Test it with example questions like:
- "What were the total sales last month?"
- "Show me the top 5 customers by revenue"
- "How many products were sold in each category?"
The core functionality uses LangChain with Vertex AI to translate natural language to SQL:
- The `LangChainSQLParser` class creates an in-memory SQLite database with your schema
- It uses this schema to help the language model generate appropriate SQL
- The parser leverages the ChatVertexAI model to generate SQL queries
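A minimal sketch of that flow, assuming the `langchain-google-vertexai` package; the model name, prompt, and function names here are illustrative, not the repository's actual code:

```python
import sqlite3

from langchain_google_vertexai import ChatVertexAI

# Schema DDL mirroring the SCHEMA_METADATA example above.
SCHEMA_DDL = """
CREATE TABLE sales (
    date DATE,          -- The date of the sale
    product_id TEXT,    -- Product identifier
    customer_id TEXT,   -- Customer identifier
    quantity INTEGER,   -- Number of units sold
    revenue REAL        -- Total revenue from the sale
);
"""

def generate_sql(question: str) -> str:
    # Build the in-memory SQLite database; this also validates the DDL.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_DDL)

    # Ask the Vertex AI chat model for a query grounded in the schema.
    llm = ChatVertexAI(model_name="gemini-1.0-pro", temperature=0)
    prompt = (
        "Given this table schema:\n"
        f"{SCHEMA_DDL}\n"
        "Write a single BigQuery SQL query that answers the question. "
        "Return only the SQL.\n"
        f"Question: {question}"
    )
    return llm.invoke(prompt).content
```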
The BigQuery connector handles:
- Executing the generated SQL queries against your BigQuery dataset
- Formatting the results for the API response
- Providing schema information and table listings
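In terms of the `google-cloud-bigquery` client, the connector's core amounts to something like this sketch (the function name is illustrative):

```python
from google.cloud import bigquery

def run_query(sql: str, project_id: str) -> list[dict]:
    """Execute SQL against BigQuery and return rows as plain dicts."""
    client = bigquery.Client(project=project_id)
    rows = client.query(sql).result()   # blocks until the job completes
    return [dict(row) for row in rows]  # JSON-friendly shape for the API response
```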
The API handler orchestrates the process:
- Receives natural language questions via HTTP requests
- Passes them to the LangChain parser to generate SQL
- Executes the SQL using the BigQuery connector
- Returns the results and metadata to the caller
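Assuming the gen2 Functions Framework style, the handler might look like this sketch (the entry-point name and request shape are assumptions; `generate_sql` and `run_query` are the sketches above):

```python
import functions_framework

@functions_framework.http
def nl_to_sql(request):
    # Expect a JSON body like {"question": "..."}.
    body = request.get_json(silent=True) or {}
    question = body.get("question", "")
    if not question:
        return {"error": "Missing 'question' field"}, 400

    sql = generate_sql(question)                 # LangChain parser
    results = run_query(sql, "your-project-id")  # BigQuery connector
    return {"sql": sql, "results": results}
```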
After testing and verifying this proof-of-concept:
- Improve the SQL generation with fine-tuning or more sophisticated prompts
- Add support for multiple datasets and complex joins
- Implement better error handling and feedback mechanisms
- Add visualization capabilities for query results
- Build a user interface for interacting with the extension
- Missing permissions: Ensure your service account has the proper roles (BigQuery Data Viewer, BigQuery Job User, Vertex AI User); see the example grant after this list
- Schema errors: Verify your schema metadata matches your actual BigQuery tables
- Extension registration fails: Check the bucket permissions and OpenAPI specification format
- LangChain errors: Make sure Vertex AI APIs are enabled in your project
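A grant for the roles above might look like this (the service-account name is a placeholder for whatever the deploy script created):

```bash
SA="nl-to-sql-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"  # placeholder name
for role in roles/bigquery.dataViewer roles/bigquery.jobUser roles/aiplatform.user; do
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:${SA}" \
    --role="${role}"
done
```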
To debug further:
- Check the Cloud Function logs: `gcloud functions logs read --gen2 $FUNCTION_NAME`
- Test the parser locally with the sample questions
- Manually invoke the API endpoint with curl or Postman to check responses
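For example, assuming the request shape from the handler sketch above (replace the URL with your deployed function's URL):

```bash
curl -X POST "https://us-central1-your-project-id.cloudfunctions.net/nl-to-sql" \
  -H "Content-Type: application/json" \
  -d '{"question": "What were the total sales last month?"}'
```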