A BigQuery AI-native competitive intelligence system that transforms Meta Ad Library data into strategic insights through progressive disclosure (L1→L4). Built entirely with BigQuery's native AI primitives, with no external ML infrastructure required.
Competition Highlight: "From 466 competitor candidates to 5 critical insights using only BigQuery AI: detecting 73.7% copying similarity, identifying 6 untapped market opportunities, and forecasting competitive moves 30 days ahead."
- 10-Stage Pipeline: Automated competitor discovery → validation → analysis → progressive intelligence
- Real-Time Intelligence: Copying detection, creative fatigue analysis, and market gap identification
- Temporal Forecasting: 30-day competitive trend predictions with confidence intervals
- Multi-Dimensional Analysis: Audience, creative, channel, visual, and whitespace intelligence
- Progressive Disclosure: L1 executive insights → L4 detailed SQL dashboards
- 100% BigQuery Native: No external vector databases, ML services, or orchestration complexity
# Enable the BigQuery, Cloud Storage, Vertex AI, and Custom Search APIs
gcloud services enable bigquery.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable aiplatform.googleapis.com
gcloud services enable customsearch.googleapis.com

# Set your project variables
export BQ_PROJECT="your-gcp-project-id"
export BQ_DATASET="competitive_intelligence"
# Create dataset
bq mk --dataset --location=US $BQ_PROJECT:$BQ_DATASET

# Create bucket for media storage (visual intelligence)
export GCS_BUCKET="your-project-competitive-intel"
gsutil mb gs://$GCS_BUCKET

# Authenticate with Google Cloud
gcloud auth application-default login
# Verify BigQuery access
bq query --use_legacy_sql=false 'SELECT 1 as test'

- Visit ScrapeCreators.com
- Sign up and obtain your API key for Meta Ad Library access
- Note: This provides structured access to public Meta Ad Library data
- Create a Google Custom Search Engine
- Get API key from Google Cloud Console
- Used for automatic competitor discovery
Create .env file in project root:
# Required: BigQuery configuration
BQ_PROJECT=your-gcp-project-id
BQ_DATASET=competitive_intelligence
# Required: Storage configuration
GCS_BUCKET=your-project-competitive-intel
# Required: Meta Ad Library access
SC_API_KEY=your_scrapecreators_api_key
# Optional: Google Custom Search (for auto-discovery)
GOOGLE_CSE_API_KEY=your_google_cse_api_key
GOOGLE_CSE_ENGINE_ID=your_search_engine_id
# Optional: Advanced features
VERTEX_AI_REGION=us-central1

# Clone repository
git clone https://github.com/your-username/bigquery-ai-competitive-intelligence.git
cd bigquery-ai-competitive-intelligence
# Install with pip
pip install .
# Or install with development dependencies
pip install ".[dev]"
# Or install everything including notebook extras
pip install ".[all]"

# Clone repository
git clone https://github.com/your-username/bigquery-ai-competitive-intelligence.git
cd bigquery-ai-competitive-intelligence
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"

# Launch Jupyter
jupyter notebook
# Open and run: notebooks/demo_competitive_intelligence.ipynb

The demo notebook walks through the complete pipeline using Warby Parker as an example.
# Run entire 10-stage pipeline
python -m src.pipeline.orchestrator --brand "Your Brand Name"
# With optional parameters
python -m src.pipeline.orchestrator \
--brand "Your Brand Name" \
--vertical "eyewear" \
--verbose
# Dry run for testing
python -m src.pipeline.orchestrator \
--brand "Test Brand" \
--dry-run

# Run all stages sequentially with caching
python tests/stage_testing_framework.py \
--brand "Your Brand Name" \
--vertical "eyewear"
# Run a specific stage (uses cached results from previous stages)
python tests/stage_testing_framework.py \
--brand "Your Brand Name" \
--stage 1 # Discovery
python tests/stage_testing_framework.py \
--brand "Your Brand Name" \
--stage 2 # AI Curation
# Continue with stages 3-10...
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 3 # Ranking
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 4 # Ingestion
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 5 # Strategic Labeling
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 6 # Embeddings
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 7 # Visual Intelligence
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 8 # Strategic Analysis
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 9 # Multi-Dimensional Intelligence
python tests/stage_testing_framework.py --brand "Your Brand Name" --stage 10 # Enhanced Output
# Force re-run (ignore cache)
python tests/stage_testing_framework.py \
--brand "Your Brand Name" \
--stage 5 \
--force
# Clean tables before testing
python tests/stage_testing_framework.py \
--brand "Your Brand Name" \
--clean

- Cached Results: Each stage caches its output for subsequent stages
- Independent Testing: Test any stage without re-running earlier stages
- Full Traceability: Results saved in data/output/stage_tests/[test_id]/
- Debugging Support: Detailed logs and intermediate outputs
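
The caching behavior described above can be illustrated with a minimal sketch. This is not the framework's actual implementation: `run_stage`, the `stage_N.json` file naming, and the `force` parameter are hypothetical names chosen to mirror the `--stage` and `--force` flags, and the README does not document the real cache layout.

```python
import json
import tempfile
from pathlib import Path


def run_stage(stage, compute, cache_dir, force=False):
    """Return the cached result for a stage, computing and saving it if needed.

    Hypothetical helper: results are stored as JSON, one file per stage.
    """
    cache_file = Path(cache_dir) / f"stage_{stage}.json"
    if cache_file.exists() and not force:
        # Cache hit: skip the (possibly expensive) computation entirely
        return json.loads(cache_file.read_text())
    result = compute()
    cache_file.write_text(json.dumps(result))
    return result


calls = []  # records how many times the stage actually ran


def discovery():
    """Stand-in for stage 1; the real stage would query external services."""
    calls.append("discovery")
    return {"candidates": ["Brand A", "Brand B"]}


with tempfile.TemporaryDirectory() as tmp:
    first = run_stage(1, discovery, tmp)               # computed, then cached
    second = run_stage(1, discovery, tmp)              # served from cache
    third = run_stage(1, discovery, tmp, force=True)   # recomputed, cache ignored
```

In this sketch the stage function runs only twice across three calls, which is why a later stage can be re-tested without paying for the earlier ones again.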
# View run ID from output
echo "Run ID displayed in pipeline output"
# Query BigQuery results
bq query --use_legacy_sql=false '
SELECT * FROM `'$BQ_PROJECT'.'$BQ_DATASET'.strategic_intelligence_[RUN_ID]`
WHERE signal_strength = "CRITICAL"
'
# Check output files
ls -la data/output/systematic_intelligence_*.json
ls -la data/output/interventions_*.json
ls -la data/output/whitespace_*.json
ls -la data/output/sql_dashboards_*/

L1 Executive (5 critical insights):
SELECT insight_text, confidence, business_impact
FROM `your-project.competitive_intelligence.strategic_intelligence_[RUN_ID]`
WHERE disclosure_level = 'L1_EXECUTIVE'
ORDER BY composite_score DESC

L2 Strategic (15 strategic signals):
SELECT * FROM `your-project.competitive_intelligence.audience_intelligence_[RUN_ID]`

L3 Interventions (25 actionable recommendations):
SELECT * FROM `your-project.competitive_intelligence.interventions_[RUN_ID]`

L4 Dashboards (Full analytical detail):
-- See all generated SQL dashboards in:
-- data/output/sql_dashboards_[RUN_ID]/

- Technical Architecture - High-level system design and BigQuery AI integration
- Pipeline Architecture - Detailed 10-stage pipeline specifications
- BigQuery Command Reference - Complete AI primitive usage guide
- Demo Notebook - Interactive pipeline walkthrough
- Demo Video - Video of Demo Notebook walkthrough
- Kaggle Competition Writeup - Complete competition submission with innovation highlights
Using ML.DISTANCE() on 768-dimensional embeddings with temporal lag analysis:
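-- Detect copying with mathematical proof
-- NOTE: ad_embeddings and the brand column are placeholder names; substitute your own
SELECT
  1 - ML.DISTANCE(a.embedding, b.embedding, 'COSINE') AS similarity,
  DATE_DIFF(b.start_date, a.start_date, DAY) AS copy_lag
FROM ad_embeddings AS a
JOIN ad_embeddings AS b
  ON a.brand != b.brand
WHERE ML.DISTANCE(a.embedding, b.embedding, 'COSINE') < 0.3  -- cosine distance < 0.3, i.e. 70%+ similarity

Result: Detected Zenni Optical copying Warby Parker at 73.7% similarity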
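
To make the threshold concrete: cosine distance is 1 minus cosine similarity, so a distance below 0.3 corresponds to a similarity above 70%. The following is a minimal pure-Python sketch of the same check on toy vectors. The function names, the dictionary keys, and the 3-dimensional vectors are illustrative assumptions rather than the project's code, and requiring a positive lag is one plausible reading of "temporal lag analysis".

```python
import math
from datetime import date


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def check_copy(original, candidate, threshold=0.3):
    """Flag a candidate ad that is close to the original AND launched later.

    Each ad is a dict with hypothetical keys 'embedding' and 'start_date'.
    Returns (flagged, similarity, copy_lag_days).
    """
    distance = cosine_distance(original["embedding"], candidate["embedding"])
    copy_lag = (candidate["start_date"] - original["start_date"]).days
    flagged = distance < threshold and copy_lag > 0  # assumption: copy must come later
    return flagged, 1.0 - distance, copy_lag


# Toy data: real embeddings in this project are 768-dimensional
original = {"embedding": [1.0, 0.0, 0.0], "start_date": date(2024, 1, 1)}
candidate = {"embedding": [1.0, 1.0, 0.0], "start_date": date(2024, 1, 15)}

flagged, similarity, lag = check_copy(original, candidate)
```

Here the two vectors sit at 45 degrees, giving a similarity of about 0.707, just over the 70% line, with a 14-day lag.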
Multi-dimensional whitespace detection across messaging × funnel × persona:
-- Find untapped market opportunities
GROUP BY messaging_angle, funnel_stage, target_persona
HAVING competitor_count = 0 -- VIRGIN_TERRITORY

Result: Identified 6 market opportunities worth $150K-300K each
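
The idea behind the three-dimensional grid can be sketched in plain Python: enumerate every messaging × funnel × persona cell and keep those that no competitor ad occupies. This is a simplified illustration, not the pipeline's logic; `find_whitespace` and the sample values are hypothetical, and only the three field names come from the query above.

```python
from collections import Counter
from itertools import product


def find_whitespace(ads, angles, stages, personas):
    """Return every (angle, stage, persona) cell with zero competitor ads."""
    counts = Counter(
        (ad["messaging_angle"], ad["funnel_stage"], ad["target_persona"])
        for ad in ads
    )
    # Counter returns 0 for cells that never appear in the data
    return [cell for cell in product(angles, stages, personas) if counts[cell] == 0]


# Toy competitor ads (hypothetical values)
ads = [
    {"messaging_angle": "price", "funnel_stage": "awareness", "target_persona": "students"},
    {"messaging_angle": "price", "funnel_stage": "conversion", "target_persona": "students"},
    {"messaging_angle": "style", "funnel_stage": "awareness", "target_persona": "students"},
]

gaps = find_whitespace(ads, ["price", "style"], ["awareness", "conversion"], ["students"])
```

Note that the full grid has to be enumerated explicitly. Grouping the ads alone only yields combinations that already exist, so in SQL an empty cell typically surfaces only when the grouped rows include something besides competitor ads (for example the target brand's own ads) or are joined against the complete set of combinations.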
Prevents executive information overload through smart filtering:
- L1: 5 critical insights (80%+ confidence)
- L2: 15 strategic signals (60%+ confidence)
- L3: 25 actionable interventions (high actionability)
- L4: Complete analytical transparency
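
The tiering above can be approximated with a short sketch. It covers only the confidence-based levels: the caps and thresholds for L1 and L2 come from the list, while the overflow rule (a high-confidence insight that does not fit in L1 falls to L2) and the `assign_levels` name are assumptions. L3 is omitted because the README does not define how actionability is scored.

```python
# (level name, minimum confidence, maximum number of items)
TIERS = [("L1", 0.80, 5), ("L2", 0.60, 15)]


def assign_levels(insights):
    """Place each insight in the first tier it qualifies for; L4 keeps everything."""
    ranked = sorted(insights, key=lambda item: item["confidence"], reverse=True)
    levels = {"L1": [], "L2": [], "L4": ranked}  # L4: full, unfiltered detail
    for insight in ranked:
        for name, min_conf, cap in TIERS:
            if insight["confidence"] >= min_conf and len(levels[name]) < cap:
                levels[name].append(insight)
                break  # assumption: an insight appears in one filtered tier only
    return levels


# Toy insights with hypothetical confidence scores
scores = [0.95, 0.5, 0.9, 0.81, 0.88, 0.7, 0.85, 0.82]
insights = [{"id": n, "confidence": c} for n, c in enumerate(scores)]

levels = assign_levels(insights)
l1_scores = [item["confidence"] for item in levels["L1"]]
l2_scores = [item["confidence"] for item in levels["L2"]]
```

With these eight toy insights, six clear the 80% bar but only five fit in L1, so the sixth (0.81) lands in L2 alongside the 0.7 item, and the 0.5 item is visible only at L4.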
30-day competitive predictions using ML.FORECAST():
SELECT *
FROM ML.FORECAST(
  MODEL competitive_trends,
  STRUCT(30 AS horizon, 0.95 AS confidence_level)
)

# Detect copying patterns
from src.intelligence.framework import CompetitiveCopyingDetector
detector = CompetitiveCopyingDetector(project_id=BQ_PROJECT)
threats = detector.detect_copying_threats(
target_brand="Your Brand",
similarity_threshold=0.3
)

# Find market gaps
from src.intelligence.framework import Enhanced3DWhiteSpaceDetector
whitespace = Enhanced3DWhiteSpaceDetector(project_id=BQ_PROJECT)
opportunities = whitespace.detect_opportunities(
target_brand="Your Brand",
min_investment_potential=100000
)

# Monitor creative exhaustion
from src.intelligence.framework import CreativeFatigueAnalyzer
fatigue = CreativeFatigueAnalyzer(project_id=BQ_PROJECT)
status = fatigue.analyze_fatigue_risk(
target_brand="Your Brand",
window_days=30
)

# Override auto-discovery with manual competitor list
competitors = [
"Warby Parker", "Zenni Optical", "EyeBuyDirect",
"Glasses.com", "LensCrafters"
]
pipeline = CompetitiveIntelligencePipeline(
target_brand="Your Brand",
manual_competitors=competitors
)

# Control visual analysis budget
config = {
"visual_analysis_budget": 200, # Total images
"per_brand_limit": 20, # Images per competitor
"sampling_strategy": "adaptive" # Adjusts by portfolio size
}

# Run tests
pytest tests/
# Code formatting
black src/ tests/
# Linting
flake8 src/ tests/
# Type checking (if using mypy)
mypy src/

- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
This project demonstrates breakthrough innovations in competitive intelligence using BigQuery AI:
- SQL-Native AI Workflows: Complex multi-round consensus validation entirely within BigQuery
- Real-Time Competitive Threat Detection: Mathematical copying detection with temporal analysis
- True Multimodal Intelligence: Simultaneous visual-text analysis for story alignment
- Progressive Disclosure Architecture: Information hierarchy preventing executive overwhelm
Built for the BigQuery AI Hackathon - transforming competitive analysis from reactive reporting to proactive strategic advantage.
Ready to transform your competitive intelligence? Start with the demo notebook or dive into the technical architecture.