A demo of a Weaviate-based semantic search and AI agent layer for marketing content. It enables natural-language discovery, intelligent analysis, and automated content repurposing.
- Semantic Search: Find content by meaning using Weaviate's Query Agent
- AI Agents: Automated workflows for discovery and content repurposing
- Content Repurposing: Transform existing content into multiple formats (LinkedIn, Email, Twitter, Summary)
- Content Library: Browse and filter your content collection
- Modern UI: Clean Streamlit interface for demos and POCs
- Vector Database: Weaviate Cloud
- Discovery Agent: Weaviate Query Agent (native NL→search)
- Content Agent: LangGraph (multi-step workflows)
- LLM: OpenAI GPT-4o-mini
- Embeddings: Weaviate Native (Snowflake Arctic)
- Frontend: Streamlit
Discovery (Query Agent):
User Query → NL Interpretation → Filter Extraction → Hybrid Search → Results + Citations
Repurposing (LangGraph):
Source Content → Analysis → Format Generation → Quality Review → Output
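
The repurposing flow above can be read as a chain of steps that pass a shared state along. The following is a minimal sketch in plain Python, not the project's LangGraph workflow: the `RepurposeState` fields, the step functions, and their placeholder logic are all hypothetical and stand in for the real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RepurposeState:
    """Hypothetical shared state handed from one step to the next."""
    source: str
    formats: list[str]
    analysis: dict = field(default_factory=dict)
    drafts: dict[str, str] = field(default_factory=dict)
    approved: dict[str, bool] = field(default_factory=dict)


def analyze(state: RepurposeState) -> RepurposeState:
    # Placeholder for the analysis step: record only a word count.
    state.analysis = {"word_count": len(state.source.split())}
    return state


def generate(state: RepurposeState) -> RepurposeState:
    # Placeholder for format generation: tag the source text per format.
    for fmt in state.formats:
        state.drafts[fmt] = f"[{fmt}] {state.source}"
    return state


def review(state: RepurposeState) -> RepurposeState:
    # Placeholder for quality review: approve any non-empty draft.
    for fmt, draft in state.drafts.items():
        state.approved[fmt] = bool(draft.strip())
    return state


# Steps run in the same order as the flow: analysis, generation, review.
STEPS: list[Callable[[RepurposeState], RepurposeState]] = [analyze, generate, review]


def run_pipeline(source: str, formats: list[str]) -> RepurposeState:
    state = RepurposeState(source=source, formats=formats)
    for step in STEPS:
        state = step(state)
    return state


result = run_pipeline("Cloud migration guide for CTOs", ["linkedin", "email"])
```

Keeping every step's signature identical (state in, state out) is what makes the chain easy to reorder or extend, which is the same property a graph-based workflow relies on.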
- Python 3.11+
- Weaviate Cloud account (sandbox cluster)
- OpenAI API key
# Clone repository
cd Content-Intelligence-Hub
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a .env file:
cp .env.example .env
Edit .env with your credentials:
WEAVIATE_URL=https://your-cluster.weaviate.cloud
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key
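
Before running the scripts, it can help to confirm that the three credentials are actually visible to Python. The sketch below is an illustration under assumptions: `check_settings` is a hypothetical helper that is not part of this repository, and it only inspects variables already loaded into the process environment (it does not read the .env file itself).

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def check_settings(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the given environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    url = env.get("WEAVIATE_URL", "")
    # The cluster URL is expected to include the https:// scheme.
    if url and not url.startswith("https://"):
        problems.append("WEAVIATE_URL should start with https://")
    return problems


if __name__ == "__main__":
    for problem in check_settings(dict(os.environ)):
        print(problem)
```

An empty result means all three variables are present and the URL has the expected scheme; it does not prove that the keys are valid.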
python -m scripts.schema create
First, create sample_data/content.json with your marketing content (see Sample Data Format below).
Then ingest:
python -m scripts.ingest
streamlit run src/app.py
The app will open at http://localhost:8501
Create sample_data/content.json with this structure:
[
{
"title": "Cloud Migration Guide for CTOs",
"body": "Full content text here...",
"summary": "A comprehensive guide to planning and executing cloud migration.",
"content_type": "blog",
"persona": "cto",
"funnel_stage": "consideration",
"channel": "website",
"topics": ["cloud", "migration", "infrastructure"],
"performance_score": 85,
"url": "https://example.com/blog/cloud-migration",
"created_at": "2025-01-15T10:00:00Z"
}
]

- title (string): Content title
- body (string): Full content text
- content_type (string): blog | case_study | email | social_post | landing_page | whitepaper
- summary (string): Short description
- persona (string): cto | cfo | developer | marketing_leader | etc.
- funnel_stage (string): awareness | consideration | decision | retention
- channel (string): website | linkedin | email | twitter | youtube
- topics (array): Array of topic tags
- performance_score (number): 0-100 engagement metric
- url (string): Source URL
- created_at (ISO date string): Publication date
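
The field list above can be checked mechanically before ingestion. The following is a minimal sketch, not the project's ingest logic: `validate_item` is a hypothetical helper, it only validates fields that are present (which fields are mandatory is not assumed here), and `persona` is left unrestricted because its value list is open-ended.

```python
from datetime import datetime

# Expected Python types for each documented field.
FIELD_TYPES = {
    "title": str, "body": str, "summary": str, "content_type": str,
    "persona": str, "funnel_stage": str, "channel": str, "topics": list,
    "performance_score": (int, float), "url": str, "created_at": str,
}
# Closed value lists copied from the field descriptions.
ALLOWED_VALUES = {
    "content_type": {"blog", "case_study", "email", "social_post", "landing_page", "whitepaper"},
    "funnel_stage": {"awareness", "consideration", "decision", "retention"},
    "channel": {"website", "linkedin", "email", "twitter", "youtube"},
}


def validate_item(item: dict) -> list[str]:
    """Return a list of problems for one content record (empty if none)."""
    errors = []
    for name, value in item.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            errors.append(f"{name}: unknown field")
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: unexpected type {type(value).__name__}")
    for name, allowed in ALLOWED_VALUES.items():
        value = item.get(name)
        if isinstance(value, str) and value not in allowed:
            errors.append(f"{name}: {value!r} is not one of the listed values")
    score = item.get("performance_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool) and not 0 <= score <= 100:
        errors.append(f"performance_score: {score} is outside 0-100")
    created = item.get("created_at")
    if isinstance(created, str):
        try:
            # Map a trailing "Z" to an explicit UTC offset before parsing.
            datetime.fromisoformat(created.replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"created_at: {created!r} is not an ISO date string")
    return errors


sample = {
    "title": "Cloud Migration Guide for CTOs",
    "content_type": "blog",
    "funnel_stage": "consideration",
    "channel": "website",
    "topics": ["cloud", "migration", "infrastructure"],
    "performance_score": 85,
    "created_at": "2025-01-15T10:00:00Z",
}
problems = validate_item(sample)
```

Loading sample_data/content.json with `json.load` and calling `validate_item` on each element would surface typos in field names or values before they reach the collection.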
Ask natural language questions about your content:
"Find content about cloud migration for CTOs"
"What case studies do we have?"
"Show me high-performing awareness content"
The Query Agent will:
- Interpret your query semantically
- Extract relevant filters automatically
- Return results with citations
- Support follow-up questions
- Select source content (from Discovery or enter UUID)
- Choose output formats (LinkedIn, Email, Twitter, Summary)
- Click "Generate All Formats"
- Review and download generated content
- Browse all content with filters
- View content distribution by type, persona, stage
- Quick access to repurpose any item
# Create collection
python -m scripts.schema create
# Get collection info
python -m scripts.schema info
# Delete collection (WARNING: deletes all data)
python -m scripts.schema delete

# Ingest from default file
python -m scripts.ingest
# Ingest from custom file
python -m scripts.ingest ingest path/to/content.json
# Clear all content
python -m scripts.ingest clear

# Test hybrid search
python -m src.search "cloud migration for CTOs"
# Test Query Agent
python -m src.agents.query_agent "Find content about AI"
# Test Repurpose Agent
python -m src.agents.repurpose_agent <content-uuid> linkedin email twitter

Content-Intelligence-Hub/
├── src/                      # Application source code
│   ├── __init__.py
│   ├── app.py                # Streamlit application
│   ├── auth.py               # Authentication module
│   ├── config.py             # Configuration and env vars
│   ├── search.py             # Search functions
│   ├── weaviate_client.py    # Weaviate connection
│   └── agents/               # AI agents
│       ├── __init__.py
│       ├── query_agent.py    # Weaviate Query Agent
│       ├── repurpose_agent.py # LangGraph workflow
│       ├── state.py          # LangGraph state schema
│       └── tools.py          # LangGraph tools
├── scripts/                  # CLI utilities
│   ├── __init__.py
│   ├── ingest.py             # Data ingestion
│   └── schema.py             # Schema management
├── deploy/                   # Deployment configs
│   ├── deploy.sh             # Deployment script
│   ├── setup-gcp.sh          # GCP setup script
│   └── DEPLOYMENT.md         # Deployment guide
├── sample_data/              # Sample data
│   ├── content.json          # Marketing content
│   └── README.md             # Data format guide
├── Dockerfile                # Container definition
├── cloudbuild.yaml           # Cloud Build CI/CD
├── requirements.txt          # Python dependencies
├── .env.example              # Environment template
└── README.md                 # This file
Edit config.py to customize:
- EMBEDDING_MODEL: OpenAI embedding model
- LLM_MODEL: OpenAI chat model
- LLM_TEMPERATURE: Generation temperature (0.0-1.0)
- HYBRID_SEARCH_ALPHA: Hybrid search weight (0=BM25, 1=vector, 0.5=equal)
- DEFAULT_SEARCH_LIMIT: Default number of search results
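
To make the HYBRID_SEARCH_ALPHA setting concrete, here is a simplified illustration of how a weight between 0 and 1 can blend a keyword score with a vector score. It sketches the general idea only and should not be read as Weaviate's actual fusion algorithm; the function names, the min-max normalization step, and the sample scores are assumptions made for the example.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the 0..1 range so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend(bm25: dict[str, float], vector: dict[str, float], alpha: float) -> dict[str, float]:
    """Weighted sum: alpha=0 uses only BM25, alpha=1 uses only vector scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    b, v = normalize(bm25), normalize(vector)
    return {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }


def rank(bm25: dict[str, float], vector: dict[str, float], alpha: float) -> list[str]:
    """Return document ids ordered from best to worst blended score."""
    scores = blend(bm25, vector, alpha)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up scores: doc-a wins on keywords, doc-b wins on semantic similarity.
bm25_scores = {"doc-a": 4.0, "doc-b": 2.0, "doc-c": 0.0}
vector_scores = {"doc-a": 0.2, "doc-b": 0.8, "doc-c": 0.5}

keyword_only = rank(bm25_scores, vector_scores, alpha=0.0)
vector_only = rank(bm25_scores, vector_scores, alpha=1.0)
balanced = rank(bm25_scores, vector_scores, alpha=0.5)
```

With these sample numbers the top result changes from `doc-a` to `doc-b` as alpha moves from 0 toward 1, which is the trade-off the setting controls.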
from src.weaviate_client import create_client, close_client
client = create_client()
print(f"Connected: {client.is_ready()}")
close_client(client)

python -m src.agents.query_agent "Find blog posts about DevOps"

# Get a content UUID from the library first
python -m src.agents.repurpose_agent <uuid> linkedin email

- Verify WEAVIATE_URL is correct (includes https://)
- Check WEAVIATE_API_KEY is valid
- Ensure Weaviate cluster is running
- Check network/firewall settings

- Verify OPENAI_API_KEY is valid and has credits
- Check rate limits
- Ensure models are accessible (gpt-4o-mini, text-embedding-3-small)

Run: python -m scripts.schema create

- Ensure data is ingested: python -m scripts.ingest
- Check collection has data: python -m scripts.schema info
- Verify query syntax

pip install -r requirements.txt --upgrade

| Property | Type | Description |
|---|---|---|
| title | text | Content title (vectorized) |
| body | text | Full content (vectorized) |
| summary | text | Summary (vectorized) |
| content_type | text | Type classification |
| persona | text | Target audience |
| funnel_stage | text | Marketing funnel stage |
| channel | text | Distribution channel |
| topics | text[] | Topic tags |
| performance_score | number | Engagement metric (0-100) |
| url | text | Source URL |
| created_at | date | Publication date |
Vectorization: Weaviate Native (Snowflake Arctic) on title, body, summary
Search: Hybrid (BM25 + vector similarity)
- Discovery (2 min)
  - "Find our best content about cloud migration for CTOs"
  - Show filters applied, results with citations
  - Follow-up: "Now just the case studies"
- Repurpose (2 min)
  - Select top case study
  - Generate LinkedIn + Email + Twitter
  - Show all formats, highlight quality scores
- Value Prop (1 min)
  - Search by meaning, not keywords
  - Repurpose in seconds, not hours
  - Built on Weaviate + LangGraph
This is a demo/POC project for GTM/Marketing teams.