Content Intelligence Hub

A demo of a Weaviate-based semantic search and AI agent layer for marketing content. It enables natural language discovery, intelligent analysis, and automated content repurposing.

🎯 Features

🔍 Semantic Search: Find content by meaning using Weaviate's Query Agent
🤖 AI Agents: Automated workflows for discovery and content repurposing
♻️ Content Repurposing: Transform existing content into multiple formats (LinkedIn, Email, Twitter, Summary)
📊 Content Library: Browse and filter your content collection
🎨 Modern UI: Clean Streamlit interface for demos and POCs

🏗️ Architecture

Stack

Vector Database: Weaviate Cloud
Discovery Agent: Weaviate Query Agent (native NL→search)
Content Agent: LangGraph (multi-step workflows)
LLM: OpenAI GPT-4o-mini
Embeddings: Weaviate Native (Snowflake Arctic)
Frontend: Streamlit

Agent Flow

Discovery (Query Agent):
  User Query → NL Interpretation → Filter Extraction → Hybrid Search → Results + Citations

Repurposing (LangGraph):
  Source Content → Analysis → Format Generation → Quality Review → Output
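
The repurposing flow above can be read as a chain of steps that each take and return a shared state. The sketch below mirrors that shape in plain Python, with no LangGraph dependency and with stubbed logic in place of LLM calls. The state keys, function names, and the length-based review rule are all assumptions made for illustration, not the project's actual implementation.

```python
from typing import Callable, TypedDict


class RepurposeState(TypedDict, total=False):
    """Hypothetical shared state passed between steps."""
    source: str
    formats: list[str]
    key_points: list[str]
    drafts: dict[str, str]
    approved: dict[str, bool]


def analyze(state: RepurposeState) -> RepurposeState:
    # Stub analysis: treat each non-empty sentence as a key point.
    points = [s.strip() for s in state["source"].split(".") if s.strip()]
    return {**state, "key_points": points}


def generate(state: RepurposeState) -> RepurposeState:
    # Stub generation: a real step would prompt the LLM once per format.
    lead = state["key_points"][0] if state["key_points"] else ""
    drafts = {fmt: f"[{fmt}] {lead}" for fmt in state["formats"]}
    return {**state, "drafts": drafts}


def review(state: RepurposeState) -> RepurposeState:
    # Stub quality review: approve any non-empty draft of at most 280 characters.
    approved = {fmt: 0 < len(text) <= 280 for fmt, text in state["drafts"].items()}
    return {**state, "approved": approved}


# Steps run in the order shown in the flow: Analysis, Format Generation, Quality Review.
STEPS: list[Callable[[RepurposeState], RepurposeState]] = [analyze, generate, review]


def run_pipeline(source: str, formats: list[str]) -> RepurposeState:
    state: RepurposeState = {"source": source, "formats": formats}
    for step in STEPS:
        state = step(state)
    return state
```

In a graph framework each function would become a node and the list order would become edges; keeping every step a pure function of the state makes the stages easy to test in isolation.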

🚀 Quick Start

1. Prerequisites

Python 3.11+
Weaviate Cloud account (sandbox cluster)
OpenAI API key

2. Installation

# Clone the repository (substitute the URL of your copy), then enter it
git clone <repository-url>
cd Content-Intelligence-Hub

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Configuration

Create a .env file:

cp .env.example .env

Edit .env with your credentials:

WEAVIATE_URL=https://your-cluster.weaviate.cloud
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key
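
Before starting the app, it can help to confirm that all three variables are present. The helper below is a minimal sketch, not part of the project: `check_env` is a hypothetical name, and it assumes the variables have already been loaded into the process environment (it does not read the .env file itself).

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    # Report every required variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # The cluster URL is expected to carry an explicit https:// scheme.
    url = env.get("WEAVIATE_URL", "")
    if url and urlparse(url).scheme != "https":
        problems.append("WEAVIATE_URL should start with https://")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```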

4. Create Schema

python -m scripts.schema create

5. Ingest Sample Data

First, create sample_data/content.json with your marketing content (see Sample Data Format below).

Then ingest:

python -m scripts.ingest

6. Run Application

streamlit run src/app.py

The app will open at http://localhost:8501

📝 Sample Data Format

Create sample_data/content.json with this structure:

[
  {
    "title": "Cloud Migration Guide for CTOs",
    "body": "Full content text here...",
    "summary": "A comprehensive guide to planning and executing cloud migration.",
    "content_type": "blog",
    "persona": "cto",
    "funnel_stage": "consideration",
    "channel": "website",
    "topics": ["cloud", "migration", "infrastructure"],
    "performance_score": 85,
    "url": "https://example.com/blog/cloud-migration",
    "created_at": "2025-01-15T10:00:00Z"
  }
]

Required Fields

title (string): Content title
body (string): Full content text
content_type (string): blog | case_study | email | social_post | landing_page | whitepaper

Optional Fields

summary (string): Short description
persona (string): cto | cfo | developer | marketing_leader | etc.
funnel_stage (string): awareness | consideration | decision | retention
channel (string): website | linkedin | email | twitter | youtube
topics (array): Array of topic tags
performance_score (number): 0-100 engagement metric
url (string): Source URL
created_at (ISO date string): Publication date
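
The field rules above can be checked before ingestion. The following is a sketch under assumptions: `validate_record` is a hypothetical helper, and whether the ingest script enforces the same rules is not confirmed here. It checks only the fields whose allowed values are fully listed above (persona is skipped because its list is open-ended).

```python
REQUIRED_FIELDS = ("title", "body", "content_type")
CONTENT_TYPES = {"blog", "case_study", "email", "social_post", "landing_page", "whitepaper"}
FUNNEL_STAGES = {"awareness", "consideration", "decision", "retention"}


def validate_record(record: dict) -> list:
    """Return a list of problems with one content record (empty if valid)."""
    errors = []
    # Required fields must be non-empty strings.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty required field: {field}")
    content_type = record.get("content_type")
    if isinstance(content_type, str) and content_type.strip() and content_type not in CONTENT_TYPES:
        errors.append(f"unknown content_type: {content_type}")
    # Optional fields are checked only when present.
    stage = record.get("funnel_stage")
    if stage is not None and stage not in FUNNEL_STAGES:
        errors.append(f"unknown funnel_stage: {stage}")
    score = record.get("performance_score")
    if score is not None and not (isinstance(score, (int, float)) and 0 <= score <= 100):
        errors.append("performance_score must be a number from 0 to 100")
    topics = record.get("topics")
    if topics is not None and not (isinstance(topics, list) and all(isinstance(t, str) for t in topics)):
        errors.append("topics must be a list of strings")
    return errors
```

Running this over each item in sample_data/content.json before ingesting surfaces every problem in a record at once, rather than stopping at the first.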

🎮 Usage

Discovery Page

Ask natural language questions about your content:

"Find content about cloud migration for CTOs"
"What case studies do we have?"
"Show me high-performing awareness content"

The Query Agent will:

Interpret your query semantically
Extract relevant filters automatically
Return results with citations
Support follow-up questions

Repurpose Page

Select source content (from Discovery or enter UUID)
Choose output formats (LinkedIn, Email, Twitter, Summary)
Click "Generate All Formats"
Review and download generated content

Content Library

Browse all content with filters
View content distribution by type, persona, stage
Quick access to repurpose any item

🛠️ CLI Tools

Schema Management

# Create collection
python -m scripts.schema create

# Get collection info
python -m scripts.schema info

# Delete collection (WARNING: deletes all data)
python -m scripts.schema delete

Data Ingestion

# Ingest from default file
python -m scripts.ingest

# Ingest from custom file
python -m scripts.ingest ingest path/to/content.json

# Clear all content
python -m scripts.ingest clear

Search Testing

# Test hybrid search
python -m src.search "cloud migration for CTOs"

Agent Testing

# Test Query Agent
python -m src.agents.query_agent "Find content about AI"

# Test Repurpose Agent
python -m src.agents.repurpose_agent <content-uuid> linkedin email twitter
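
The repurpose command takes a content UUID followed by one or more format names. One way such arguments could be parsed is sketched below with the standard library's argparse. This is an illustration, not the project's actual CLI code: the `parse_args` helper and the argument names are hypothetical, and accepting `summary` on the command line is an assumption based on the formats listed for the UI.

```python
import argparse
import uuid

# Assumed set of accepted formats, taken from the formats named in this README.
FORMATS = ("linkedin", "email", "twitter", "summary")


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="repurpose_agent")
    # uuid.UUID raises ValueError on malformed input, which argparse reports as a usage error.
    parser.add_argument("content_uuid", type=uuid.UUID, help="UUID of the source content object")
    # One or more formats; each value is checked against FORMATS.
    parser.add_argument("formats", nargs="+", choices=FORMATS, help="output formats to generate")
    return parser.parse_args(argv)
```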

📁 Project Structure

Content-Intelligence-Hub/
├── src/                        # Application source code
│   ├── __init__.py
│   ├── app.py                  # Streamlit application
│   ├── auth.py                 # Authentication module
│   ├── config.py               # Configuration and env vars
│   ├── search.py               # Search functions
│   ├── weaviate_client.py      # Weaviate connection
│   └── agents/                 # AI agents
│       ├── __init__.py
│       ├── query_agent.py      # Weaviate Query Agent
│       ├── repurpose_agent.py  # LangGraph workflow
│       ├── state.py            # LangGraph state schema
│       └── tools.py            # LangGraph tools
├── scripts/                    # CLI utilities
│   ├── __init__.py
│   ├── ingest.py               # Data ingestion
│   └── schema.py               # Schema management
├── deploy/                     # Deployment configs
│   ├── deploy.sh               # Deployment script
│   ├── setup-gcp.sh            # GCP setup script
│   └── DEPLOYMENT.md           # Deployment guide
├── sample_data/                # Sample data
│   ├── content.json            # Marketing content
│   └── README.md               # Data format guide
├── Dockerfile                  # Container definition
├── cloudbuild.yaml             # Cloud Build CI/CD
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
└── README.md                   # This file

🔧 Configuration

Edit src/config.py to customize:

EMBEDDING_MODEL: OpenAI embedding model
LLM_MODEL: OpenAI chat model
LLM_TEMPERATURE: Generation temperature (0.0-1.0)
HYBRID_SEARCH_ALPHA: Hybrid search weight (0=BM25, 1=vector, 0.5=equal)
DEFAULT_SEARCH_LIMIT: Default number of search results

🧪 Testing

Test Weaviate Connection

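from src.weaviate_client import create_client, close_client

client = create_client()
try:
    # is_ready() reports whether the cluster is reachable and ready
    print(f"Connected: {client.is_ready()}")
finally:
    # Close the connection even if the readiness check raises
    close_client(client)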

Test Query Agent

python -m src.agents.query_agent "Find blog posts about DevOps"

Test Repurpose Agent

# Get a content UUID from the library first
python -m src.agents.repurpose_agent <uuid> linkedin email

🐛 Troubleshooting

Weaviate Connection Failed

Verify WEAVIATE_URL is correct (includes https://)
Check WEAVIATE_API_KEY is valid
Ensure Weaviate cluster is running
Check network/firewall settings

OpenAI API Errors

Verify OPENAI_API_KEY is valid and has credits
Check rate limits
Ensure models are accessible (gpt-4o-mini, text-embedding-3-small)

Collection Not Found

Run: python -m scripts.schema create

No Search Results

Ensure data is ingested: python -m scripts.ingest
Check collection has data: python -m scripts.schema info
Verify query syntax

Import Errors

pip install -r requirements.txt --upgrade

📊 Data Model

MarketingContent Collection

Property	Type	Description
title	text	Content title (vectorized)
body	text	Full content (vectorized)
summary	text	Summary (vectorized)
content_type	text	Type classification
persona	text	Target audience
funnel_stage	text	Marketing funnel stage
channel	text	Distribution channel
topics	text[]	Topic tags
performance_score	number	Engagement metric (0-100)
url	text	Source URL
created_at	date	Publication date

Vectorization: Weaviate Native (Snowflake Arctic) on title, body, summary
Search: Hybrid (BM25 + vector similarity)

🎯 Demo Script

Discovery (2 min)
  - "Find our best content about cloud migration for CTOs"
  - Show filters applied, results with citations
  - Follow-up: "Now just the case studies"

Repurpose (2 min)
  - Select top case study
  - Generate LinkedIn + Email + Twitter
  - Show all formats, highlight quality scores

Value Prop (1 min)
  - Search by meaning, not keywords
  - Repurpose in seconds, not hours
  - Built on Weaviate + LangGraph

📄 License

This is a demo/POC project for GTM/Marketing teams.
