RiskResponse/Content-Intelligence-Hub-Demo

Content Intelligence Hub

A demo of a Weaviate-based semantic search and AI agent layer for marketing content. It enables natural-language discovery, intelligent analysis, and automated content repurposing.

🎯 Features

  • 🔍 Semantic Search: Find content by meaning using Weaviate's Query Agent
  • 🤖 AI Agents: Automated workflows for discovery and content repurposing
  • ♻️ Content Repurposing: Transform existing content into multiple formats (LinkedIn, Email, Twitter, Summary)
  • 📊 Content Library: Browse and filter your content collection
  • 🎨 Modern UI: Clean Streamlit interface for demos and POCs

🏗️ Architecture

Stack

  • Vector Database: Weaviate Cloud
  • Discovery Agent: Weaviate Query Agent (native NL→search)
  • Content Agent: LangGraph (multi-step workflows)
  • LLM: OpenAI GPT-4o-mini
  • Embeddings: Weaviate Native (Snowflake Arctic)
  • Frontend: Streamlit

Agent Flow

Discovery (Query Agent):
  User Query → NL Interpretation → Filter Extraction → Hybrid Search → Results + Citations

Repurposing (LangGraph):
  Source Content → Analysis → Format Generation → Quality Review → Output
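
The repurposing flow above can be pictured as a chain of steps that each read and extend a shared state. The sketch below uses plain Python only, so it is not the project's LangGraph workflow: `RepurposeState`, the three step functions, and the string handling that stands in for LLM calls are all hypothetical.

```python
from typing import Callable, TypedDict


class RepurposeState(TypedDict, total=False):
    """Hypothetical state passed between steps (not the schema in state.py)."""
    source: str
    key_points: list[str]
    drafts: dict[str, str]
    approved: bool


def analyze(state: RepurposeState) -> RepurposeState:
    # Stand-in for the LLM analysis step: treat each sentence as a key point.
    points = [s.strip() for s in state["source"].split(".") if s.strip()]
    return {**state, "key_points": points}


def generate(state: RepurposeState) -> RepurposeState:
    # Stand-in for format generation: one draft per output format.
    headline = state["key_points"][0]
    drafts = {
        "linkedin": f"{headline}. Read more in our latest post.",
        "twitter": headline[:280],
    }
    return {**state, "drafts": drafts}


def review(state: RepurposeState) -> RepurposeState:
    # Stand-in for quality review: every draft must be non-empty.
    return {**state, "approved": all(state["drafts"].values())}


STEPS: list[Callable[[RepurposeState], RepurposeState]] = [analyze, generate, review]


def run_pipeline(source: str) -> RepurposeState:
    """Run Analysis, Format Generation, and Quality Review in order."""
    state: RepurposeState = {"source": source}
    for step in STEPS:
        state = step(state)
    return state


result = run_pipeline("Cloud migration cuts costs. Plan it in phases.")
print(result["drafts"])
```

Keeping each step as a function from state to state is what makes it straightforward to move the same steps into a graph framework later.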

🚀 Quick Start

1. Prerequisites

  • Python 3.11+
  • Weaviate Cloud account (sandbox cluster)
  • OpenAI API key

2. Installation

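# Clone repository (replace <repository-url> with this repo's clone URL)
git clone <repository-url> Content-Intelligence-Hub
cd Content-Intelligence-Hub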

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Configuration

Create a .env file:

cp .env.example .env

Edit .env with your credentials:

WEAVIATE_URL=https://your-cluster.weaviate.cloud
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key
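
Before running the later steps, it can help to confirm that all three settings are actually set. The following is a minimal sketch and not part of the project: `REQUIRED_VARS` and `missing_settings` are hypothetical names, and the check reads the process environment, so it assumes the values from .env have already been loaded by whatever mechanism the project uses.

```python
import os

# The three variable names come from the .env example above.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# An empty list means every required setting is present.
print(missing_settings(dict(os.environ)))
```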

4. Create Schema

python -m scripts.schema create

5. Ingest Sample Data

First, create sample_data/content.json with your marketing content (see Sample Data Format below).

Then ingest:

python -m scripts.ingest

6. Run Application

streamlit run src/app.py

The app will open at http://localhost:8501

📝 Sample Data Format

Create sample_data/content.json with this structure:

[
  {
    "title": "Cloud Migration Guide for CTOs",
    "body": "Full content text here...",
    "summary": "A comprehensive guide to planning and executing cloud migration.",
    "content_type": "blog",
    "persona": "cto",
    "funnel_stage": "consideration",
    "channel": "website",
    "topics": ["cloud", "migration", "infrastructure"],
    "performance_score": 85,
    "url": "https://example.com/blog/cloud-migration",
    "created_at": "2025-01-15T10:00:00Z"
  }
]

Required Fields

  • title (string): Content title
  • body (string): Full content text
  • content_type (string): blog | case_study | email | social_post | landing_page | whitepaper

Optional Fields

  • summary (string): Short description
  • persona (string): cto | cfo | developer | marketing_leader | etc.
  • funnel_stage (string): awareness | consideration | decision | retention
  • channel (string): website | linkedin | email | twitter | youtube
  • topics (array): Array of topic tags
  • performance_score (number): 0-100 engagement metric
  • url (string): Source URL
  • created_at (ISO date string): Publication date
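
The field rules above can be checked before ingestion. This is a sketch under assumptions: `validate_item` is a hypothetical helper, and whether scripts.ingest enforces the same rules (for example, rejecting an unlisted content_type) is not stated in this README.

```python
import json

REQUIRED_FIELDS = {"title": str, "body": str, "content_type": str}
CONTENT_TYPES = {"blog", "case_study", "email", "social_post", "landing_page", "whitepaper"}


def validate_item(item: dict) -> list[str]:
    """Return a list of problems found in one content record (empty if valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        value = item.get(field)
        if not isinstance(value, expected) or not value.strip():
            problems.append(f"missing or empty required field: {field}")
    content_type = item.get("content_type")
    if isinstance(content_type, str) and content_type.strip() and content_type not in CONTENT_TYPES:
        problems.append(f"unknown content_type: {content_type}")
    score = item.get("performance_score")
    if score is not None and not (isinstance(score, (int, float)) and 0 <= score <= 100):
        problems.append("performance_score must be a number from 0 to 100")
    topics = item.get("topics")
    if topics is not None and not (isinstance(topics, list) and all(isinstance(t, str) for t in topics)):
        problems.append("topics must be a list of strings")
    return problems


sample = json.loads(
    '[{"title": "Cloud Migration Guide for CTOs", "body": "Full content text here...",'
    ' "content_type": "blog", "performance_score": 85}]'
)
for index, record in enumerate(sample):
    print(index, validate_item(record) or "ok")
```

To check a real file, load sample_data/content.json with `json.load` and pass each record through the same function.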

🎮 Usage

Discovery Page

Ask natural language questions about your content:

"Find content about cloud migration for CTOs"
"What case studies do we have?"
"Show me high-performing awareness content"

The Query Agent will:

  • Interpret your query semantically
  • Extract relevant filters automatically
  • Return results with citations
  • Support follow-up questions
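
To make "extract relevant filters" concrete, the sketch below shows what structured filters for the third example query might look like once applied to records. It does not call the Query Agent: the `filters` dictionary, the `apply_filters` helper, the threshold of 80, and the sample records are all made up for illustration.

```python
# Made-up records using property names from the sample data format.
records = [
    {"title": "Cloud Migration Guide for CTOs", "funnel_stage": "consideration", "performance_score": 85},
    {"title": "What Is Cloud Migration?", "funnel_stage": "awareness", "performance_score": 91},
    {"title": "Intro to DevOps", "funnel_stage": "awareness", "performance_score": 40},
]

# Hypothetical filters for "Show me high-performing awareness content".
# The threshold is an assumption; this README does not define "high-performing".
filters = {"funnel_stage": "awareness", "min_performance_score": 80}


def apply_filters(items: list[dict], spec: dict) -> list[dict]:
    """Keep items in the requested stage that score at or above the minimum."""
    return [
        item
        for item in items
        if item.get("funnel_stage") == spec["funnel_stage"]
        and item.get("performance_score", 0) >= spec["min_performance_score"]
    ]


matches = apply_filters(records, filters)
print([item["title"] for item in matches])
```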

Repurpose Page

  1. Select source content (from Discovery or enter UUID)
  2. Choose output formats (LinkedIn, Email, Twitter, Summary)
  3. Click "Generate All Formats"
  4. Review and download generated content

Content Library

  • Browse all content with filters
  • View content distribution by type, persona, stage
  • Quick access to repurpose any item

🛠️ CLI Tools

Schema Management

# Create collection
python -m scripts.schema create

# Get collection info
python -m scripts.schema info

# Delete collection (WARNING: deletes all data)
python -m scripts.schema delete

Data Ingestion

# Ingest from default file
python -m scripts.ingest

# Ingest from custom file
python -m scripts.ingest ingest path/to/content.json

# Clear all content
python -m scripts.ingest clear

Search Testing

# Test hybrid search
python -m src.search "cloud migration for CTOs"

Agent Testing

# Test Query Agent
python -m src.agents.query_agent "Find content about AI"

# Test Repurpose Agent
python -m src.agents.repurpose_agent <content-uuid> linkedin email twitter

📁 Project Structure

Content-Intelligence-Hub/
├── src/                        # Application source code
│   ├── __init__.py
│   ├── app.py                  # Streamlit application
│   ├── auth.py                 # Authentication module
│   ├── config.py               # Configuration and env vars
│   ├── search.py               # Search functions
│   ├── weaviate_client.py      # Weaviate connection
│   └── agents/                 # AI agents
│       ├── __init__.py
│       ├── query_agent.py      # Weaviate Query Agent
│       ├── repurpose_agent.py  # LangGraph workflow
│       ├── state.py            # LangGraph state schema
│       └── tools.py            # LangGraph tools
├── scripts/                    # CLI utilities
│   ├── __init__.py
│   ├── ingest.py               # Data ingestion
│   └── schema.py               # Schema management
├── deploy/                     # Deployment configs
│   ├── deploy.sh               # Deployment script
│   ├── setup-gcp.sh            # GCP setup script
│   └── DEPLOYMENT.md           # Deployment guide
├── sample_data/                # Sample data
│   ├── content.json            # Marketing content
│   └── README.md               # Data format guide
├── Dockerfile                  # Container definition
├── cloudbuild.yaml             # Cloud Build CI/CD
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
└── README.md                   # This file

🔧 Configuration

Edit src/config.py to customize:

  • EMBEDDING_MODEL: OpenAI embedding model
  • LLM_MODEL: OpenAI chat model
  • LLM_TEMPERATURE: Generation temperature (0.0-1.0)
  • HYBRID_SEARCH_ALPHA: Hybrid search weight (0=BM25, 1=vector, 0.5=equal)
  • DEFAULT_SEARCH_LIMIT: Default number of search results
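
The effect of HYBRID_SEARCH_ALPHA can be illustrated with a simplified score blend. This is a sketch of the weighting idea only: the `min_max` and `fuse` helpers and the scores are made up, and Weaviate's own fusion algorithms may differ in detail.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to the range 0..1 so the two score types are comparable."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def fuse(bm25: dict[str, float], vector: dict[str, float], alpha: float) -> dict[str, float]:
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights BM25."""
    bm25_n, vector_n = min_max(bm25), min_max(vector)
    docs = set(bm25_n) | set(vector_n)
    return {
        doc: alpha * vector_n.get(doc, 0.0) + (1 - alpha) * bm25_n.get(doc, 0.0)
        for doc in docs
    }


# Made-up scores for three documents: "a" wins on keywords, "c" on meaning.
bm25_scores = {"a": 4.0, "b": 2.0, "c": 0.0}
vector_scores = {"a": 0.2, "b": 0.9, "c": 1.0}

for alpha in (0.0, 0.5, 1.0):
    fused = fuse(bm25_scores, vector_scores, alpha)
    best = max(fused, key=fused.get)
    print(f"alpha={alpha}: top document is {best}")
```

With these numbers the top document changes as alpha moves from pure BM25 to pure vector, which is the behavior to keep in mind when tuning the setting.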

🧪 Testing

Test Weaviate Connection

from src.weaviate_client import create_client, close_client

# Open a connection, report readiness, and always release the client.
client = create_client()
try:
    print(f"Connected: {client.is_ready()}")
finally:
    close_client(client)

Test Query Agent

python -m src.agents.query_agent "Find blog posts about DevOps"

Test Repurpose Agent

# Get a content UUID from the library first
python -m src.agents.repurpose_agent <uuid> linkedin email

🐛 Troubleshooting

Weaviate Connection Failed

  • Verify WEAVIATE_URL is correct (includes https://)
  • Check WEAVIATE_API_KEY is valid
  • Ensure Weaviate cluster is running
  • Check network/firewall settings
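
The first check in the list above can be automated with the standard library. This is a minimal sketch: `check_weaviate_url` is a hypothetical helper that inspects only the shape of the URL and does not contact the cluster.

```python
from urllib.parse import urlparse


def check_weaviate_url(url: str) -> list[str]:
    """Return human-readable problems with the URL (empty list if it looks fine)."""
    problems = []
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not parsed.netloc:
        problems.append("URL has no host name")
    return problems


print(check_weaviate_url("https://your-cluster.weaviate.cloud"))
print(check_weaviate_url("your-cluster.weaviate.cloud"))
```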

OpenAI API Errors

  • Verify OPENAI_API_KEY is valid and has credits
  • Check rate limits
  • Ensure models are accessible (gpt-4o-mini, text-embedding-3-small)

Collection Not Found

Run: python -m scripts.schema create

No Search Results

  • Ensure data is ingested: python -m scripts.ingest
  • Check collection has data: python -m scripts.schema info
  • Verify query syntax

Import Errors

pip install -r requirements.txt --upgrade

📊 Data Model

MarketingContent Collection

| Property          | Type   | Description               |
|-------------------|--------|---------------------------|
| title             | text   | Content title (vectorized) |
| body              | text   | Full content (vectorized) |
| summary           | text   | Summary (vectorized)      |
| content_type      | text   | Type classification       |
| persona           | text   | Target audience           |
| funnel_stage      | text   | Marketing funnel stage    |
| channel           | text   | Distribution channel      |
| topics            | text[] | Topic tags                |
| performance_score | number | Engagement metric (0-100) |
| url               | text   | Source URL                |
| created_at        | date   | Publication date          |

Vectorization: Weaviate Native (Snowflake Arctic) on title, body, summary
Search: Hybrid (BM25 + vector similarity)

🎯 Demo Script

  1. Discovery (2 min)

    • "Find our best content about cloud migration for CTOs"
    • Show filters applied, results with citations
    • Follow-up: "Now just the case studies"
  2. Repurpose (2 min)

    • Select top case study
    • Generate LinkedIn + Email + Twitter
    • Show all formats, highlight quality scores
  3. Value Prop (1 min)

    • Search by meaning, not keywords
    • Repurpose in seconds, not hours
    • Built on Weaviate + LangGraph

📄 License

This is a demo/POC project for GTM/Marketing teams.
