feat: Automated conversation ingestion with SQLite storage #3

Open
marklubin wants to merge 1 commit into main from feature/conversation-ingestion-cron

@marklubin
Owner

Summary

  • Adds automated daily conversation ingestion via cron job
  • Implements dual storage strategy with SQLite for scalability and Neo4j for graph relationships
  • Includes monitoring dashboard in Gradio UI

Key Features

  • 📁 SQLite Storage: Efficient storage for conversation data, fragments, summaries, and embeddings
  • 🔄 Idempotent Processing: Checksum-based deduplication prevents reprocessing
  • 🔗 Dual Storage: Maintains compatibility by storing in both SQLite and Neo4j
  • 📊 Monitoring Dashboard: New Gradio tab shows job history and allows manual runs
  • 🚨 Error Handling: A Neo4j failure triggers a system broadcast, while SQLite storage continues unaffected
  • Integration Tests: Comprehensive tests for storage and ingestion flow
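
For reviewers, here is a minimal sketch of how checksum-based deduplication can make ingestion idempotent. It is not the code in this PR: the `conversations` table layout, the `ingest_once` helper, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw conversation bytes."""
    return hashlib.sha256(data).hexdigest()


def ingest_once(conn: sqlite3.Connection, name: str, data: bytes) -> bool:
    """Store a conversation unless its checksum is already present.

    Returns True if a new row was written, False if it was skipped.
    The table schema here is hypothetical, not the PR's actual model.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "checksum TEXT PRIMARY KEY, name TEXT NOT NULL, raw BLOB NOT NULL)"
    )
    # INSERT OR IGNORE leaves the table untouched when the checksum exists,
    # so re-running the job on the same file is a no-op.
    cur = conn.execute(
        "INSERT OR IGNORE INTO conversations (checksum, name, raw) "
        "VALUES (?, ?, ?)",
        (file_checksum(data), name, data),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keying on content rather than filename means a renamed copy of an already-processed file is also skipped, while an edited file is treated as new.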

Implementation Details

  • Cron job scans configured directory for new conversation files (JSON/TXT/LOG)
  • Processes through existing pipeline: chunking → summarization → embedding
  • Stores raw data and processed results in SQLite for efficient querying
  • Maintains Neo4j storage for backward compatibility
  • Tracks job history and processing status per conversation
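
The scan and dual-write steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `find_conversation_files`, `store_dual`, and the callback parameters are hypothetical names, and the scan is assumed to be non-recursive.

```python
from pathlib import Path
from typing import Callable

# Extensions named in the PR description (JSON/TXT/LOG).
SUPPORTED_SUFFIXES = {".json", ".txt", ".log"}


def find_conversation_files(root: Path) -> list[Path]:
    """Return supported files directly under root, sorted for a stable order.

    Assumption: the scan does not descend into subdirectories.
    """
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def store_dual(
    record: dict,
    write_sqlite: Callable[[dict], None],
    write_neo4j: Callable[[dict], None],
    broadcast: Callable[[str], None],
) -> bool:
    """Write to SQLite first, then Neo4j.

    A Neo4j failure is reported through broadcast instead of being raised,
    so the SQLite copy is kept. Returns True only if both writes succeeded.
    """
    write_sqlite(record)
    try:
        write_neo4j(record)
    except Exception as exc:  # broad on purpose: any Neo4j error is non-fatal
        broadcast(f"Neo4j write failed: {exc}")
        return False
    return True
```

Writing to SQLite before Neo4j means the scalable store always holds the record, even when the graph write fails and only an alert is sent.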

Configuration

# Add to .env
SQLITE_DB_PATH="./data/conversations.db"
CHAT_LOGS_PATH="./data/chat_logs"
CRON_ENABLED="true"
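
As a rough sketch of how these variables might be read at startup: the `IngestionConfig` class, the `load_config` helper, the fallback defaults, and the set of accepted truthy strings are all assumptions, not part of this PR.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngestionConfig:
    sqlite_db_path: Path
    chat_logs_path: Path
    cron_enabled: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> IngestionConfig:
    """Build the config from environment variables.

    The defaults mirror the example values above; whether the real code
    falls back to them is an assumption.
    """
    env = os.environ if env is None else env
    enabled = env.get("CRON_ENABLED", "true").strip().lower()
    return IngestionConfig(
        sqlite_db_path=Path(env.get("SQLITE_DB_PATH", "./data/conversations.db")),
        chat_logs_path=Path(env.get("CHAT_LOGS_PATH", "./data/chat_logs")),
        cron_enabled=enabled in {"1", "true", "yes"},
    )
```

Passing a plain dict as `env` keeps the loader easy to test without touching the process environment.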

Testing

  • Run uv run pytest tests/integration/test_conversation_ingestion_simple.py -v
  • Manual test: uv run python scripts/conversation_ingestion_cron.py
  • Create test data: uv run python scripts/create_sample_conversations.py

Future Migration Path

This dual storage approach provides a gradual migration path from Neo4j to SQLite for conversation data, improving scalability while maintaining existing functionality.

🤖 Generated with Claude Code

- Add SQLite models for conversations, fragments, summaries, and embeddings
- Create cron job script for daily automated ingestion
- Implement dual storage to both SQLite and Neo4j
- Add idempotent processing with checksum-based deduplication
- Include system broadcast alerts for Neo4j failures
- Add Gradio UI tab for cron job monitoring
- Create integration tests for storage and ingestion flow
- Add sample conversation generator for testing

This provides a foundation for automated conversation processing with
better scalability than Neo4j alone, while maintaining compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>