A text analysis framework implementing a three-stage pipeline architecture for processing and analyzing temporal data. The system combines multiple AI model providers (OpenAI, Grok, Venice.AI) for advanced sampling, prompt engineering, and inference, generating insights with a focus on temporal analysis, event mapping, and forecasting. This repository serves as a workbench for the Chanscope Knowledge Agents application.
- **Temporal Intelligence**:
  - Precise datetime handling across timezones
  - Time-aware context generation
  - Historical pattern analysis
- **Distributed Processing**:
  - Multi-provider model orchestration
  - Concurrent chunk processing
  - Batched operations with progress tracking
- **Adaptive Analysis**:
  - Dynamic provider selection
  - Automatic fallback mechanisms
  - Environment-aware execution (notebook/terminal)
- **Performance Monitoring (Optional)**:
  - Literal AI integration for comprehensive monitoring
  - Thread-level execution tracking
  - Provider performance metrics
  - Error pattern analysis
The framework is designed for robust handling of large-scale text analysis tasks, with built-in support for data validation, error recovery, and detailed operational logging. It provides a flexible foundation for building knowledge processing applications with temporal awareness.
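For orientation, a typical end-to-end invocation is sketched below. This mirrors the `run_knowledge_agents` call shown in the setup sections later in this README; the import path, the shape of the provider mapping, and the query string are illustrative assumptions, not the framework's confirmed API.

```python
import asyncio

# Assumed import path; see model_ops.py in this repository.
from model_ops import run_knowledge_agents

async def main() -> None:
    # Hypothetical provider mapping: which provider serves each pipeline stage.
    providers = {
        "embeddings": "openai",
        "chunks": "grok",
        "summary": "venice",
    }
    chunks, summary = await run_knowledge_agents(
        query="How did discussion of topic X evolve over the past month?",
        process_new=True,   # re-run sampling/chunking instead of reusing cached results
        providers=providers,
    )
    print(summary)

if __name__ == "__main__":
    asyncio.run(main())
```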
The project supports multiple AI model providers:
- **OpenAI**: Default provider for both completions and embeddings
  - Requires: `OPENAI_API_KEY`, `OPENAI_MODEL`, `OPENAI_EMBEDDING_MODEL`
- **Grok (X.AI)**: Alternative provider with its own embedding model
  - Optional: `GROK_API_KEY`, `GROK_MODEL`, `GROK_EMBEDDING_MODEL`
- **Venice.AI**: Additional model provider for completions
  - Optional: `VENICE_API_KEY`, `VENICE_MODEL`

Configure your preferred provider in `config.ini`. The system automatically falls back to OpenAI if the primary provider fails, ensuring robust operation.
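For illustration, a filled-in `config.ini` might look like the sketch below. The section names are assumptions based on the variables listed above; `config_template.ini` in this repository remains the authoritative reference for the exact layout.

```ini
; Hypothetical layout; copy config_template.ini and check it for the real keys.
[openai]
OPENAI_API_KEY = your-openai-key
OPENAI_MODEL = your-completion-model
OPENAI_EMBEDDING_MODEL = your-embedding-model

[grok]
GROK_API_KEY = your-xai-key
GROK_MODEL = your-grok-model
GROK_EMBEDDING_MODEL = your-grok-embedding-model

[venice]
VENICE_API_KEY = your-venice-key
VENICE_MODEL = your-venice-model
```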
- **Pipeline Architecture**:
  - Embedding Generation (OpenAI/Grok)
  - Chunk Analysis (OpenAI/Grok/Venice)
  - Summary Generation with temporal context
- **Provider Integration**:
  - Dynamic model selection and fallback
  - Standardized cross-provider responses
  - Concurrent batch processing
- **Literal AI Integration** (see the sketch after this list):
  - Thread-level execution tracking
  - Step-by-step performance metrics
  - Provider usage patterns
  - Error rate monitoring
- **Monitoring Features**:
  - Automatic OpenAI instrumentation
  - Custom step tracking
  - Error pattern analysis
  - Resource utilization metrics
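A minimal sketch of wiring up the Literal AI integration, assuming the `literalai` Python SDK; the step name is illustrative and the decorated function is a placeholder, not the framework's actual code.

```python
import os

from literalai import LiteralClient

# Initialize the client; the key can also come from the LITERAL_API_KEY env var.
literal_client = LiteralClient(api_key=os.environ["LITERAL_API_KEY"])

# Automatic OpenAI instrumentation: every OpenAI call is captured as a step.
literal_client.instrument_openai()

# Custom step tracking for work outside the OpenAI client.
@literal_client.step(type="run", name="chunk_analysis")
def analyze_chunk(chunk: str) -> str:
    return chunk.strip()  # placeholder for the real analysis
```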
- **Time-Aware Analysis**:
  - Historical pattern recognition
  - Temporal context preservation
- **Content Management**:
  - Semantic chunking with quality thresholds
  - Duplicate detection and filtering
  - Multi-format data handling (CSV/Parquet/Excel)
- **Adaptive Execution**:
  - Environment-aware (Notebook/Terminal)
  - Async processing with progress tracking
  - Configurable worker pools
- **Error Recovery** (sketched after this list):
  - Exponential backoff retries
  - Provider fallback chains
  - Comprehensive logging system
- **Signal Processing**:
  - Semantic search and retrieval
  - Pattern detection and analysis
  - Multi-source data integration
- **Contextual Analysis**:
  - Thread activity monitoring
  - Impact assessment metrics
  - Narrative evolution tracking
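A minimal sketch of the error-recovery pattern described above (exponential backoff plus a provider fallback chain). The `provider.complete` interface and `provider.name` attribute are hypothetical, standing in for the framework's actual provider abstraction.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def call_with_fallback(prompt: str, providers: list, max_retries: int = 3) -> str:
    """Try each provider in turn, retrying transient failures with exponential backoff."""
    for provider in providers:
        for attempt in range(max_retries):
            try:
                # Hypothetical provider interface; swap in the real call.
                return await provider.complete(prompt)
            except Exception as exc:
                delay = 2 ** attempt  # 1s, 2s, 4s, ...
                logger.warning("%s attempt %d failed: %s; retrying in %ss",
                               provider.name, attempt + 1, exc, delay)
                await asyncio.sleep(delay)
        logger.error("%s exhausted retries; falling back to next provider", provider.name)
    raise RuntimeError("all providers failed")
```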
Notebook workflow:

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/knowledge-agents.git
  cd knowledge-agents
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure your model providers:
  - Copy `config_template.ini` to `config.ini`
  - Add your API keys and model preferences
- Set up monitoring (optional):
  - Get a Literal AI API key
  - Set the environment variable:

    ```python
    os.environ["LITERAL_API_KEY"] = "your-literal-api-key"
    ```

  - Or pass it directly to the run function:

    ```python
    chunks, summary = await run_knowledge_agents(
        query=query,
        process_new=True,
        providers=providers,
        monitor_api_key="your-literal-api-key",
    )
    ```
- Launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```

- Navigate to and open `knowledge_workbench.ipynb`
Terminal workflow:

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/knowledge-agents.git
  cd knowledge-agents
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure your model providers:
  - Update `config_template.ini`
  - Add your API keys and model preferences
- Set up monitoring (optional):

  ```bash
  export LITERAL_API_KEY="your-literal-api-key"
  ```
- Run the main script:

  ```bash
  python model_ops.py
  ```
The framework includes comprehensive performance monitoring through Literal AI integration:
- Thread-level execution tracking
- Step-by-step performance metrics
- Provider usage patterns
- Error rate monitoring
- Resource utilization metrics
- Enable monitoring by setting the Literal AI API key:

  ```python
  os.environ["LITERAL_API_KEY"] = "your-literal-api-key"
  ```
- Run with monitoring enabled:

  ```python
  chunks, summary = await run_knowledge_agents(
      query=query,
      process_new=True,
      providers=providers,
      monitor_api_key=os.getenv("LITERAL_API_KEY"),
  )
  ```
- Access monitoring data through the Literal AI dashboard:
  - View thread execution timelines
  - Analyze provider performance
  - Track error patterns
  - Monitor resource usage
Monitored operations include:

- Embedding generation
- Content retrieval
- Chunk analysis
- Summary generation
- Error handling and recovery
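Since these operations are tracked at the thread level, one query's steps can be grouped under a single thread in the dashboard. The sketch below again assumes the `literalai` Python SDK; the thread and step names are illustrative, and the `pass` bodies stand in for the real pipeline stages.

```python
from literalai import LiteralClient

literal_client = LiteralClient()  # reads LITERAL_API_KEY from the environment

def trace_query(query: str) -> None:
    # Group every step for this query under one thread in the dashboard.
    with literal_client.thread(name=f"knowledge-agents: {query[:40]}"):
        with literal_client.step(type="run", name="embedding_generation"):
            pass  # embedding generation would run here
        with literal_client.step(type="run", name="chunk_analysis"):
            pass  # chunk analysis would run here
        with literal_client.step(type="run", name="summary_generation"):
            pass  # summary generation would run here
```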
For data collection, you can use the data-gathering tools from the chanscope-lambda repository. If you prefer not to set up a Lambda function, you can run the `gather.py` script from that repository directly:

- Clone the chanscope-lambda repository
- Navigate to the `gather.py` script
- Follow the script's documentation for standalone data gathering
The `prompt.yaml` file is a crucial component that defines the system's interaction patterns and analytical capabilities. It contains four main sections:
- **Objective Analysis**
  - Handles complex forecasting tasks combining numerical and textual data
  - Performs structured analysis including numerical validation, contextual integration, and pattern recognition
  - Generates multimodal forecasts with confidence metrics and contextual validation
- **Generate Chunks**
  - Specializes in processing and analyzing text segments
  - Performs temporal analysis, information extraction, and context generation
  - Maintains a structured output format for consistency
- **Summary Generation**
  - Templates for comprehensive summaries with forecasting capabilities
  - Integrates numerical data with contextual information
  - Includes historical analysis, forecast generation, and risk assessment
- **Text Chunk Summary**
  - Templates for analyzing discrete text segments
  - Extracts time series data and key information
  - Generates domain context, background knowledge, and assumptions
Each prompt type is designed to maintain temporal awareness, preserve numerical precision, and provide comprehensive contextual analysis. The system uses these prompts to ensure consistent, high-quality output across different analytical tasks.
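As an illustration only, the file's shape might resemble the sketch below. The top-level keys are hypothetical names derived from the four sections above, and the template text is invented; the actual content lives in `prompt.yaml`.

```yaml
# Hypothetical sketch of prompt.yaml's structure; keys and text are illustrative.
objective_analysis:
  system: |
    You are a forecasting analyst. Validate numerical inputs, integrate
    context, and report confidence for every prediction.
generate_chunks:
  user: |
    Analyze the following text segment. Extract temporal references and
    key information, and generate supporting context.
summary_generation:
  user: |
    Produce a comprehensive summary with historical analysis, a forecast,
    and a risk assessment. Preserve numerical precision.
text_chunk_summary:
  user: |
    Summarize this text segment. Extract time series data, domain context,
    background knowledge, and assumptions.
```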
- Data Gathering Lambda: chanscope-lambda
- Prompt Engineering Research: *Temporal-Aware Language Models for Temporal Knowledge Graph Question Answering*, used in designing the temporal-aware prompts and multimodal forecasting capabilities