A tool that analyzes audio recordings and portfolios to extract competency insights across multiple dimensions, combining local processing with cloud services.
## Table of Contents

- Overview
- Components
- Features
- Installation
- Usage
- Output
- LLM Provider
- System Requirements
- Troubleshooting
## Overview

CompExtractor analyzes both audio recordings and portfolios to extract competency insights, providing detailed reports on key competency dimensions. The tool combines local processing with cloud services:

- For audio analysis: local transcription with cloud-based speaker diarization and competency analysis
- For portfolio analysis: web page conversion to PDF and cloud-based competency analysis
## Components

- **Transcription**: Uses OpenAI's Whisper locally (no API key needed); see the sketch after this list
  - Runs completely offline
  - Uses local GPU/CPU for processing
  - Supports multiple languages
  - Downloads model files on first use (~1.5GB for the medium model)
- **Speaker Diarization**: Uses pyannote.audio (requires a Hugging Face token)
  - Requires accepting the model terms of use at huggingface.co
  - Needs HUGGING_FACE_TOKEN in .env
- **HTML-to-PDF Conversion**: Uses an external service for portfolio analysis
  - Converts Google Sites pages to PDF for analysis
  - Needs PDF_HOST in .env (defaults to https://html2pdf-u707.onrender.com)
- **Competency Analysis**: Uses the OpenRouter API (requires an API key)
  - Analyzes transcripts and portfolio content against the competency framework
  - Generates ratings and insights
  - Needs OPENROUTER_API_KEY and related settings in .env
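For orientation, the two local audio stages can be wired together roughly as below, using the `openai-whisper` and `pyannote.audio` libraries. This is a minimal sketch, not the tool's actual code; the file name, pipeline name, and model size are illustrative.

```python
import os

import whisper
from pyannote.audio import Pipeline

# Local transcription: weights (~1.5GB for "medium") download on first use
model = whisper.load_model("medium")
result = model.transcribe("recording.mp3")  # placeholder file name
print(result["text"])

# Speaker diarization: requires accepting the model terms on huggingface.co
# and a valid HUGGING_FACE_TOKEN; the pipeline name here is an example
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HUGGING_FACE_TOKEN"],
)
diarization = pipeline("recording.mp3")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```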
## Features

Analyzes key competencies across three levels:

- Emerging (1-3)
- Developing (4-7)
- Proficient (8-10)

The competency framework is customizable through RTF files, allowing you to define your own competency dimensions and criteria; a hypothetical example follows.
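As a purely hypothetical illustration, a competency definition in such a file might look like the excerpt below; the dimension name and criteria are invented, and the exact layout the tool expects may differ:

```
Competency: Communication
  Emerging (1-3): Shares ideas when prompted; explanations lack supporting detail.
  Developing (4-7): Explains reasoning clearly with some supporting evidence.
  Proficient (8-10): Communicates precisely and tailors evidence to the audience.
```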
## Installation

1. Clone the repository

2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Set the required environment variables in `.env` (see the loading sketch after these steps):

   ```
   # For Competency Analysis (OpenRouter)
   OPENROUTER_API_KEY=your_key_here
   OPENROUTER_URL=https://openrouter.ai/api/v1/chat/completions
   OPENROUTER_MODEL=anthropic/claude-3.7-sonnet

   # For Speaker Diarization (Hugging Face)
   HUGGING_FACE_TOKEN=your_token_here

   # For Portfolio Analysis (HTML-to-PDF)
   PDF_HOST=https://html2pdf-u707.onrender.com
   ```

4. Place the required sound files in the root directory:
   - sound.mp3 (completion sound)
   - coin.mp3 (progress indicator)
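For reference, these variables would typically be read at startup along the following lines, assuming the project loads them with `python-dotenv` (a sketch, not the tool's actual startup code):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

api_key = os.getenv("OPENROUTER_API_KEY")
pdf_host = os.getenv("PDF_HOST", "https://html2pdf-u707.onrender.com")
if api_key is None:
    raise RuntimeError("OPENROUTER_API_KEY is not set; check your .env file")
```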
## Usage

### GUI

```
python src/compextractor_gui.py
```

The GUI provides a landing page with three options:

- Audio Reflection - for analyzing audio recordings
- Portfolio - for analyzing portfolios
- Video Performance of Learning - placeholder for a future implementation
#### Audio Reflection

- Simplified audio file selection showing file count and names
- Competency file selection (RTF/TXT)
- Optional speaker diarization
- Output format selection:
  - Full Report (HTML with narrative and ratings)
  - Structured JSON (machine-readable format)
  - Both (generates both formats)
- Progress tracking
- Automatic report generation for each audio file
#### Portfolio

- Direct portfolio URL input for single-portfolio analysis
- CSV file upload for batch processing multiple portfolios (example CSV after this list)
- Portfolio section selection:
  - Beginner, Intermediate, and Advanced skill levels
  - Business and Resume sections
- Output format selection (same options as Audio Reflection)
- Progress tracking
- Automatic report generation for each portfolio
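For batch processing, the CSV lists one portfolio per row. The column headers below are illustrative only; check the repository for the exact headers the tool expects:

```csv
name,portfolio_url
Jordan Smith,https://sites.google.com/view/jordan-portfolio
Avery Lee,https://sites.google.com/view/avery-portfolio
```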
### Command Line

```
python src/main.py
```
## Output

The tool generates different outputs depending on the analysis type:

### Audio Analysis

- Transcription files (before and after diarization)
- HTML report with:
  - Competency ratings (1-10 scale)
  - Evidence for each rating
  - Interactive radar charts for competency visualization
  - Separate sections for each speaker (if diarization is enabled)
- Optional structured JSON output

### Portfolio Analysis

- HTML report with:
  - Competency ratings (1-10 scale)
  - Evidence for each rating
  - Areas for improvement
  - Interactive radar charts for competency visualization
  - Examples from the portfolio
- Optional structured JSON output (a possible shape is sketched below)
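The structured JSON output could look roughly like the following. The field names and nesting here are illustrative, not the tool's exact schema:

```json
{
  "source": "recording.mp3",
  "competencies": [
    {
      "name": "Communication",
      "rating": 7,
      "level": "Developing",
      "evidence": "Explains design decisions clearly but rarely cites outcomes."
    }
  ]
}
```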
### File Structure

```
results/                                  - All output files
  transcript_*_before_diarization_*.txt   - Raw transcripts
  transcript_*_after_diarization_*.txt    - Speaker-labeled transcripts
  combined_report_*.html                  - Audio analysis reports
  portfolio_report_*.html                 - Portfolio analysis reports
  structured_data_*.json                  - Audio analysis JSON data
  portfolio_data_*.json                   - Portfolio analysis JSON data
temp/                                     - Temporary audio chunks (auto-cleaned after processing)
```
## LLM Provider

CompExtractor currently uses OpenRouter as the LLM provider for competency analysis. The default model is `anthropic/claude-3.7-sonnet`, but this can be changed in your `.env` file.

The code can be modified to use other LLM providers by:

- Updating the `extract_competency_insights` function in `src/main.py`
- Modifying the API endpoint, headers, and request format to match your preferred provider
- Updating the environment variables accordingly
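For orientation, an OpenRouter chat-completions call generally has the shape below (OpenRouter exposes an OpenAI-compatible API). This is a hypothetical stand-in for the real function in `src/main.py`, not its actual implementation:

```python
import os

import requests

def extract_competency_insights_sketch(transcript: str, framework: str) -> str:
    """Hypothetical stand-in for extract_competency_insights in src/main.py."""
    response = requests.post(
        os.environ["OPENROUTER_URL"],
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "model": os.environ["OPENROUTER_MODEL"],
            "messages": [
                {"role": "system", "content": f"Rate this against the framework:\n{framework}"},
                {"role": "user", "content": transcript},
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```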
For example, to use Amazon Bedrock directly instead of OpenRouter (sketched below), you would need to:

- Change the API endpoint to Amazon Bedrock's
- Update the authentication method to use AWS credentials
- Adjust the request format to match Amazon Bedrock's API requirements
- Configure the appropriate region and service settings
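Under those assumptions, a minimal Bedrock variant using boto3's Converse API might look like this; the region and model ID are examples, and AWS credentials come from the usual boto3 sources:

```python
import boto3

# Region and model ID are examples; use ones enabled in your AWS account
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def extract_competency_insights_bedrock(transcript: str, framework: str) -> str:
    response = client.converse(
        modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",
        system=[{"text": f"Rate this against the framework:\n{framework}"}],
        messages=[{"role": "user", "content": [{"text": transcript}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```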
## System Requirements

- Python 3.11+ (tested on 3.11.11)
- FFmpeg (for audio processing)
- Internet connection (for diarization and competency analysis)
## Troubleshooting

### Diarization issues

- Ensure you've accepted the model terms at huggingface.co
- Verify your Hugging Face token is correct in the .env file
- Try running `huggingface-cli login` in your terminal

### Audio processing issues

- Check that FFmpeg is installed and accessible in your PATH
- Ensure audio files are in a supported format
- For large files, the system will automatically split them into chunks (see the sketch below)
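Chunking long recordings is typically done along these lines with pydub, which relies on FFmpeg under the hood. This is an illustration of the technique, not the tool's actual splitting code; the file name and chunk size are arbitrary:

```python
from pydub import AudioSegment  # pip install pydub; requires FFmpeg

audio = AudioSegment.from_file("long_recording.mp3")  # placeholder file name
chunk_ms = 10 * 60 * 1000  # pydub indexes audio in milliseconds

for i, start in enumerate(range(0, len(audio), chunk_ms)):
    chunk = audio[start:start + chunk_ms]
    chunk.export(f"temp/chunk_{i:03d}.mp3", format="mp3")
```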
### Portfolio analysis issues

- Ensure the PDF_HOST environment variable is set correctly
- Verify that the portfolio URL is accessible and publicly viewable
- For CSV batch processing, ensure the CSV file has the correct column headers
- If the HTML-to-PDF service is down, consider setting up a local service

### GUI issues

- Ensure all required sound files are in the root directory
- Check that the banner.jpeg file exists for the GUI header
- If the GUI appears cut off, try resizing the window
Created by Gus Halwani (@fizt656)