A cloud-based audio analysis service built on the Essentia.js library. The service accepts audio file URLs, extracts a broad set of audio features, and stores the results in Cloudflare R2.
✨ Now Enhanced with Pow3r Workflow Orchestrator & Pow3r Pass Authentication
- 🔐 Pow3r Pass ACL: Enterprise-grade authentication with granular permissions
- ☁️ Cloudflare Edge: Global distribution with sub-10s response times
- 🔄 Workflow Orchestration: Complex multi-step audio analysis pipelines
- 🤖 MCP Server: Model Context Protocol for AI agent integration
- 🧪 E2E Testing: Comprehensive test coverage with automated deployment
📚 Documentation:
- MCP Usage Guide - Complete guide for using MCP Server
- Guardian - Service architecture and best practices
Essentia is a Cloudflare Edge-based audio analysis service that leverages the Essentia.js library (a JavaScript port of the Essentia audio analysis library) to extract detailed audio features from audio files. The service runs on Cloudflare Workers and provides a Model Context Protocol (MCP) Server interface for AI agents and automated workflows.
The service performs comprehensive audio analysis including:
- Spectral Analysis: FFT, DCT, spectral peaks, rolloff, complexity, and contrast
- Frequency Band Analysis: Bark bands, Mel bands, ERB bands
- Cepstral Analysis: MFCC (Mel-Frequency Cepstral Coefficients), GFCC (Gammatone-Frequency Cepstral Coefficients)
- Pitch & Tonal Analysis: Pitch detection, melody extraction, key detection, scale detection, tuning frequency
- Rhythm Analysis: Beat detection, BPM estimation, onset detection, rhythm transform
- Harmonic Analysis: HPCP (chroma), chord detection, inharmonicity, dissonance
- High-Level Descriptors: Danceability, dynamic complexity, audio segmentation
- Beat Markers & Loop Slicing: Automatic detection of beat positions and generation of loop points (4, 8, 16 beats)
- Song Section Detection: Automatic identification of song sections (verse, chorus, bridge, intro, outro)
- Songwriting Metadata:
  - Story Arcs: Tension/release patterns and narrative structure analysis
  - Motifs: Recurring melodic and harmonic patterns tracked as "characters"
  - Quotes: Repeated musical phrases (melodic and harmonic quotes)
  - Psychological Analysis: Valence (emotion), arousal (energy), emotional trajectory, and paradigm shifts
- Hierarchical Metadata: Structured at three levels:
  - Song Level: Overall analysis, story arcs, psychological profile
  - Section Level: Per-section metadata (verse, chorus, etc.)
  - Loop Level: Beat-aligned loop metadata for slicing and remixing
All analysis results are stored as JSON files in Cloudflare R2 and accessible via MCP resources or public URLs.
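As a rough illustration of how the three metadata levels relate, the sketch below groups loops under the section in which they start. The field names (`sections`, `loops`, `start`, `end`, `type`, `id`) follow the sample metadata.json shown later in this README; the values and the `loopsBySection` helper are made up for this example and are not part of the service.

```javascript
// Trimmed-down metadata object. Field names follow the sample metadata.json;
// the values are illustrative only.
const metadata = {
  sections: [
    { type: 'verse', start: 0, end: 60 },
    { type: 'chorus', start: 60, end: 90 },
  ],
  loops: [
    { id: 'loop-4-0', start: 0, end: 2 },
    { id: 'loop-4-30', start: 60, end: 62 },
  ],
};

// Hypothetical helper: list the ids of the loops that start inside each section.
function loopsBySection(meta) {
  return meta.sections.map((section) => ({
    type: section.type,
    loopIds: meta.loops
      .filter((loop) => loop.start >= section.start && loop.start < section.end)
      .map((loop) => loop.id),
  }));
}

console.log(loopsBySection(metadata));
```

A loop is assigned by its start time only, so a loop that straddles a section boundary belongs to the section where it begins. That is a design choice of this sketch, not documented service behavior.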
- Audio File Processing: Downloads and processes audio files from HTTPS URLs
- Comprehensive Analysis: Extracts 30+ different audio features and descriptors
- Cloud Storage Integration: Automatically uploads analysis results to Google Cloud Storage
- RESTful API: Simple POST endpoint for audio analysis requests
- Error Handling: Robust error handling with cleanup of temporary files
- Scalable: Designed for Google Cloud Run with auto-scaling capabilities
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT)
- Spectral peaks detection
- Spectral rolloff
- High Frequency Content (HFC)
- Spectral contrast
- Bark scale bands
- Mel scale bands
- ERB (Equivalent Rectangular Bandwidth) bands
- MFCC (Mel-Frequency Cepstral Coefficients)
- GFCC (Gammatone-Frequency Cepstral Coefficients)
- LPC (Linear Predictive Coding)
- Pitch salience function
- Predominant pitch/melody extraction
- Musical key detection
- Scale detection
- Tuning frequency extraction
- Beat detection (TempoTapDegara algorithm)
- BPM (beats per minute) estimation
- Onset detection
- Rhythm transform
- Beat loudness
- HPCP (Harmonic Pitch Class Profile / Chroma)
- Chord detection
- Inharmonicity analysis
- Dissonance measurement
- Danceability metrics
- Dynamic complexity
- Audio segmentation (zero crossing rate)
- Beat Markers: Precise beat positions for loop slicing
- Loop Points: Automatic generation of 4, 8, and 16-beat loops
- Section Detection: Verse, chorus, bridge, intro, outro identification
- Story Arc Analysis: Tension/release patterns and narrative structure
- Motif Extraction: Recurring patterns tracked throughout the song
- Quote Detection: Repeated melodic and harmonic phrases
- Psychological Descriptors: Valence, arousal, emotional trajectory
- Paradigm Shift Detection: Identification of sudden emotional/musical changes
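To make the beat marker and loop point features more concrete, here is a minimal sketch of how 4-, 8-, and 16-beat loops could be derived from a list of beat times. The output fields (`id`, `start`, `end`, `length`, `type`) mirror the sample metadata.json in this README; the `generateLoops` function and its non-overlapping windowing are assumptions for illustration, not the service's actual implementation.

```javascript
// Hypothetical sketch: derive fixed-length loop points from beat times (in seconds).
function generateLoops(beats, lengths = [4, 8, 16]) {
  const loops = [];
  for (const length of lengths) {
    // Step through the beats in non-overlapping windows of `length` beats.
    // A loop ends on the beat that opens the next window, so that beat must exist.
    for (let i = 0, n = 0; i + length < beats.length; i += length, n += 1) {
      loops.push({
        id: `loop-${length}-${n}`,
        start: beats[i],
        end: beats[i + length],
        length,
        type: `${length}-beat`,
      });
    }
  }
  return loops;
}

// Nine beats at 120 BPM (one beat every 0.5 s).
const demoBeats = Array.from({ length: 9 }, (_, i) => i * 0.5);
console.log(generateLoops(demoBeats));
```

With beats spaced 0.5 s apart, the first 4-beat loop runs from 0 to 2 seconds, which matches the `loop-4-0` entry in the sample metadata.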
All interactions must use the MCP Server with Pow3r Pass authentication. Do not call REST APIs directly.
See MCP Usage Guide for complete documentation.
- Pow3r Pass token from config.superbots.link
- Required scopes: audio:analyze, metadata:read, etc.
- MCP client or HTTP client supporting JSON-RPC 2.0
# Set your Pow3r Pass token
export POW3R_PASS_TOKEN="your_pow3r_pass_token"
# Use MCP server (see docs/MCP-USAGE.md for details)
curl -X POST https://essentia-audio-analysis.contact-7d8.workers.dev \
-H "Content-Type: application/json" \
-H "MCP-Protocol-Version: 2024-11-05" \
-H "Authorization: Bearer $POW3R_PASS_TOKEN" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "analyze_audio",
"arguments": {
"fileUrl": "https://example.com/audio.mp3",
"includeMetadata": true
}
}
}'
Endpoint: https://essentia-audio-analysis.contact-7d8.workers.dev
Required Headers:
- Content-Type: application/json
- MCP-Protocol-Version: 2024-11-05
- Authorization: Bearer <pow3r_pass_token>
Available Tools:
- analyze_audio - Full audio analysis with metadata
- extract_beats - Beat markers and tempo
- detect_sections - Song section detection
- analyze_psychology - Psychological descriptors
- get_metadata - Retrieve stored results
See MCP Usage Guide for complete documentation and examples.
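The endpoint, headers, and tool names above can be wrapped in a small client helper. The sketch below is one possible shape for such a helper, assuming Node.js 18+ (for the global `fetch`). The names `buildToolCallRequest` and `callMCPTool` are hypothetical and are not shipped with the service; the error handling relies only on the standard JSON-RPC 2.0 `error` object (`code`, `message`).

```javascript
// Hypothetical client helper; not part of the service's codebase.
const MCP_ENDPOINT = 'https://essentia-audio-analysis.contact-7d8.workers.dev';
const MCP_PROTOCOL_VERSION = '2024-11-05';

let nextRequestId = 1;

// Build the fetch() options for a JSON-RPC 2.0 "tools/call" request.
function buildToolCallRequest(name, args, token) {
  if (!token) {
    throw new Error('A Pow3r Pass token is required');
  }
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'MCP-Protocol-Version': MCP_PROTOCOL_VERSION,
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: nextRequestId++,
      method: 'tools/call',
      params: { name, arguments: args },
    }),
  };
}

// Send the request and return the JSON-RPC result.
// Throws on an HTTP failure or on a JSON-RPC error object.
async function callMCPTool(name, args, token = process.env.POW3R_PASS_TOKEN) {
  const response = await fetch(MCP_ENDPOINT, buildToolCallRequest(name, args, token));
  if (!response.ok) {
    throw new Error(`MCP request failed with HTTP ${response.status}`);
  }
  const payload = await response.json();
  if (payload.error) {
    throw new Error(`MCP error ${payload.error.code}: ${payload.error.message}`);
  }
  return payload.result;
}
```

A call would then look like `await callMCPTool('analyze_audio', { fileUrl: 'https://example.com/audio.mp3', includeMetadata: true })`. Keeping request construction separate from the network call makes the request shape easy to inspect and test without contacting the service.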
The REST API endpoint exists for backward compatibility but should not be used for new integrations. All new code should use the MCP server interface.
Success Response (200 OK):
{
"success": true,
"result": {
"fft": "https://storage.googleapis.com/essentiajs/analytics/audio/{uuid}-audio/fft.json",
"dct": "https://storage.googleapis.com/essentiajs/analytics/audio/{uuid}-audio/dct.json",
"mfcc": "https://storage.googleapis.com/essentiajs/analytics/audio/{uuid}-audio/mfcc.json",
"chords": "https://storage.googleapis.com/essentiajs/analytics/audio/{uuid}-audio/chords.json",
"metadata": "https://storage.googleapis.com/essentiajs/analytics/audio/{uuid}-audio/metadata.json",
// ... more analysis types
}
}
Metadata Structure (metadata.json):
{
"song": {
"duration": 390.5,
"bpm": 120,
"key": "C major",
"storyArc": {
"tension": [...],
"release": [...],
"narrativeStructure": "verse-chorus-verse-chorus-bridge-chorus"
},
"psychological": {
"overallValence": 0.7,
"overallArousal": 0.8,
"emotionalTrajectory": [
{"time": 0, "valence": 0.5, "arousal": 0.6},
{"time": 60, "valence": 0.8, "arousal": 0.9}
],
"paradigmShifts": [
{"time": 180, "type": "harmonic", "magnitude": 0.8}
]
},
"motifs": [
{
"id": "motif-melodic-0",
"type": "melodic",
"occurrences": [{"start": 10, "end": 15}, {"start": 70, "end": 75}],
"evolution": "variation"
}
],
"quotes": [
{
"type": "harmonic",
"original": {"start": 20, "end": 30},
"quoted": [{"start": 100, "end": 110}]
}
]
},
"sections": [
{
"type": "verse",
"start": 0,
"end": 60,
"confidence": 0.95,
"metadata": {
"bpm": 118,
"key": "C major",
"energy": 0.6,
"psychological": {
"valence": 0.5,
"arousal": 0.6
},
"motifs": ["motif-melodic-0"],
"storyArc": "exposition"
}
}
],
"loops": [
{
"id": "loop-4-0",
"start": 0,
"end": 2,
"length": 4,
"type": "4-beat",
"metadata": {
"energy": 0.7,
"harmony": ["C", "Am", "F", "G"],
"psychological": {
"valence": 0.6,
"arousal": 0.7
}
}
}
],
"beatMarkers": {
"bpm": 120,
"beats": [0, 0.5, 1.0, 1.5, ...],
"confidence": 0.95,
"totalBeats": 780
}
}
Error Response (400/500):
{
"success": false,
"message": "Error description"
}
Using MCP Server (Recommended):
# Set Pow3r Pass token
export POW3R_PASS_TOKEN="your_token"
# Use the provided script
node scripts/analyze-via-mcp.js https://example.com/audio.mp3
# Or use cURL directly
curl -X POST https://essentia-audio-analysis.contact-7d8.workers.dev \
-H "Content-Type: application/json" \
-H "MCP-Protocol-Version: 2024-11-05" \
-H "Authorization: Bearer $POW3R_PASS_TOKEN" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "analyze_audio",
"arguments": {
"fileUrl": "https://example.com/audio.mp3",
"includeMetadata": true
}
}
}'
Using JavaScript (MCP):
const POW3R_PASS_TOKEN = process.env.POW3R_PASS_TOKEN;
const MCP_ENDPOINT = 'https://essentia-audio-analysis.contact-7d8.workers.dev';
const response = await fetch(MCP_ENDPOINT, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'MCP-Protocol-Version': '2024-11-05',
'Authorization': `Bearer ${POW3R_PASS_TOKEN}`
},
body: JSON.stringify({
jsonrpc: '2.0',
id: 1,
method: 'tools/call',
params: {
name: 'analyze_audio',
arguments: {
fileUrl: 'https://example.com/audio.mp3',
includeMetadata: true
}
}
})
});
const data = await response.json();
console.log(data);
Using Python (MCP):
import os
import requests
POW3R_PASS_TOKEN = os.environ.get('POW3R_PASS_TOKEN')
MCP_ENDPOINT = 'https://essentia-audio-analysis.contact-7d8.workers.dev'
response = requests.post(
MCP_ENDPOINT,
headers={
'Content-Type': 'application/json',
'MCP-Protocol-Version': '2024-11-05',
'Authorization': f'Bearer {POW3R_PASS_TOKEN}'
},
json={
'jsonrpc': '2.0',
'id': 1,
'method': 'tools/call',
'params': {
'name': 'analyze_audio',
'arguments': {
'fileUrl': 'https://example.com/audio.mp3',
'includeMetadata': True
}
}
}
)
data = response.json()
print(data)
See MCP Usage Guide for complete examples and documentation.
The service is deployed to Cloudflare Workers:
npm run deploy
This command:
- Patches Essentia.js for Cloudflare Workers compatibility
- Bundles the worker code
- Deploys to Cloudflare Workers
- Configures R2 bucket bindings
Configure in wrangler.toml or via Cloudflare dashboard:
- FRAME_SAMPLE_RATE: Frame sampling rate (default: 5)
- WORKER_ENV: Environment (production/staging)
- R2_BUCKET: R2 bucket binding (configured in wrangler.toml)
After a successful analysis via MCP, results are returned in the MCP response. You can also access stored results using the get_metadata tool:
// Get metadata via MCP
const result = await callMCPTool('get_metadata', {
fileId: 'abc123',
metadataType: 'all'
});
// Results are also available as MCP resources
// Use resources/list to discover available resources
The service includes comprehensive E2E testing with Pow3r Workflow Orchestrator and Pow3r Pass authentication.
# Required environment variables
export POW3R_PASS_TOKEN=your_pow3r_pass_token
export ESSENTIA_URL=https://essentia.yourdomain.workers.dev
export WORKFLOW_URL=https://config.superbots.link/mcp/workflow
export POW3R_PASS_URL=https://config.superbots.link/pass
# Install test dependencies
npm install
npx playwright install --with-deps
# Run full E2E test suite
npm run test:e2e
# Run specific test suites
npm run test:e2e:essentia # Essentia API tests
npm run test:e2e:workflow # Workflow orchestrator tests
# Deploy and test
./deploy-e2e.sh # Full deployment + testing
./deploy-e2e.sh essentia-only # Essentia service only
Tests validate proper permission enforcement:
| Scope | Description | Status |
|---|---|---|
| audio:analyze | Analyze audio files | ✅ Tested |
| metadata:read | Read analysis results | ✅ Tested |
| beats:extract | Extract beat markers | ✅ Tested |
| sections:detect | Detect song sections | ✅ Tested |
| psychology:analyze | Psychological analysis | ✅ Tested |
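The scope table suggests a one-to-one relationship between tools and scopes. The sketch below shows how a client might check a token's scopes before calling a tool. The tool-to-scope mapping is an assumption inferred from the descriptions in the table, and `missingScopes` is a hypothetical helper; consult the MCP Usage Guide for the authoritative requirements.

```javascript
// ASSUMPTION: this tool-to-scope mapping is inferred from the scope descriptions
// in the table above. It is not taken from the service's source.
const REQUIRED_SCOPES = {
  analyze_audio: ['audio:analyze'],
  get_metadata: ['metadata:read'],
  extract_beats: ['beats:extract'],
  detect_sections: ['sections:detect'],
  analyze_psychology: ['psychology:analyze'],
};

// Return the scopes a token still needs for a tool (an empty array means allowed).
function missingScopes(toolName, grantedScopes) {
  const required = REQUIRED_SCOPES[toolName];
  if (!required) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  const granted = new Set(grantedScopes);
  return required.filter((scope) => !granted.has(scope));
}

console.log(missingScopes('extract_beats', ['audio:analyze', 'metadata:read']));
```

Checking scopes on the client only avoids a wasted round trip; enforcement still happens on the server through Pow3r Pass.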
Complex audio analysis pipelines run on Cloudflare Edge:
// Example workflow execution
const workflow = {
name: 'audio-analysis-pipeline',
steps: [
'validate-input',
'download-audio',
'low-level-analysis',
'beat-analysis',
'section-detection',
'psychological-analysis',
'compile-results'
],
authentication: {
pow3rPass: true,
requiredScopes: ['audio:analyze']
}
};
- HTML Report: playwright-report/index.html
- JSON Results: test-results/e2e-results.json
- Performance: Sub-10s API responses, Edge execution validated
Essentia/
├── config/
│ ├── gcpConfig.js # Google Cloud Storage configuration
│ └── index.js # Configuration module exports
├── index.js # Main Express application and API endpoint
├── helpers.js # Helper functions for enhanced analysis
│ # - Beat markers and loop slicing
│ # - Section detection
│ # - Songwriting metadata extraction
│ # - Psychological analysis
├── package.json # Node.js dependencies and scripts
├── Dockerfile # Docker configuration for deployment
└── README.md # This file
- express: Web framework for Node.js
- essentia.js: Audio analysis library (JavaScript port of Essentia)
- @google-cloud/storage: Google Cloud Storage client library
- dotenv: Environment variable management
- uuid: UUID generation for unique file naming
- cors: Cross-Origin Resource Sharing middleware
- body-parser: Request body parsing middleware
- @google-cloud/functions-framework: For local development/testing
The service uses frame sampling to balance analysis quality with processing time:
- Default: Processes every 5th frame (FRAME_SAMPLE_RATE=5)
- Full Processing: Set FRAME_SAMPLE_RATE=1 for complete analysis (slower, more accurate)
- Faster Processing: Increase to 10-20 for quicker results (less detailed)
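The idea behind frame sampling can be sketched as follows: the signal is cut into overlapping frames, and only every Nth frame is passed on for analysis. This is a simplified illustration, not the service's code. The `sampledFrames` generator is hypothetical, and the frame size (2048) and hop size (1024) are example values chosen for the sketch.

```javascript
// Hypothetical sketch of frame sampling: yield only every `sampleRate`-th frame.
function* sampledFrames(signal, frameSize, hopSize, sampleRate) {
  if (!Number.isInteger(sampleRate) || sampleRate < 1) {
    throw new RangeError('sampleRate must be an integer >= 1');
  }
  let frameIndex = 0;
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    if (frameIndex % sampleRate === 0) {
      // subarray() returns a view, so no audio data is copied.
      yield { frameIndex, frame: signal.subarray(start, start + frameSize) };
    }
    frameIndex += 1;
  }
}

// Read FRAME_SAMPLE_RATE from the environment, falling back to the documented default of 5.
const parsedRate = Number.parseInt(process.env.FRAME_SAMPLE_RATE ?? '5', 10);
const frameSampleRate = Number.isInteger(parsedRate) && parsedRate >= 1 ? parsedRate : 5;

const demoSignal = new Float32Array(44100); // one second of silence at 44.1 kHz
const analyzed = [...sampledFrames(demoSignal, 2048, 1024, frameSampleRate)];
console.log(`analyzed ${analyzed.length} frames at FRAME_SAMPLE_RATE=${frameSampleRate}`);
```

In this sketch the number of analyzed frames falls roughly in proportion to the sampling rate, which is where the processing time savings would come from.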
Processing Time Estimates (for 6.5-minute, 10MB file):
- Frame sampling (5): ~10-30 seconds CPU time
- Full processing (1): ~84-420 seconds CPU time
- Use frame sampling (5-10) for production workloads
- Monitor CPU time to stay within platform limits
- Consider Cloud Run for longer processing times (no 30s CPU limit)
- Frame Sampling: By default, processes every 5th frame for efficiency. Adjust the FRAME_SAMPLE_RATE environment variable to change this.
- File Size: The service accepts large files (up to 10GB URL-encoded), but processing time and memory usage will increase with file size.
- Temporary Files: Audio files are downloaded to the local filesystem temporarily and cleaned up after processing. Ensure sufficient disk space is available.
- Cloud Storage: Results are stored in a public Google Cloud Storage bucket. Ensure proper access controls are configured.
- Section Detection: Section classification (verse/chorus/bridge) uses heuristic analysis. Accuracy may vary depending on musical style.
- Motif Detection: Requires minimum 2 occurrences to be identified as a motif. Adjust threshold in code if needed.
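To show what a minimum-occurrence threshold means in practice, the sketch below counts repeated fixed-length patterns in a chord sequence and keeps those seen at least `minOccurrences` times. This is a deliberately simple n-gram count used for illustration. It is not the service's motif detection algorithm, and `findMotifs`, `patternLength`, and the `motif-N` ids are hypothetical.

```javascript
// Illustrative only: find fixed-length patterns that recur in a symbol sequence.
function findMotifs(sequence, patternLength = 4, minOccurrences = 2) {
  const positions = new Map();
  for (let i = 0; i + patternLength <= sequence.length; i += 1) {
    const key = sequence.slice(i, i + patternLength).join(' ');
    if (!positions.has(key)) positions.set(key, []);
    positions.get(key).push(i);
  }
  return [...positions.entries()]
    // Apply the occurrence threshold: patterns seen fewer times are not motifs.
    .filter(([, starts]) => starts.length >= minOccurrences)
    .map(([pattern, starts], index) => ({ id: `motif-${index}`, pattern, starts }));
}

const chords = ['C', 'Am', 'F', 'G', 'C', 'Am', 'F', 'G', 'Dm', 'G'];
console.log(findMotifs(chords));
```

Raising `minOccurrences` to 3 on this input returns no motifs, since the only repeated pattern (`C Am F G`) appears twice.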
The service includes error handling for:
- Invalid or missing file URLs
- Download failures
- File read errors
- Upload failures to Google Cloud Storage
- Temporary file cleanup errors
All errors return appropriate HTTP status codes and error messages.
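On the client side, the documented response shapes (`{ "success": true, "result": ... }` and `{ "success": false, "message": ... }`) can be turned into either a return value or a thrown error. The following is a minimal sketch; `AnalysisError` and `unwrapAnalysisResponse` are hypothetical names, and treating 5xx responses as retryable is an assumption of the sketch rather than documented behavior.

```javascript
// Hypothetical client-side helper for the documented response shapes.
class AnalysisError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'AnalysisError';
    this.status = status;
    // ASSUMPTION: server-side (5xx) failures are worth retrying, client-side (4xx) are not.
    this.retryable = status >= 500;
  }
}

// Return `body.result` on success, otherwise throw an AnalysisError.
function unwrapAnalysisResponse(status, body) {
  if (status >= 200 && status < 300 && body && body.success === true) {
    return body.result;
  }
  const message = (body && body.message) || `Request failed with HTTP ${status}`;
  throw new AnalysisError(status, message);
}

try {
  unwrapAnalysisResponse(400, { success: false, message: 'Error description' });
} catch (err) {
  console.log(err.name, err.status, err.message, err.retryable);
}
```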
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
ISC
Mansoor
For issues, questions, or contributions, please open an issue on the repository.