Cascade is a production-ready, high-performance, and low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built upon the excellent Silero VAD model, Cascade significantly reduces VAD processing latency while maintaining high accuracy through its 1:1:1 binding architecture and asynchronous streaming technology.
Based on our latest streaming VAD performance tests with different chunk sizes:
| Chunk Size (bytes) | Processing Time (ms) | Throughput (chunks/sec) | Total Test Time (s) | Speech Segments |
|---|---|---|---|---|
| 1024 | 0.66 | 92.2 | 3.15 | 2 |
| 4096 | 1.66 | 82.4 | 0.89 | 2 |
| 8192 | 2.95 | 72.7 | 0.51 | 2 |
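
To put these timings in context, the per-chunk processing time can be compared with the amount of audio each chunk carries. The sketch below assumes 16 kHz, 16-bit mono PCM (2 bytes per sample). The table does not state the audio format, so the derived figures are illustrative only. Under that assumption a 1024-byte chunk holds 512 samples.

```python
# Assumed audio format (not stated in the benchmark table): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2

# (chunk size in bytes, measured processing time in ms), copied from the table above.
MEASUREMENTS = [(1024, 0.66), (4096, 1.66), (8192, 2.95)]


def chunk_duration_ms(chunk_bytes: int) -> float:
    """Audio duration carried by one chunk, in milliseconds."""
    samples = chunk_bytes / BYTES_PER_SAMPLE
    return samples * 1000.0 / SAMPLE_RATE_HZ


def real_time_factor(chunk_bytes: int, processing_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / chunk_duration_ms(chunk_bytes)


for chunk_bytes, processing_ms in MEASUREMENTS:
    duration = chunk_duration_ms(chunk_bytes)
    rtf = real_time_factor(chunk_bytes, processing_ms)
    print(f"{chunk_bytes:>5} bytes = {duration:.0f} ms of audio, RTF = {rtf:.4f}")
```

If the format assumption holds, every chunk size in the table is processed in a small fraction of the time it takes to play back.
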
| Metric | Value | Description |
|---|---|---|
| Best Processing Speed | 0.66ms/chunk | Optimal performance with 1024-byte chunks |
| Peak Throughput | 92.2 chunks/sec | Maximum processing throughput |
| Success Rate | 100% | Processing success rate across all tests |
| Accuracy | High | Determined by the underlying Silero VAD model |
| Architecture | 1:1:1:1 | Independent model per processor instance |
- Consistent performance across chunk sizes: Throughput stays above 70 chunks/sec for chunks from 1024 to 8192 bytes
- Real-time capability: Per-chunk processing takes 0.66 ms to 2.95 ms (sub-millisecond at 1024 bytes), which supports real-time applications
- Scalability: Linear performance scaling with independent processor instances
- Lock-Free Design: The 1:1:1 binding architecture eliminates lock contention, boosting performance.
- Frame-Aligned Buffer: A highly efficient buffer optimized for 512-sample frames.
- Asynchronous Streaming: Non-blocking audio stream processing based on asyncio.
- Memory Optimization: Zero-copy design, object pooling, and cache alignment.
- Concurrency Optimization: Dedicated threads, asynchronous queues, and batch processing.
- Real-time Interruption Detection: VAD-based intelligent interruption detection, allowing users to interrupt system responses at any time
- State Synchronization Guarantee: Two-way guard mechanism ensures strong consistency between physical and logical layers
- Automatic State Management: VAD automatically manages the speech collection state, while external services control the processing state
- Anti-false-trigger Design: Minimum interval checking and state mutex locks effectively prevent false triggers
- Low-latency Response: Interruption detection latency < 50ms for natural conversation experience
- Modular Design: A component architecture with high cohesion and low coupling.
- Interface Abstraction: Dependency inversion through interface-based design.
- Type System: Data validation and type checking using Pydantic.
- Comprehensive Testing: Unit, integration, and performance tests.
- Code Standards: Adherence to PEP 8 style guidelines.
- Error Handling: Robust error handling and recovery mechanisms.
- Resource Management: Automatic cleanup and graceful shutdown.
- Monitoring Metrics: Real-time performance monitoring and statistics.
- Scalability: Horizontal scaling by increasing the number of instances.
- Stability Assurance: Handles boundary conditions and exceptional cases gracefully.
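
The frame-aligned buffer named in the feature list can be pictured with a small sketch. `SimpleFrameBuffer` below is a hypothetical illustration, not Cascade's own buffer class. It assumes 16-bit PCM, so one 512-sample frame is 1024 bytes, and it shows only the alignment idea: accept chunks of any size and emit complete frames.

```python
from typing import Iterator

FRAME_SAMPLES = 512          # frame size named in the feature list
BYTES_PER_SAMPLE = 2         # assumption: 16-bit PCM
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE


class SimpleFrameBuffer:
    """Hypothetical illustration; not Cascade's FrameAlignedBuffer."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append an arbitrarily sized chunk of raw audio bytes."""
        self._pending.extend(chunk)

    def frames(self) -> Iterator[bytes]:
        """Yield every complete frame and keep the remainder for later."""
        while len(self._pending) >= FRAME_BYTES:
            frame = bytes(self._pending[:FRAME_BYTES])
            del self._pending[:FRAME_BYTES]
            yield frame

    @property
    def pending_bytes(self) -> int:
        """Number of buffered bytes that do not yet form a full frame."""
        return len(self._pending)


buf = SimpleFrameBuffer()
buf.write(b"\x00" * 1500)        # one full frame (1024 bytes) plus 476 bytes left over
first = list(buf.frames())
buf.write(b"\x00" * 600)         # 476 + 600 = 1076 bytes: one more frame, 52 left over
second = list(buf.frames())
print(len(first), len(second), buf.pending_bytes)
```

This version copies each frame for clarity. A zero-copy design would more likely hand out `memoryview` slices over a preallocated buffer.
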
Cascade employs a 1:1:1:1 independent architecture to ensure optimal performance and thread safety.
graph TD
Client --> StreamProcessor
subgraph "1:1:1:1 Independent Architecture"
StreamProcessor --> |per connection| IndependentProcessor[Independent Processor Instance]
IndependentProcessor --> |independent loading| VADModel[Silero VAD Model]
IndependentProcessor --> |independent management| VADIterator[VAD Iterator]
IndependentProcessor --> |independent buffering| FrameBuffer[Frame-Aligned Buffer]
IndependentProcessor --> |independent state| StateMachine[State Machine]
end
subgraph "Asynchronous Processing Flow"
VADModel --> |asyncio.to_thread| VADInference[VAD Inference]
VADInference --> StateMachine
StateMachine --> |None| SingleFrame[Single Frame Output]
StateMachine --> |start| Collecting[Start Collecting]
StateMachine --> |end| SpeechSegment[Speech Segment Output]
end
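
The state machine branch of the diagram (None, start, end) can be sketched as follows. This is a minimal illustration under stated assumptions, not Cascade's implementation: `CollectingStateMachine` and its `feed` method are hypothetical names, and the event strings stand in for whatever the VAD iterator reports for each frame.

```python
from typing import List, Optional, Tuple


class CollectingStateMachine:
    """Hypothetical illustration of the None / start / end flow; not Cascade's class."""

    def __init__(self) -> None:
        self.collecting = False
        self._frames: List[bytes] = []

    def feed(self, frame: bytes, event: Optional[str]) -> Optional[Tuple[str, bytes]]:
        """Return ("frame", data), ("segment", data), or None while collecting."""
        if event == "start":
            # Speech begins: start collecting frames instead of emitting them.
            self.collecting = True
            self._frames = [frame]
            return None
        if event == "end" and self.collecting:
            # Speech ends: emit everything collected as one segment.
            self._frames.append(frame)
            segment = b"".join(self._frames)
            self.collecting = False
            self._frames = []
            return ("segment", segment)
        if self.collecting:
            # Inside a speech segment: keep collecting silently.
            self._frames.append(frame)
            return None
        # Outside speech: pass the frame through as a single-frame result.
        return ("frame", frame)


sm = CollectingStateMachine()
events = [None, "start", None, None, "end", None]
outputs = [sm.feed(bytes([i]), ev) for i, ev in enumerate(events)]
print(outputs)
```

Frames outside speech are passed through one by one, while frames between `start` and `end` are held back and returned together as a single segment.
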
pip install cascade-vad
OR
# Using uv is recommended
uv venv -p 3.12
source .venv/bin/activate
# Install from PyPI (recommended)
pip install cascade-vad
# Or install from source
git clone https://github.com/xucailiang/cascade.git
cd cascade
pip install -e .

import cascade
import asyncio

async def basic_example():
    """A basic usage example."""
    # Method 1: Simple file processing
    async for result in cascade.process_audio_file("audio.wav"):
        if result.result_type == "segment":
            segment = result.segment
            print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
        else:
            frame = result.frame
            print(f"Single Frame: {frame.timestamp_ms:.0f}ms")

    # Method 2: Stream processing
    # audio_stream is a placeholder for your own asynchronous audio source.
    async with cascade.StreamProcessor() as processor:
        async for result in processor.process_stream(audio_stream):
            if result.result_type == "segment":
                segment = result.segment
                print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
            else:
                frame = result.frame
                print(f"Single Frame: {frame.timestamp_ms:.0f}ms")

asyncio.run(basic_example())

import cascade
import asyncio

async def advanced_example():
    """An advanced configuration example."""
    # Custom configuration
    config = cascade.Config(
        vad_threshold=0.7,  # Higher detection threshold
        min_silence_duration_ms=100,
        speech_pad_ms=100
    )

    # Use the custom config
    # audio_stream is a placeholder for your own asynchronous audio source.
    async with cascade.StreamProcessor(config) as processor:
        # Process audio stream
        async for result in processor.process_stream(audio_stream):
            # Process results...
            pass

        # Get performance statistics
        stats = processor.get_stats()
        print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")

asyncio.run(advanced_example())

import cascade
import asyncio

# audio_stream, asr_service, llm_service, and tts_service are placeholders for
# your own audio source and ASR/LLM/TTS clients.

async def interruption_example():
    """Interruption detection example."""
    # Configure interruption detection
    config = cascade.Config(
        vad_threshold=0.5,
        interruption_config=cascade.InterruptionConfig(
            enable_interruption=True,  # Enable interruption detection
            min_interval_ms=500  # Minimum interruption interval 500ms
        )
    )

    async with cascade.StreamProcessor(config) as processor:
        async for result in processor.process_stream(audio_stream):
            # Detect interruption events
            if result.is_interruption:
                print(f"Interruption detected! Interrupted state: {result.interruption.system_state.value}")
                # Stop current TTS playback
                await tts_service.stop()
                # Cancel LLM request
                await llm_service.cancel()

            # Process speech segments
            elif result.is_speech_segment:
                # ASR recognition
                text = await asr_service.recognize(result.segment.audio_data)
                # Set to processing
                processor.set_system_state(cascade.SystemState.PROCESSING)
                # LLM generation
                response = await llm_service.generate(text)
                # Set to responding
                processor.set_system_state(cascade.SystemState.RESPONDING)
                # TTS playback
                await tts_service.play(response)
                # Reset to idle after completion
                processor.set_system_state(cascade.SystemState.IDLE)

asyncio.run(interruption_example())

For detailed documentation, see the Interruption Implementation Summary.
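
One plausible reading of the `min_interval_ms=500` setting is a debounce: an interruption counts only if the system is busy and the previous accepted interruption is old enough. The sketch below illustrates that idea. `InterruptionGate` and `should_trigger` are hypothetical names, not part of the Cascade API, and how Cascade applies the interval internally is an assumption here.

```python
from typing import Optional


class InterruptionGate:
    """Hypothetical debounce helper; not part of the Cascade API."""

    def __init__(self, min_interval_ms: float = 500.0) -> None:
        self.min_interval_ms = min_interval_ms
        self._last_accepted_ms: Optional[float] = None

    def should_trigger(self, now_ms: float, system_responding: bool) -> bool:
        """Accept an interruption only while the system is busy and the
        previous accepted interruption is at least min_interval_ms old."""
        if not system_responding:
            return False
        if (
            self._last_accepted_ms is not None
            and now_ms - self._last_accepted_ms < self.min_interval_ms
        ):
            return False
        self._last_accepted_ms = now_ms
        return True


gate = InterruptionGate(min_interval_ms=500)
decisions = [
    gate.should_trigger(0, system_responding=True),      # accepted
    gate.should_trigger(200, system_responding=True),    # too soon after the first
    gate.should_trigger(600, system_responding=False),   # system idle, nothing to interrupt
    gate.should_trigger(700, system_responding=True),    # 700 ms after the last accepted one
]
print(decisions)
```

Rejected attempts do not reset the timer in this sketch, so a burst of short noises cannot postpone a genuine interruption indefinitely.
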
# Run basic integration tests
python tests/test_simple_vad.py -v
# Run simulated audio stream tests
python tests/test_stream_vad.py -v
# Run performance benchmark tests
python tests/benchmark_performance.py

Test Coverage:
- ✅ Basic API Usage
- ✅ Stream Processing
- ✅ File Processing
- ✅ Real Audio VAD
- ✅ Automatic Speech Segment Saving
- ✅ 1:1:1:1 Architecture Validation
- ✅ Performance Benchmarks
- ✅ FrameAlignedBuffer Tests
We provide a complete WebSocket-based web demonstration that showcases Cascade's real-time VAD capabilities with multiple client support.
- Real-time Audio Processing: Capture audio from browser microphone and process with VAD
- Live VAD Visualization: Real-time display of VAD detection results
- Speech Segment Management: Display detected speech segments with playback support
- Dynamic VAD Configuration: Adjust VAD parameters in real-time
- Multi-client Support: Independent Cascade instances for each WebSocket connection
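
The multi-client point above can be sketched without a WebSocket server: each connection gets its own processor, which owns its own model and state, and blocking inference is pushed to a worker thread with `asyncio.to_thread`. Everything named below (`FakeVADModel`, `ConnectionProcessor`, `handle_connection`) is a hypothetical stand-in, and the "model" is a trivial non-zero-byte check rather than Silero VAD.

```python
import asyncio
from typing import Dict, List


class FakeVADModel:
    """Stand-in for a real VAD model; flags a chunk as speech if any byte is non-zero."""

    def infer(self, chunk: bytes) -> bool:
        return any(chunk)


class ConnectionProcessor:
    """Hypothetical per-connection processor that owns its own model and state."""

    def __init__(self, connection_id: str) -> None:
        self.connection_id = connection_id
        self.model = FakeVADModel()      # one model per connection, never shared
        self.speech_chunks = 0

    async def process(self, chunk: bytes) -> bool:
        # Run the blocking inference call in a worker thread so the event loop stays responsive.
        is_speech = await asyncio.to_thread(self.model.infer, chunk)
        if is_speech:
            self.speech_chunks += 1
        return is_speech


async def handle_connection(connection_id: str, chunks: List[bytes]) -> int:
    """Process one client's chunks with a processor created just for that client."""
    processor = ConnectionProcessor(connection_id)
    for chunk in chunks:
        await processor.process(chunk)
    return processor.speech_chunks


async def main() -> Dict[str, int]:
    silence, speech = b"\x00" * 1024, b"\x01" * 1024
    clients = {
        "client-a": [silence, speech, speech],
        "client-b": [silence, silence, speech],
    }
    counts = await asyncio.gather(
        *(handle_connection(cid, chunks) for cid, chunks in clients.items())
    )
    return dict(zip(clients, counts))


result = asyncio.run(main())
print(result)
```

Because no state is shared between processors, the two clients can be served concurrently without locks.
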
# Start backend server
cd web_demo
python server.py
# Start frontend (in another terminal)
cd web_demo/frontend
pnpm install && pnpm dev

For detailed setup instructions, see Web Demo Documentation.
- Resource Allocation
  - Each instance uses approximately 50MB of memory.
  - Recommended: 2-3 instances per CPU core.
  - Monitor memory usage to prevent Out-of-Memory (OOM) errors.
- Performance Tuning
  - Adjust max_instances to match server CPU cores.
  - Increase buffer_size_frames for higher throughput.
  - Tune vad_threshold to balance accuracy and sensitivity.
- Error Handling
  - Implement retry mechanisms for transient errors.
  - Use health checks to monitor service status.
  - Log detailed information for troubleshooting.
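
The resource guidance above can be turned into a rough sizing calculation. The helper below is a hypothetical sketch, not a Cascade function. It takes the lower end of the 2-3 instances-per-core range and the approximate 50MB figure from the list. The 80% memory headroom is an added assumption, so adjust it to your deployment.

```python
MEMORY_PER_INSTANCE_MB = 50      # approximate figure from the guidance above
INSTANCES_PER_CORE = 2           # lower end of the recommended 2-3 range


def suggest_max_instances(cpu_cores: int, available_memory_mb: float,
                          memory_headroom: float = 0.8) -> int:
    """Return the smaller of the CPU-based and memory-based limits (at least 1).

    memory_headroom is an assumed safety margin: only this fraction of the
    available memory is budgeted for processor instances.
    """
    cpu_limit = cpu_cores * INSTANCES_PER_CORE
    memory_limit = int(available_memory_mb * memory_headroom // MEMORY_PER_INSTANCE_MB)
    return max(1, min(cpu_limit, memory_limit))


cpu_bound = suggest_max_instances(cpu_cores=4, available_memory_mb=2048)
memory_bound = suggest_max_instances(cpu_cores=16, available_memory_mb=1024)
print(cpu_bound, memory_bound)
```

The result is only a starting point for max_instances. Measured memory usage under real load should take precedence.
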
# Get performance monitoring metrics
stats = processor.get_stats()
# Key monitoring metrics
print(f"Total Chunks Processed: {stats.total_chunks_processed}")
print(f"Average Processing Time: {stats.average_processing_time_ms:.2f}ms")
print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")
print(f"Speech Segments: {stats.speech_segments}")
print(f"Error Rate: {stats.error_rate:.2%}")
print(f"Memory Usage: {stats.memory_usage_mb:.1f}MB")

- Python: 3.12 (recommended)
- pydantic: 2.4.0+ (Data validation)
- numpy: 1.24.0+ (Numerical computation)
- scipy: 1.11.0+ (Signal processing)
- silero-vad: 5.1.2+ (VAD model)
- onnxruntime: 1.22.1+ (ONNX inference)
- torchaudio: 2.7.1+ (Audio processing)
- pytest: Testing framework
- black: Code formatter
- ruff: Linter
- mypy: Type checker
- pre-commit: Git hooks
We welcome community contributions! Please follow these steps:
- Fork the project and create a feature branch.
- Install development dependencies: pip install -e .[dev]
- Run tests: pytest
- Lint your code: ruff check . && black --check .
- Type check: mypy cascade
- Submit a Pull Request with a clear description of your changes.
This project is licensed under the MIT License - see the LICENSE file for details.
- Silero Team: For their excellent VAD model.
- PyTorch Team: For the deep learning framework.
- Pydantic Team: For the type validation system.
- Python Community: For the rich ecosystem.
- Author: Xucailiang
- Email: xucailiang.ai@gmail.com
- Project Homepage: https://github.com/xucailiang/cascade
- Issue Tracker: https://github.com/xucailiang/cascade/issues
- Documentation: https://cascade-vad.readthedocs.io/
⭐ If you find this project helpful, please give it a star!

