MOCA-Net represents a novel approach to neural network architecture design, combining sparse mixture-of-experts, external memory mechanisms, and budget-aware computation for efficient sequence modeling. The architecture introduces three key innovations: intelligent sparse token routing, learnable memory banks with adaptive gating, and differentiable budget optimization during training.
Get up and running with MOCA-Net in just a few commands:
```bash
# Setup environment
make setup

# Run tests
make test

# Train on copy task (fast CPU run)
make train

# Run ablation studies
make ablate

# Generate plots
python scripts/plot_runs.py

# Chat interactively with trained SST-2 model
make chat

# Show interactive inference demo
make demo-interactive
```

MOCA-Net's design philosophy centers around three core innovations that work together to achieve efficient sequence modeling:
- Sparse Token Router: Dynamically selects which experts and memory slots to engage per token, operating under strict compute budget constraints
- External Memory Bank: Implements learnable memory slots with sophisticated gated update mechanisms for long-term information retention
- Budget-Aware Training: Incorporates a differentiable loss term that actively encourages efficient resource utilization throughout the training process
This architectural approach achieves O(L) complexity while preserving the expressive power needed for complex sequence modeling tasks through intelligent resource allocation strategies.
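To make the routing and budget ideas concrete, here is a minimal sketch of a top-k token router with a differentiable load signal. The class name, gating scheme, and return values are illustrative assumptions, not MOCA-Net's actual implementation (see docs/ARCHITECTURE.md for the real formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseRouter(nn.Module):
    """Illustrative top-k token router; a sketch, not MOCA-Net's source."""

    def __init__(self, embedding_dim: int, num_experts: int,
                 top_k: int, temperature: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(embedding_dim, num_experts)
        self.top_k = top_k
        self.temperature = temperature

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, embedding_dim)
        logits = self.gate(tokens) / self.temperature
        probs = F.softmax(logits, dim=-1)
        # Differentiable per-expert load; a budget-aware loss term can
        # penalize deviation from a target compute budget.
        expert_load = probs.mean(dim=(0, 1))                   # (num_experts,)
        weights, indices = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        return weights, indices, expert_load
```

Here `weights` and `indices` drive the sparse expert mix, while `expert_load` can feed the budget term of the training loss.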
The data flow through MOCA-Net follows this streamlined path:
```text
Input → Token Router → [Experts + Memory] → Combined Output → Task Head
```
Core Components:
- Token Router: Intelligently routes tokens to top-k experts and memory slots based on learned routing policies
- Expert Layer: Implements a mixture of lightweight MLP experts with shared projection layers for efficiency
- Memory Bank: Provides external memory with attention-based read/write operations for persistent information storage
- Budget Loss: Continuously monitors and optimizes resource usage during training
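Read end to end, the flow above might be composed as in the following skeleton, reusing the hypothetical `SparseRouter` from earlier; all module interfaces are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MocaNetBlock(nn.Module):
    """Schematic composition of the components above; not the actual source."""

    def __init__(self, router: nn.Module, experts: nn.ModuleList,
                 memory: nn.Module, head: nn.Module):
        super().__init__()
        self.router = router    # e.g. the SparseRouter sketched earlier
        self.experts = experts  # lightweight MLP experts
        self.memory = memory    # external memory with gated read/write
        self.head = head        # task-specific output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, indices, expert_load = self.router(x)
        combined = torch.zeros_like(x)
        # Dense loop for clarity; true sparse routing would compute each
        # expert only on the tokens routed to it.
        for e, expert in enumerate(self.experts):
            gate = ((indices == e).float() * weights).sum(-1, keepdim=True)
            combined = combined + gate * expert(x)
        combined = combined + self.memory(x)  # add the gated memory read
        return self.head(combined)
```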
For a comprehensive understanding of the mathematical foundations, refer to docs/ARCHITECTURE.md. For detailed information about the Stanford SST-2 dataset integration, see docs/SST2_INTEGRATION.md. For comprehensive interactive inference usage, see docs/INTERACTIVE_INFERENCE.md.
MOCA-Net is designed to meet specific performance benchmarks across different tasks:
| Task | Metric | Target | CPU Runtime |
|---|---|---|---|
| Copy/Recall | Accuracy | ≥95% | ≤10 min |
| Text Classification | Accuracy | ≥80% | ≤5 min |
Prerequisites:

- Python 3.12 or higher
- Ubuntu 24.04 or compatible Linux distribution
```bash
# Clone the repository
git clone https://github.com/paredezadrian/mocanet.git
cd mocanet

# Create virtual environment and install dependencies
make setup

# Verify installation
make test
```

MOCA-Net provides several training configurations to suit different research needs:
```bash
# Train on copy task (default configuration)
make train

# Train on text classification task (real SST-2 dataset)
make demo

# Quick training runs for rapid experimentation
make run-copy   # 1000 steps on copy task
make run-text   # 500 steps on text classification

# Test and debug SST-2 dataset integration
make test-sst2  # Test SST-2 dataset loading
make debug-sst2 # Debug data and model outputs

# Interactive inference with trained model
make chat             # Chat with trained SST-2 model (uses final checkpoint)
make demo-interactive # Show interactive inference demo
make test-quality     # Test model quality and compare checkpoints
```

Evaluate your trained models using the built-in evaluation framework:
```bash
# Evaluate trained model on copy task
python -m mocanet.eval runs/mocanet_best.pt --task copy

# Generate comprehensive training plots
python scripts/plot_runs.py
```

Chat interactively with your trained SST-2 sentiment analysis model:
```bash
# Start interactive chat session (recommended)
make chat

# Test model quality first
make test-quality

# Show interactive demo
make demo-interactive

# Or run directly with custom checkpoint
python scripts/interactive_inference.py runs/mocanet_final.pt
```

Interactive Features:
- Real-time sentiment analysis of any text input
- Confidence scores and probability distributions
- Model statistics and configuration details
- Rich terminal interface with colored output
- Built-in commands: `help`, `stats`, `quit`
Important Notes:
- Use `runs/mocanet_final.pt` for the fully trained model (96.4% validation accuracy, step 2000)
- Avoid `runs/mocanet_best.pt` for inference (step 0, poor quality)
- Test model quality with `make test-quality` before using interactively
Example Usage:
```text
Enter text: This movie was absolutely fantastic!

Sentiment: Positive
Confidence: 0.892
Negative Probability: 0.108
Positive Probability: 0.892
Confidence Level: High
```
Interactive Workflow:
- Test Model Quality: Run `make test-quality` to verify checkpoint performance
- Start Interactive Session: Use `make chat` to begin sentiment analysis
- Input Text: Type any sentence to analyze sentiment
- View Results: See prediction, confidence, and probability distributions
- Use Commands: Type `help`, `stats`, or `quit` for additional features
Available Commands:
- `help` - Show available commands and usage
- `stats` - Display model architecture and configuration details
- `quit` - Exit the interactive session
Explore the impact of different architectural components:
```bash
# Run comprehensive ablation studies
make ablate

# Results are automatically saved to runs/ablation/
```

MOCA-Net employs YAML-based configuration management powered by Hydra, enabling flexible experiment management and reproducible research. The configuration system is organized into logical components:
- `base.yaml`: Core model architecture and training parameters
- `copy_task.yaml`: Task-specific settings for copy/recall experiments
- `text_cls.yaml`: Configuration for text classification tasks with real Stanford SST-2 dataset
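Configs can also be composed and overridden programmatically through Hydra's compose API. A minimal sketch, assuming the YAML files above live in a `configs/` directory relative to the caller (the path and override keys are assumptions):

```python
from hydra import compose, initialize

# Compose the base config and override one training parameter.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="base", overrides=["training.batch_size=16"])

print(cfg.model.embedding_dim)  # values are accessed with attribute syntax
```

Representative settings from the configuration files: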
```yaml
model:
  embedding_dim: 128       # Token embedding dimension
  num_experts: 4           # Number of expert networks in mixture
  num_memory_slots: 64     # Memory bank capacity
  top_k_experts: 2         # Sparse routing parameter (top-k selection)
  router_temperature: 1.0  # Routing temperature for sparsity control

training:
  batch_size: 64           # Training batch size
  max_steps: 5000          # Maximum training steps
  learning_rate: 1e-3      # Learning rate
  warmup_steps: 200        # Learning rate warmup steps
  gradient_clip_norm: 1.0  # Gradient clipping norm

# SST-2 Dataset Configuration
text_cls:
  use_real_sst2: true      # Use real Stanford SST-2 dataset
  dataset: "sst2"          # Full Stanford SST-2 dataset
  min_freq: 2              # Minimum token frequency for vocabulary
  max_vocab_size: 10000    # Maximum vocabulary size
```

Copy task targets:

- Target Accuracy: ≥95% on sequences up to 60 tokens
- Expected Runtime: ≤10 minutes on CPU
- Memory Usage: <4GB RAM
Checkpoints:

- `mocanet_final.pt`: Final trained model (step 2000, 96.4% validation accuracy); use this for inference
- `mocanet_best.pt`: Best validation checkpoint (saved at step 0; avoid for inference)
- `mocanet_step_*.pt`: Intermediate training checkpoints for analysis
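To inspect a checkpoint outside the provided scripts, something like the sketch below works; the checkpoint's internal key layout is an assumption, so print the keys before relying on any of them:

```python
import torch

# Load on CPU; "runs/mocanet_final.pt" is the recommended checkpoint above.
ckpt = torch.load("runs/mocanet_final.pt", map_location="cpu")

# The stored layout (e.g. "model_state_dict", "step") is an assumption --
# list the keys to see what the trainer actually saved.
print(sorted(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```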
Text classification targets:

- Target Accuracy: ≥95% on Stanford SST-2 dataset
- Expected Runtime: ≤10 minutes on CPU for 500 steps, ~6.5 minutes for 2000 steps
- Dataset Size: 67,349 training samples, 872 validation samples, 1,821 test samples
- Real Dataset: Full Stanford Sentiment Treebank v2 (SST-2) from Hugging Face
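For reference, the same dataset can be pulled directly from the Hugging Face Hub with the `datasets` library; this generic sketch is independent of MOCA-Net's own loader (see docs/SST2_INTEGRATION.md for the integrated path):

```python
from datasets import load_dataset

# Stanford SST-2 is distributed as part of the GLUE benchmark.
sst2 = load_dataset("glue", "sst2")
print({split: len(ds) for split, ds in sst2.items()})
# Expected sizes: train 67349, validation 872, test 1821
```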
Interactive inference highlights:

- Real-time Analysis: Instant sentiment predictions for any text input
- Confidence Scoring: Probability distributions and confidence levels (>0.8 for trained model)
- User Interface: Rich terminal-based interactive chat system with colored output
- Model Insights: Access to model statistics and configuration details
- Quality Assurance: Built-in model quality testing and checkpoint comparison
- Performance: 66.7% accuracy on test sentences with 96.4% validation accuracy
The framework supports systematic ablation studies to understand component contributions:
- No Memory: Disables external memory bank to isolate memory effects
- No Experts: Replaces mixture-of-experts with single expert architecture
- Dense Routing: Uses all experts instead of sparse routing for comparison
- Smaller Model: Reduces model size by half to analyze scaling effects
Execute ablation studies with `make ablate`.
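If the ablation sweep is driven by config overrides, the variants above map naturally onto settings from base.yaml. The override keys below are illustrative guesses based on that sample, not documented options; check configs/ for the real names:

```python
# Hypothetical mapping from ablation variants to Hydra-style overrides.
ABLATIONS = {
    "no_memory": ["model.num_memory_slots=0"],
    "no_experts": ["model.num_experts=1"],
    "dense_routing": ["model.top_k_experts=4"],  # route every token to all 4 experts
    "smaller_model": ["model.embedding_dim=64"], # half the default width
}
```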
Ensure code quality and functionality with the comprehensive testing suite:
```bash
# Run complete test suite
make test

# Test interactive inference functionality
make test-interactive

# Test model quality and compare checkpoints
make test-quality

# Run specific test file with verbose output
python -m pytest tests/test_layers.py -v

# Generate coverage report
python -m pytest tests/ --cov=src/mocanet --cov-report=html
```

MOCA-Net provides several utility scripts for different purposes:
- `scripts/plot_runs.py`: Generate comprehensive training plots and visualizations
- `scripts/debug_data.py`: Debug data loading and preprocessing issues
- `scripts/test_sst2.py`: Test SST-2 dataset integration and loading
- `scripts/interactive_inference.py`: Main interactive chat interface for sentiment analysis
- `scripts/demo_interactive.py`: Demo script showing interactive capabilities
- `scripts/test_model_quality.py`: Test and compare different model checkpoints

Key documentation:

- `docs/INTERACTIVE_INFERENCE.md`: Comprehensive guide to interactive inference
- `docs/ARCHITECTURE.md`: Detailed architecture and mathematical foundations
- `docs/SST2_INTEGRATION.md`: SST-2 dataset integration details
Typical day-to-day commands:

```bash
# Test model quality before interactive use
make test-quality

# Start interactive chat
make chat

# Show interactive demo
make demo-interactive

# Generate training plots
python scripts/plot_runs.py
```

Training progress is comprehensively logged using multiple tools:
- Rich: Provides beautiful, interactive console output with real-time progress bars
- TensorBoard: Tracks training curves, metrics, and model performance over time
- Checkpoints: Automatically saves model states every 1000 steps for recovery
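For reference, the TensorBoard side of this logging reduces to a few calls. This is a generic sketch rather than MOCA-Net's trainer, and the log directory is an assumption:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/tb")  # assumed log location

for step in range(3):        # stand-in for the real training loop
    loss = 1.0 / (step + 1)  # placeholder metric
    writer.add_scalar("train/loss", loss, step)

writer.close()
# Inspect with: tensorboard --logdir runs/tb
```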
Common issues:

- CUDA Out of Memory: Reduce batch size in configuration files
- Slow Training Performance: Ensure `num_workers=0` for CPU-based training
- Import Errors: Run `make setup` to properly install all dependencies
- Poor Interactive Inference Results: Use `make test-quality` to verify model quality
- Random/Incorrect Predictions:
  - Ensure you're using `runs/mocanet_final.pt` (not `mocanet_best.pt`)
  - Run `make test-quality` to verify checkpoint quality
  - Check that training completed successfully (validation accuracy >90%)
- Low Confidence Scores:
  - Model may need more training steps
  - Verify training configuration and hyperparameters
  - Check training logs for convergence
- Import Errors in Interactive Mode:
  - Ensure virtual environment is activated: `. venv/bin/activate`
  - Run `make setup` to install all dependencies
  - Check Python path and module imports
Performance tips:

- Use `batch_size=16` for text classification tasks on CPU
- Set `max_steps=1000` for rapid experimentation cycles
- Enable `gradient_clip_norm=1.0` for training stability
- For interactive inference, use the final checkpoint (step 2000) for best results
The MOCA-Net project continues to evolve with planned enhancements:
- Learned Write Policy: Develop adaptive memory update strategies based on input characteristics
- KV Compression: Implement efficient memory representation techniques
- Hierarchical Experts: Design multi-level expert organization for complex tasks
- Retrieval-Augmented Tasks: Integrate external knowledge sources
- Dynamic Routing: Create adaptive expert selection mechanisms based on input complexity
MOCA-Net builds upon and extends several foundational works in neural architecture design:
- Mixture of Experts: Shazeer et al. (2017) - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- External Memory: Graves et al. (2014) - Neural Turing Machines
- Sparse Routing: Lepikhin et al. (2020) - GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
We welcome contributions from the research community! To contribute:
- Fork the repository
- Create a feature branch for your contribution
- Add comprehensive tests for new functionality
- Ensure all tests pass: `make test`
- Submit a pull request with detailed description
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
License Terms:
- ✅ You can share and adapt this work freely
- ✅ You must give attribution to the original author
- ❌ You cannot use it for commercial purposes
- ✅ You must share adaptations under the same license
For complete license details, see the LICENSE file, or visit Creative Commons for more information.
MOCA-Net stands on the shoulders of the open-source machine learning community:
- PyTorch Team: For providing an excellent deep learning framework
- Rich Library: For beautiful console output and user experience
- Hydra: For robust configuration management and experiment tracking
- Open-Source ML Community: For continuous inspiration and collaboration
Ready to explore the frontiers of efficient neural architecture design? Begin your journey with `make setup` and discover the innovative world of MOCA-Net.