MOCA-Net represents a novel approach to neural network architecture design, combining sparse mixture-of-experts, external memory mechanisms, and budget-aware computation for efficient sequence modeling. The architecture introduces three key innovations: intelligent sparse token routing, learnable memory banks with adaptive gating, and differentiable budget optimization during training.
Get up and running with MOCA-Net in just a few commands:
```bash
# Setup environment
make setup

# Run tests
make test

# Train on copy task (fast CPU run)
make train

# Run ablation studies
make ablate

# Generate plots
python scripts/plot_runs.py

# Chat interactively with trained SST-2 model
make chat

# Show interactive inference demo
make demo-interactive
```

MOCA-Net's design philosophy centers around three core innovations that work together to achieve efficient sequence modeling:
- Sparse Token Router: Dynamically selects which experts and memory slots to engage per token, operating under strict compute budget constraints
- External Memory Bank: Implements learnable memory slots with sophisticated gated update mechanisms for long-term information retention
- Budget-Aware Training: Incorporates a differentiable loss term that actively encourages efficient resource utilization throughout the training process
This architectural approach achieves O(L) complexity while preserving the expressive power needed for complex sequence modeling tasks through intelligent resource allocation strategies.
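To make the routing and budget ideas concrete, here is a minimal sketch of a top-k token router with a differentiable load signal. The class name, gating scheme, and return values are illustrative assumptions, not MOCA-Net's actual implementation (see docs/ARCHITECTURE.md for the real formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseRouter(nn.Module):
    """Illustrative top-k token router; a sketch, not MOCA-Net's source."""

    def __init__(self, embedding_dim: int, num_experts: int,
                 top_k: int, temperature: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(embedding_dim, num_experts)
        self.top_k = top_k
        self.temperature = temperature

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, embedding_dim)
        logits = self.gate(tokens) / self.temperature
        probs = F.softmax(logits, dim=-1)
        # Differentiable per-expert load; a budget-aware loss term can
        # penalize deviation from a target compute budget.
        expert_load = probs.mean(dim=(0, 1))                   # (num_experts,)
        weights, indices = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k
        return weights, indices, expert_load
```

Here `weights` and `indices` drive the sparse expert mix, while `expert_load` can feed the budget term of the training loss.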
The data flow through MOCA-Net follows this streamlined path:
```text
Input → Token Router → [Experts + Memory] → Combined Output → Task Head
```
Core Components:
- Token Router: Intelligently routes tokens to top-k experts and memory slots based on learned routing policies
- Expert Layer: Implements a mixture of lightweight MLP experts with shared projection layers for efficiency
- Memory Bank: Provides external memory with attention-based read/write operations for persistent information storage
- Budget Loss: Continuously monitors and optimizes resource usage during training
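Read end to end, the flow above might be composed as in the following skeleton, reusing the hypothetical `SparseRouter` from earlier; all module interfaces are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MocaNetBlock(nn.Module):
    """Schematic composition of the components above; not the actual source."""

    def __init__(self, router: nn.Module, experts: nn.ModuleList,
                 memory: nn.Module, head: nn.Module):
        super().__init__()
        self.router = router    # e.g. the SparseRouter sketched earlier
        self.experts = experts  # lightweight MLP experts
        self.memory = memory    # external memory with gated read/write
        self.head = head        # task-specific output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, indices, expert_load = self.router(x)
        combined = torch.zeros_like(x)
        # Dense loop for clarity; true sparse routing would compute each
        # expert only on the tokens routed to it.
        for e, expert in enumerate(self.experts):
            gate = ((indices == e).float() * weights).sum(-1, keepdim=True)
            combined = combined + gate * expert(x)
        combined = combined + self.memory(x)  # add the gated memory read
        return self.head(combined)
```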
For a comprehensive understanding of the mathematical foundations, refer to docs/ARCHITECTURE.md. For detailed information about the Stanford SST-2 dataset integration, see docs/SST2_INTEGRATION.md. For comprehensive interactive inference usage, see docs/INTERACTIVE_INFERENCE.md.
MOCA-Net is designed to meet specific performance benchmarks across different tasks:
| Task | Metric | Target | CPU Runtime |
|---|---|---|---|
| Copy/Recall | Accuracy | ≥95% | ≤10 min |
| Text Classification | Accuracy | ≥80% | ≤5 min |
Prerequisites:

- Python 3.12 or higher
- Ubuntu 24.04 or compatible Linux distribution
```bash
# Clone the repository
git clone https://github.com/paredezadrian/mocanet.git
cd mocanet

# Create virtual environment and install dependencies
make setup

# Verify installation
make test
```

MOCA-Net provides several training configurations to suit different research needs:
```bash
# Train on copy task (default configuration)
make train

# Train on text classification task (real SST-2 dataset)
make demo

# Quick training runs for rapid experimentation
make run-copy   # 1000 steps on copy task
make run-text   # 500 steps on text classification

# Test and debug SST-2 dataset integration
make test-sst2  # Test SST-2 dataset loading
make debug-sst2 # Debug data and model outputs

# Interactive inference with trained model
make chat             # Chat with trained SST-2 model (uses final checkpoint)
make demo-interactive # Show interactive inference demo
make test-quality     # Test model quality and compare checkpoints
```

Evaluate your trained models using the built-in evaluation framework:
```bash
# Evaluate trained model on copy task
python -m mocanet.eval runs/mocanet_best.pt --task copy

# Generate comprehensive training plots
python scripts/plot_runs.py
```

Chat interactively with your trained SST-2 sentiment analysis model:
```bash
# Start interactive chat session (recommended)
make chat

# Test model quality first
make test-quality

# Show interactive demo
make demo-interactive

# Or run directly with custom checkpoint
python scripts/interactive_inference.py runs/mocanet_final.pt
```

Interactive Features:
- Real-time sentiment analysis of any text input
- Confidence scores and probability distributions
- Model statistics and configuration details
- Rich terminal interface with colored output
- Built-in commands: `help`, `stats`, `quit`
Important Notes:
- Use `runs/mocanet_final.pt` for the fully trained model (96.4% validation accuracy, step 2000)
- Avoid `runs/mocanet_best.pt` for inference (step 0, poor quality)
- Test model quality with `make test-quality` before using interactively
Example Usage:
```text
Enter text: This movie was absolutely fantastic!

Sentiment: Positive
Confidence: 0.892
Negative Probability: 0.108
Positive Probability: 0.892
Confidence Level: High
```
Interactive Workflow:
- Test Model Quality: Run `make test-quality` to verify checkpoint performance
- Start Interactive Session: Use `make chat` to begin sentiment analysis
- Input Text: Type any sentence to analyze sentiment
- View Results: See prediction, confidence, and probability distributions
- Use Commands: Type `help`, `stats`, or `quit` for additional features
Available Commands:
- `help` - Show available commands and usage
- `stats` - Display model architecture and configuration details
- `quit` - Exit the interactive session
Explore the impact of different architectural components:
```bash
# Run comprehensive ablation studies
make ablate

# Results are automatically saved to runs/ablation/
```

MOCA-Net employs YAML-based configuration management powered by Hydra, enabling flexible experiment management and reproducible research. The configuration system is organized into logical components:
- `base.yaml`: Core model architecture and training parameters
- `copy_task.yaml`: Task-specific settings for copy/recall experiments
- `text_cls.yaml`: Configuration for text classification tasks with real Stanford SST-2 dataset
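Configs can also be composed and overridden programmatically through Hydra's compose API. A minimal sketch, assuming the YAML files above live in a `configs/` directory relative to the caller (the path and override keys are assumptions):

```python
from hydra import compose, initialize

# Compose the base config and override one training parameter.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="base", overrides=["training.batch_size=16"])

print(cfg.model.embedding_dim)  # values are accessed with attribute syntax
```

Representative settings from the configuration files: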
```yaml
model:
  embedding_dim: 128       # Token embedding dimension
  num_experts: 4           # Number of expert networks in mixture
  num_memory_slots: 64     # Memory bank capacity
  top_k_experts: 2         # Sparse routing parameter (top-k selection)
  router_temperature: 1.0  # Routing temperature for sparsity control

training:
  batch_size: 64           # Training batch size
  max_steps: 5000          # Maximum training steps
  learning_rate: 1e-3      # Learning rate
  warmup_steps: 200        # Learning rate warmup steps
  gradient_clip_norm: 1.0  # Gradient clipping norm

# SST-2 Dataset Configuration
text_cls:
  use_real_sst2: true      # Use real Stanford SST-2 dataset
  dataset: "sst2"          # Full Stanford SST-2 dataset
  min_freq: 2              # Minimum token frequency for vocabulary
  max_vocab_size: 10000    # Maximum vocabulary size
```

Copy task targets:

- Target Accuracy: ≥95% on sequences up to 60 tokens
- Expected Runtime: ≤10 minutes on CPU
- Memory Usage: <4GB RAM
Checkpoints:

- `mocanet_final.pt`: Final trained model (step 2000, 96.4% validation accuracy); use this for inference
- `mocanet_best.pt`: Best validation checkpoint (saved at step 0; avoid for inference)
- `mocanet_step_*.pt`: Intermediate training checkpoints for analysis
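To inspect a checkpoint outside the provided scripts, something like the sketch below works; the checkpoint's internal key layout is an assumption, so print the keys before relying on any of them:

```python
import torch

# Load on CPU; "runs/mocanet_final.pt" is the recommended checkpoint above.
ckpt = torch.load("runs/mocanet_final.pt", map_location="cpu")

# The stored layout (e.g. "model_state_dict", "step") is an assumption --
# list the keys to see what the trainer actually saved.
print(sorted(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```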
Text classification targets:

- Target Accuracy: ≥95% on Stanford SST-2 dataset
- Expected Runtime: ≤10 minutes on CPU for 500 steps, ~6.5 minutes for 2000 steps
- Dataset Size: 67,349 training samples, 872 validation samples, 1,821 test samples
- Real Dataset: Full Stanford Sentiment Treebank v2 (SST-2) from Hugging Face
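For reference, the same dataset can be pulled directly from the Hugging Face Hub with the `datasets` library; this generic sketch is independent of MOCA-Net's own loader (see docs/SST2_INTEGRATION.md for the integrated path):

```python
from datasets import load_dataset

# Stanford SST-2 is distributed as part of the GLUE benchmark.
sst2 = load_dataset("glue", "sst2")
print({split: len(ds) for split, ds in sst2.items()})
# Expected sizes: train 67349, validation 872, test 1821
```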
Interactive inference highlights:

- Real-time Analysis: Instant sentiment predictions for any text input
- Confidence Scoring: Probability distributions and confidence levels (>0.8 for trained model)
- User Interface: Rich terminal-based interactive chat system with colored output
- Model Insights: Access to model statistics and configuration details
- Quality Assurance: Built-in model quality testing and checkpoint comparison
- Performance: 66.7% accuracy on test sentences with 96.4% validation accuracy
The framework supports systematic ablation studies to understand component contributions:
- No Memory: Disables external memory bank to isolate memory effects
- No Experts: Replaces mixture-of-experts with single expert architecture
- Dense Routing: Uses all experts instead of sparse routing for comparison
- Smaller Model: Reduces model size by half to analyze scaling effects
Execute ablation studies with `make ablate`.
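If the ablation sweep is driven by config overrides, the variants above map naturally onto settings from base.yaml. The override keys below are illustrative guesses based on that sample, not documented options; check configs/ for the real names:

```python
# Hypothetical mapping from ablation variants to Hydra-style overrides.
ABLATIONS = {
    "no_memory": ["model.num_memory_slots=0"],
    "no_experts": ["model.num_experts=1"],
    "dense_routing": ["model.top_k_experts=4"],  # route every token to all 4 experts
    "smaller_model": ["model.embedding_dim=64"], # half the default width
}
```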
Ensure code quality and functionality with the comprehensive testing suite:
```bash
# Run complete test suite
make test

# Test interactive inference functionality
make test-interactive

# Test model quality and compare checkpoints
make test-quality

# Run specific test file with verbose output
python -m pytest tests/test_layers.py -v

# Generate coverage report
python -m pytest tests/ --cov=src/mocanet --cov-report=html
```

MOCA-Net provides several utility scripts for different purposes:
- `scripts/plot_runs.py`: Generate comprehensive training plots and visualizations
- `scripts/debug_data.py`: Debug data loading and preprocessing issues
- `scripts/test_sst2.py`: Test SST-2 dataset integration and loading
- `scripts/interactive_inference.py`: Main interactive chat interface for sentiment analysis
- `scripts/demo_interactive.py`: Demo script showing interactive capabilities
- `scripts/test_model_quality.py`: Test and compare different model checkpoints

Key documentation:

- `docs/INTERACTIVE_INFERENCE.md`: Comprehensive guide to interactive inference
- `docs/ARCHITECTURE.md`: Detailed architecture and mathematical foundations
- `docs/SST2_INTEGRATION.md`: SST-2 dataset integration details
Typical day-to-day commands:

```bash
# Test model quality before interactive use
make test-quality

# Start interactive chat
make chat

# Show interactive demo
make demo-interactive

# Generate training plots
python scripts/plot_runs.py
```

Training progress is comprehensively logged using multiple tools:
- Rich: Provides beautiful, interactive console output with real-time progress bars
- TensorBoard: Tracks training curves, metrics, and model performance over time
- Checkpoints: Automatically saves model states every 1000 steps for recovery
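For reference, the TensorBoard side of this logging reduces to a few calls. This is a generic sketch rather than MOCA-Net's trainer, and the log directory is an assumption:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/tb")  # assumed log location

for step in range(3):        # stand-in for the real training loop
    loss = 1.0 / (step + 1)  # placeholder metric
    writer.add_scalar("train/loss", loss, step)

writer.close()
# Inspect with: tensorboard --logdir runs/tb
```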
Common issues:

- CUDA Out of Memory: Reduce batch size in configuration files
- Slow Training Performance: Ensure `num_workers=0` for CPU-based training
- Import Errors: Run `make setup` to properly install all dependencies
- Poor Interactive Inference Results: Use `make test-quality` to verify model quality
- Random/Incorrect Predictions:
  - Ensure you're using `runs/mocanet_final.pt` (not `mocanet_best.pt`)
  - Run `make test-quality` to verify checkpoint quality
  - Check that training completed successfully (validation accuracy >90%)
- Low Confidence Scores:
  - Model may need more training steps
  - Verify training configuration and hyperparameters
  - Check training logs for convergence
- Import Errors in Interactive Mode:
  - Ensure virtual environment is activated: `. venv/bin/activate`
  - Run `make setup` to install all dependencies
  - Check Python path and module imports
Performance tips:

- Use `batch_size=16` for text classification tasks on CPU
- Set `max_steps=1000` for rapid experimentation cycles
- Enable `gradient_clip_norm=1.0` for training stability
- For interactive inference, use the final checkpoint (step 2000) for best results
The MOCA-Net project continues to evolve with planned enhancements:
- Learned Write Policy: Develop adaptive memory update strategies based on input characteristics
- KV Compression: Implement efficient memory representation techniques
- Hierarchical Experts: Design multi-level expert organization for complex tasks
- Retrieval-Augmented Tasks: Integrate external knowledge sources
- Dynamic Routing: Create adaptive expert selection mechanisms based on input complexity
MOCA-Net builds upon and extends several foundational works in neural architecture design:
- Mixture of Experts: Shazeer et al. (2017) - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- External Memory: Graves et al. (2014) - Neural Turing Machines
- Sparse Routing: Lepikhin et al. (2020) - GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
We welcome contributions from the research community! To contribute:
- Fork the repository
- Create a feature branch for your contribution
- Add comprehensive tests for new functionality
- Ensure all tests pass: `make test`
- Submit a pull request with detailed description
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
License Terms:
- ✅ You can share and adapt this work freely
- ✅ You must give attribution to the original author
- ❌ You cannot use it for commercial purposes
- ✅ You must share adaptations under the same license
For complete license details, see the LICENSE file, or visit Creative Commons for more information.
MOCA-Net stands on the shoulders of the open-source machine learning community:
- PyTorch Team: For providing an excellent deep learning framework
- Rich Library: For beautiful console output and user experience
- Hydra: For robust configuration management and experiment tracking
- Open-Source ML Community: For continuous inspiration and collaboration
Ready to explore the frontiers of efficient neural architecture design? Begin your journey with `make setup` and discover the innovative world of MOCA-Net.