MLX
A full-lifecycle ML workbench for Claude Code — from paper to production in one plugin.
Quick Start · Skills · Agents · Datasets · Architecture · Contributing
MLX is a Claude Code plugin that gives your agent a complete machine learning toolkit: search papers across 7 academic sources, discover and download datasets from 5 free repositories, explore and clean data, engineer features, train models, run experiments, build AI applications with LLMs and RAG, deploy models to production, generate podcasts and content from papers, manage notebooks, extract YouTube video content, and learn ML interactively with 3 university-grade courses. 7 specialized agents, 13 skills.
# Add the marketplace, then install the plugin
/plugin marketplace add damionrashford/mlx
/plugin install mlx@damionrashford-mlx

Or install directly:
git clone https://github.com/damionrashford/mlx.git
claude --plugin-dir ./mlx

| Requirement | Install |
|---|---|
| Python 3.10+ | brew install python or apt install python3 |
| pdftotext (optional, for PDF extraction) | brew install poppler or apt install poppler-utils |
| notebooklm (optional, for podcast generation) | pip install notebooklm |
| yt-dlp (optional, for YouTube extraction) | pip install yt-dlp |
| youtube-transcript-api (optional, for transcripts) | pip install youtube-transcript-api |
Most features require no API keys or accounts. The media skill's content generation requires a Google account with NotebookLM access.
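A quick way to see which optional command-line tools from the table above are installed — a sketch only, checking the two CLI tools (`pdftotext`, `yt-dlp`); the function name is illustrative, not part of the plugin:

```python
import shutil

def check_optional_tools(tools=("pdftotext", "yt-dlp")):
    """Report which optional MLX CLI dependencies are on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, present in check_optional_tools().items():
        print(f"{tool}: {'ok' if present else 'missing (optional)'}")
```

Missing tools only disable the corresponding features (PDF extraction, YouTube download); nothing else breaks.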
Plugin settings cannot auto-configure permissions. For the smoothest experience, add these to your user or project settings:
{
"permissions": {
"allow": [
"Bash(python3 *)",
"Bash(pip install *)",
"Bash(which *)",
"Read(*)",
"Glob(*)"
]
}
}

MLX ships 13 skills that cover the full ML and data lifecycle. Each is invocable as a slash command or triggered automatically by natural language.
| Skill | Command | What it does |
|---|---|---|
| research | /research transformer attention | Search papers from 7 sources, find/download datasets from 5 sources, structured paper review |
| prototype | /prototype ./paper.pdf | Convert a research paper into a working code project (Python, TS, Rust, Go) |
| data-prep | /data-prep data/train.csv | EDA + cleaning + feature engineering: profiling, distributions, missing values, transforms, encodings |
| analyze | /analyze data/sales.csv | Statistical tests, A/B testing, cohort analysis, segmentation, KPIs, pre-delivery QA/validation |
| visualize | /visualize data/metrics.csv | Charts, dashboards, and reports with matplotlib, seaborn, or plotly |
| train | /train data/features.csv | Train, evaluate, and iterate on models with experiment tracking |
| evaluate | /evaluate results.tsv | Multi-dimensional model evaluation, LLM-as-judge, bias detection |
| notebook | /notebook analysis.ipynb | Clean, organize, document, and convert Jupyter notebooks |
| serve | /serve model.joblib | Deploy models: inference API, Docker, CI/CD, monitoring, model cards |
| context-engineering | natural language | Context window management, memory systems, multi-agent patterns for LLM apps |
| media | /media paper.pdf | YouTube extraction + NotebookLM content generation (podcasts, videos, quizzes, reports, slides) |
| mcp-builder | natural language | Build MCP servers to connect LLMs with external services |
| learn | /learn transformers | Interactive ML education with 3 courses (CS229, Applied ML, ML Engineering), 53+ lessons, quizzes, and interview prep |
research → prototype → data-prep → train → evaluate → serve → notebook
│ │ │ │ │
│ find │ media │ explore │ build & iterate │ document
│ papers │ & content │ & prep │ on models │ results
└──────────┴────────────┴──────────┴────────────────────┘
media ──── extract YouTube content + generate podcasts/videos
learn ──── study ML concepts interactively
Agent coverage:
ml-researcher ── find papers, datasets, review, media, prototype
data-analyst ─── data-prep, analyze, visualize, report
data-scientist ─ full pipeline: data → trained model
ml-engineer ──── optimize: features, tuning, ablations
ai-engineer ──── LLM apps: RAG, prompts, agents, MCP servers
ml-ops ────────── deploy: serialize, serve, Docker, monitor
ml-tutor ──────── learn ML: courses, quizzes, interview prep
Search across 7 free academic sources — no API keys, no rate-limit hassle.
| Source | Search | Fetch | Download | Best for |
|---|---|---|---|---|
| arXiv | yes | yes | yes | ML/AI preprints |
| Semantic Scholar | yes | yes | — | Citations, open-access PDFs |
| Papers with Code | yes | yes | — | Papers linked to GitHub repos |
| Hugging Face | yes | via arXiv | — | Trending daily papers |
| JMLR | yes | yes | yes | Peer-reviewed ML journal |
| ACL Anthology | — | by ID | yes | NLP conference papers |
| OpenScholar | — | — | — | Q&A synthesis over 45M papers |
# Search arXiv
/research transformer attention mechanisms
# Multi-source concurrent search
python3 scripts/scientific_search.py "BERT NLP" --max 10
# Download a paper
python3 scripts/download.py 2401.12345 --output ./papers
# Extract text from PDF
python3 scripts/extract.py ./papers/2401.12345.pdf --max-pages 20

Search, inspect, and download ML datasets from 5 free sources — all without API keys.
| Source | Search | Info | Download | Format | Best for |
|---|---|---|---|---|---|
| HuggingFace | yes | yes | yes | Parquet | NLP, vision, audio (100K+ datasets) |
| OpenML | yes | yes | yes | ARFF/CSV | Tabular benchmarks (5K+ datasets) |
| UCI | yes | yes | yes | CSV/ZIP | Classic ML datasets (600+) |
| Papers with Code | yes | yes | links | — | Datasets linked to papers |
| Kaggle | yes | — | CLI | — | Competition & community (200K+) |
# Search for datasets
/research search sentiment analysis datasets
# Or use the datasets script directly
python3 scripts/datasets.py search "image classification" --source huggingface --limit 5
# Inspect a dataset (columns, splits, size)
python3 scripts/datasets.py info imdb --source huggingface
# Download dataset files
python3 scripts/datasets.py download imdb --source huggingface --output ./datasets --split train
# Download from OpenML (auto-converts ARFF to CSV)
python3 scripts/datasets.py download 61 --source openml --output ./datasets

MLX includes 7 specialized agents that orchestrate skills for complex workflows.
| Agent | Skills Used | When to Use |
|---|---|---|
| ml-researcher | research, prototype, media | Find papers, discover datasets, review methodology, generate podcasts, extract YouTube content, prototype algorithms |
| data-analyst | data-prep, analyze, visualize, evaluate, notebook | Answer business questions: statistics, A/B tests, dashboards, KPIs, reports, QA validation |
| data-scientist | research, data-prep, train, evaluate, notebook | Full ML pipeline: find data, explore, clean, engineer features, model, evaluate |
| ml-engineer | data-prep, train, evaluate, notebook | Focused iteration: feature engineering, hyperparameter sweeps, ablations |
| ai-engineer | research, prototype, evaluate, context-engineering, mcp-builder, notebook | Build AI apps: LLM integration, RAG pipelines, prompt engineering, agent architectures |
| ml-ops | train, serve, notebook | Deploy models: serialization, serving code, Docker, CI/CD, monitoring, model cards |
| ml-tutor | learn, research, evaluate, notebook | Interactive ML education: study concepts, quiz prep, mock interviews, system design practice |
"Find papers about attention mechanisms" → ml-researcher
"Review this paper's methodology" → ml-researcher
"Turn this paper into a podcast" → ml-researcher
"What drove revenue growth last quarter?" → data-analyst
"Create a dashboard of our KPIs" → data-analyst
"Run an A/B test analysis on this experiment" → data-analyst
"I have a CSV, build me a model" → data-scientist
"Tune the hyperparameters on this model" → ml-engineer
"Build a RAG chatbot over my docs" → ai-engineer
"Deploy this model with Docker" → ml-ops
"Teach me about transformers" → ml-tutor
"Quiz me on backpropagation" → ml-tutor
"Extract the transcript from this lecture" → ml-researcher (media skill)
Each agent follows a strict protocol:
- ml-researcher: Scope → Search → Filter → Deep analysis → Review → Dataset discovery → Media → Synthesis → Prototype
- data-analyst: Question → Explore → Clean → Analyze → Visualize → Validate → Report
- data-scientist: Find data → Understand → Explore → Clean → Engineer → Train → Iterate → Report
- ml-engineer: Baseline → Features → Model selection → Tuning → Ablation → Final eval → Document
- ai-engineer: Requirements → Model selection → Prompt engineering → RAG/embeddings → Eval → Integration → Document
- ml-ops: Model audit → Serialization → Inference API → Containerize → CI/CD → Monitoring → Model card → Reproducibility package
- ml-tutor: Assess level → Navigate courses → Teach interactively → Check understanding → Challenge with tradeoffs → Track progress
mlx/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest
├── skills/
│ ├── research/ # Paper search + dataset discovery + paper review
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── search.py # 7-source paper search
│ │ │ ├── fetch.py # Paper metadata by ID
│ │ │ ├── download.py # PDF download
│ │ │ ├── extract.py # PDF text extraction
│ │ │ ├── datasets.py # 5-source dataset search & download
│ │ │ ├── scientific_search.py # Concurrent multi-source search
│ │ │ └── analyze_document.py # Document analysis (PDF, DOCX, TXT)
│ │ └── references/
│ │ ├── sources.md # API endpoints & rate limits
│ │ └── api-reference.md # Full API documentation
│ ├── prototype/ # Paper → code conversion
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── main.py # Extraction + generation pipeline
│ │ │ ├── analyzers/ # Paper analysis modules
│ │ │ ├── extractors/ # Content extraction modules
│ │ │ └── generators/ # Code generation modules
│ │ ├── references/
│ │ │ ├── analysis-methodology.md
│ │ │ ├── extraction-patterns.md
│ │ │ └── generation-rules.md
│ │ └── assets/examples/ # Example files
│ ├── data-prep/ # EDA + cleaning + feature engineering
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── eda.py # Full EDA pipeline
│ │ │ ├── clean.py # Automated data cleaning
│ │ │ └── engineer_features.py # Auto feature transforms
│ │ └── references/
│ │ └── pipeline.md # EDA → Clean → Engineer pipeline
│ ├── analyze/ # Statistical & business analysis + QA validation
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── descriptive_stats.py
│ │ │ ├── hypothesis_test.py
│ │ │ ├── ab_test.py
│ │ │ ├── cohort_analysis.py
│ │ │ ├── rfm_segmentation.py
│ │ │ ├── trend_analysis.py
│ │ │ └── validate.py # Pre-delivery QA checks
│ │ └── references/
│ │ └── analysis-methods.md
│ ├── visualize/ # Charts, dashboards, data reports
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── chart_templates.py
│ │ │ └── format_number.py
│ │ └── references/
│ │ └── chart-selection.md
│ ├── train/ # Model training + experiment tracking
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ └── analyze_results.py
│ │ └── references/
│ │ └── model-selection.md
│ ├── evaluate/ # Multi-dimensional model evaluation
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── metrics.md
│ ├── notebook/ # Jupyter notebook management
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ └── assess.py # Notebook quality assessment
│ │ └── references/
│ │ └── best-practices.md
│ ├── media/ # YouTube extraction + NotebookLM content generation
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── extract.py # YouTube metadata, transcript, comments, download
│ │ │ ├── auth.py # NotebookLM authentication
│ │ │ ├── generate.py # Generate podcast, video, quiz, etc.
│ │ │ └── manage.py # List/manage notebooks & artifacts
│ │ └── references/
│ │ └── formats.md # Generation types + extraction modes
│ ├── serve/ # Model serving & deployment
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── deployment-patterns.md
│ ├── context-engineering/ # LLM context window management
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── patterns.md
│ ├── mcp-builder/ # MCP server development
│ │ ├── SKILL.md
│ │ ├── LICENSE.txt
│ │ ├── scripts/
│ │ │ ├── evaluation.py
│ │ │ ├── connections.py
│ │ │ ├── example_evaluation.xml
│ │ │ └── requirements.txt
│ │ └── references/
│ │ ├── mcp_best_practices.md
│ │ ├── python_mcp_server.md
│ │ ├── node_mcp_server.md
│ │ └── evaluation.md
│ └── learn/ # Interactive ML education
│ ├── SKILL.md
│ ├── courses/
│ │ ├── cs229/ # Stanford CS229 (17 chapters, 5 parts)
│ │ ├── applied-ml/ # UMich Applied ML (4 modules, slides, notebooks)
│ │ └── ml-engineering/ # ML Engineering (36 lessons, 9 modules)
│ └── references/ # Decision frameworks, learning path, papers
├── agents/
│ ├── ml-researcher.md # Research, media & prototyping agent
│ ├── data-analyst.md # Business analysis & visualization agent
│ ├── data-scientist.md # Full-pipeline data science agent
│ ├── ml-engineer.md # Model optimization agent
│ ├── ai-engineer.md # AI application builder agent
│ ├── ml-ops.md # Deployment & operations agent
│ └── ml-tutor.md # Interactive ML education agent
├── hooks/
│ ├── hooks.json # ML-aware pre/post tool hooks
│ └── scripts/ # Hook shell scripts
│ ├── session-context.sh
│ ├── compact-reinject.sh
│ ├── validate-ml-code.sh
│ ├── watch-training.sh
│ ├── save-experiment-state.sh
│ └── ml-error-advisor.sh
├── LICENSE # MIT License
└── .gitignore
MLX includes ML-aware hooks that run automatically:
- SessionStart: Scans project for ML state (models, datasets, results.tsv) and restores experiment context on compaction
- PreToolUse (Write/Edit): Validates training scripts for data leakage, random seed usage, and hardcoded paths
- PostToolUse (Bash): Captures training metrics from command output
- PostToolUseFailure (Bash): Suggests fixes for common ML errors (missing packages, CUDA issues)
- PreCompact: Saves experiment state before context compaction
- Zero cost: Every API and data source is free with no keys required
- Stdlib first: Core scripts use the Python stdlib (urllib, xml, json) — no pip dependencies for basic functionality
- Progressive complexity: Start with a slash command, scale to autonomous agent workflows
- Experiment discipline: One variable per experiment, validation-only decisions, mandatory results tracking
- No data leakage: Hooks enforce train/eval separation and random seed hygiene
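The stdlib-first principle in practice: an arXiv search needs nothing beyond urllib and xml. This is a minimal sketch of the pattern (function names are illustrative; the plugin's actual search script covers 7 sources):

```python
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def parse_titles(atom_xml):
    """Pull entry titles out of an arXiv Atom feed."""
    feed = ET.fromstring(atom_xml)
    return [e.findtext(f"{ATOM}title", "").strip()
            for e in feed.iter(f"{ATOM}entry")]

def search_arxiv(query, max_results=5):
    """Query the arXiv Atom API using only the Python stdlib."""
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
        {"search_query": f"all:{query}", "max_results": max_results}
    )
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_titles(resp.read())
```

Because the parsing is separated from the HTTP call, it can be tested offline against a canned feed — the same decomposition the core scripts can use without any pip installs.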
| Framework | Used in |
|---|---|
| scikit-learn | train, data-prep, analyze |
| XGBoost | train |
| LightGBM | train |
| PyTorch | train |
| pandas | data-prep, analyze |
| scipy | analyze (hypothesis testing) |
| matplotlib | visualize (static charts) |
| seaborn | visualize (statistical plots) |
| plotly | visualize (interactive dashboards) |
| polars | data-prep (alternative) |
| PySpark | data-prep (distributed) |
MLX uses a lightweight TSV-based experiment tracker — no MLflow server, no database, just a file.
id metric val_score test_score memory_mb status description
exp000 accuracy 0.8523 0.8401 4096 KEEP baseline
exp001 accuracy 0.8612 0.8498 4096 KEEP lr=0.001
exp002 accuracy 0.8590 - 4096 DISCARD lr=0.003 (overfit)
exp003 accuracy 0.8634 0.8521 4352 KEEP dropout=0.1
Status: KEEP (improved) | DISCARD (same or worse) | CRASH (error/OOM/NaN)
The ml-engineer agent runs autonomous experiment loops — 8-10 experiments/hour with automatic keep/discard decisions.
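The tracker above is just a tab-separated file, so reading and appending need only the csv module. A sketch of the pattern, using the column names from the sample (helper names are illustrative, not the plugin's API):

```python
import csv
from pathlib import Path

COLUMNS = ["id", "metric", "val_score", "test_score",
           "memory_mb", "status", "description"]

def log_experiment(path, row):
    """Append one experiment to the TSV tracker, writing a header on first use."""
    path = Path(path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow(row)

def kept_experiments(path):
    """Return only the rows the experiment loop decided to KEEP."""
    with open(path, newline="") as f:
        return [r for r in csv.DictReader(f, delimiter="\t")
                if r["status"] == "KEEP"]
```

No server, no database: `grep`, `cut`, or a two-line reader like this is the whole query layer.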
All rate limits are enforced automatically in the scripts.
| Source | Delay | Notes |
|---|---|---|
| arXiv | 3s | Max 200 results per query |
| Semantic Scholar | 4s | ~100 req/5min |
| Papers with Code | 3s | Max 50 results per page |
| JMLR | 3s per volume | Scrapes volume index pages |
| HuggingFace Datasets | none | Be reasonable |
| OpenML | 2s | Returns 412 on no results |
| UCI | 2s | 600+ datasets |
| Kaggle | 2s | Falls back to scraping if API requires auth |
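The per-source delays above can be enforced with a small limiter like this — a sketch of the general pattern, not the scripts' actual implementation:

```python
import time

class RateLimiter:
    """Enforce a per-source minimum delay between consecutive requests."""

    def __init__(self, delays):
        self.delays = delays        # source name -> seconds between calls
        self.last_call = {}         # source name -> monotonic timestamp

    def wait(self, source):
        """Block until it is polite to hit `source` again, then record the call."""
        delay = self.delays.get(source, 0)
        elapsed = time.monotonic() - self.last_call.get(source, float("-inf"))
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self.last_call[source] = time.monotonic()

# Delays taken from the table above
limiter = RateLimiter({"arxiv": 3, "semantic_scholar": 4, "openml": 2})
```

The first call per source goes through immediately; subsequent calls sleep only for whatever remains of the delay window.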
To submit MLX to the official Anthropic plugin marketplace:
- Claude.ai: claude.ai/settings/plugins/submit
- Console: platform.claude.com/plugins/submit
To contribute a new skill:

- Fork the repository
- Add your skill to skills/your-skill/SKILL.md
- If your skill needs scripts, add them to skills/your-skill/scripts/
- Add quick-reference docs to skills/your-skill/references/
- Update plugin.json if adding new keywords
- Submit a pull request
See the Claude Code plugin docs for the expected directory layout and plugins reference for the full manifest schema.
MIT License. See LICENSE for details.
Built for Claude Code by Damion Rashford