MLX
A full-lifecycle ML workbench for Claude Code — from paper to production in one plugin.
Quick Start · Skills · Agents · Datasets · Architecture · Contributing
MLX is a Claude Code plugin that gives your agent a complete machine learning toolkit: search papers across 7 academic sources, discover and download datasets from 5 free repositories, explore and clean data, engineer features, train models, run experiments, build AI applications with LLMs and RAG, deploy models to production, generate podcasts and content from papers, manage notebooks, extract YouTube video content, and learn ML interactively with 3 university-grade courses. 7 specialized agents, 13 skills.
# Add the marketplace, then install the plugin
/plugin marketplace add damionrashford/mlx
/plugin install mlx@damionrashford-mlx

Or install directly:
git clone https://github.com/damionrashford/mlx.git
claude --plugin-dir ./mlx

| Requirement | Install |
|---|---|
| Python 3.10+ | brew install python or apt install python3 |
| pdftotext (optional, for PDF extraction) | brew install poppler or apt install poppler-utils |
| notebooklm (optional, for podcast generation) | pip install notebooklm |
| yt-dlp (optional, for YouTube extraction) | pip install yt-dlp |
| youtube-transcript-api (optional, for transcripts) | pip install youtube-transcript-api |
Most features require no API keys or accounts. The media skill's content generation requires a Google account with NotebookLM access.
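A quick way to see which optional command-line tools from the table above are installed — a sketch only, checking the two CLI tools (`pdftotext`, `yt-dlp`); the function name is illustrative, not part of the plugin:

```python
import shutil

def check_optional_tools(tools=("pdftotext", "yt-dlp")):
    """Report which optional MLX CLI dependencies are on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, present in check_optional_tools().items():
        print(f"{tool}: {'ok' if present else 'missing (optional)'}")
```

Missing tools only disable the corresponding features (PDF extraction, YouTube download); nothing else breaks.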
Plugin settings cannot auto-configure permissions. For the smoothest experience, add these to your user or project settings:
{
"permissions": {
"allow": [
"Bash(python3 *)",
"Bash(pip install *)",
"Bash(which *)",
"Read(*)",
"Glob(*)"
]
}
}

MLX ships 13 skills that cover the full ML and data lifecycle. Each is invocable as a slash command or triggered automatically by natural language.
| Skill | Command | What it does |
|---|---|---|
| research | /research transformer attention | Search papers from 7 sources, find/download datasets from 5 sources, structured paper review |
| prototype | /prototype ./paper.pdf | Convert a research paper into a working code project (Python, TS, Rust, Go) |
| data-prep | /data-prep data/train.csv | EDA + cleaning + feature engineering: profiling, distributions, missing values, transforms, encodings |
| analyze | /analyze data/sales.csv | Statistical tests, A/B testing, cohort analysis, segmentation, KPIs, pre-delivery QA/validation |
| visualize | /visualize data/metrics.csv | Charts, dashboards, and reports with matplotlib, seaborn, or plotly |
| train | /train data/features.csv | Train, evaluate, and iterate on models with experiment tracking |
| evaluate | /evaluate results.tsv | Multi-dimensional model evaluation, LLM-as-judge, bias detection |
| notebook | /notebook analysis.ipynb | Clean, organize, document, and convert Jupyter notebooks |
| serve | /serve model.joblib | Deploy models: inference API, Docker, CI/CD, monitoring, model cards |
| context-engineering | natural language | Context window management, memory systems, multi-agent patterns for LLM apps |
| media | /media paper.pdf | YouTube extraction + NotebookLM content generation (podcasts, videos, quizzes, reports, slides) |
| mcp-builder | natural language | Build MCP servers to connect LLMs with external services |
| learn | /learn transformers | Interactive ML education with 3 courses (CS229, Applied ML, ML Engineering), 53+ lessons, quizzes, and interview prep |
research → prototype → data-prep → train → evaluate → serve → notebook
│ │ │ │ │
│ find │ media │ explore │ build & iterate │ document
│ papers │ & content │ & prep │ on models │ results
└──────────┴────────────┴──────────┴────────────────────┘
media ──── extract YouTube content + generate podcasts/videos
learn ──── study ML concepts interactively
Agent coverage:
ml-researcher ── find papers, datasets, review, media, prototype
data-analyst ─── data-prep, analyze, visualize, report
data-scientist ─ full pipeline: data → trained model
ml-engineer ──── optimize: features, tuning, ablations
ai-engineer ──── LLM apps: RAG, prompts, agents, MCP servers
ml-ops ────────── deploy: serialize, serve, Docker, monitor
ml-tutor ──────── learn ML: courses, quizzes, interview prep
Search across 7 free academic sources — no API keys, no rate-limit hassle.
| Source | Search | Fetch | Download | Best for |
|---|---|---|---|---|
| arXiv | yes | yes | yes | ML/AI preprints |
| Semantic Scholar | yes | yes | — | Citations, open-access PDFs |
| Papers with Code | yes | yes | — | Papers linked to GitHub repos |
| Hugging Face | yes | via arXiv | — | Trending daily papers |
| JMLR | yes | yes | yes | Peer-reviewed ML journal |
| ACL Anthology | — | by ID | yes | NLP conference papers |
| OpenScholar | — | — | — | Q&A synthesis over 45M papers |
# Search arXiv
/research transformer attention mechanisms
# Multi-source concurrent search
python3 scripts/scientific_search.py "BERT NLP" --max 10
# Download a paper
python3 scripts/download.py 2401.12345 --output ./papers
# Extract text from PDF
python3 scripts/extract.py ./papers/2401.12345.pdf --max-pages 20

Search, inspect, and download ML datasets from 5 free sources — all without API keys.
| Source | Search | Info | Download | Format | Best for |
|---|---|---|---|---|---|
| HuggingFace | yes | yes | yes | Parquet | NLP, vision, audio (100K+ datasets) |
| OpenML | yes | yes | yes | ARFF/CSV | Tabular benchmarks (5K+ datasets) |
| UCI | yes | yes | yes | CSV/ZIP | Classic ML datasets (600+) |
| Papers with Code | yes | yes | links | — | Datasets linked to papers |
| Kaggle | yes | — | CLI | — | Competition & community (200K+) |
# Search for datasets
/research search sentiment analysis datasets
# Or use the datasets script directly
python3 scripts/datasets.py search "image classification" --source huggingface --limit 5
# Inspect a dataset (columns, splits, size)
python3 scripts/datasets.py info imdb --source huggingface
# Download dataset files
python3 scripts/datasets.py download imdb --source huggingface --output ./datasets --split train
# Download from OpenML (auto-converts ARFF to CSV)
python3 scripts/datasets.py download 61 --source openml --output ./datasets

MLX includes 7 specialized agents that orchestrate skills for complex workflows.
| Agent | Skills Used | When to Use |
|---|---|---|
| ml-researcher | research, prototype, media | Find papers, discover datasets, review methodology, generate podcasts, extract YouTube content, prototype algorithms |
| data-analyst | data-prep, analyze, visualize, evaluate, notebook | Answer business questions: statistics, A/B tests, dashboards, KPIs, reports, QA validation |
| data-scientist | research, data-prep, train, evaluate, notebook | Full ML pipeline: find data, explore, clean, engineer features, model, evaluate |
| ml-engineer | data-prep, train, evaluate, notebook | Focused iteration: feature engineering, hyperparameter sweeps, ablations |
| ai-engineer | research, prototype, evaluate, context-engineering, mcp-builder, notebook | Build AI apps: LLM integration, RAG pipelines, prompt engineering, agent architectures |
| ml-ops | train, serve, notebook | Deploy models: serialization, serving code, Docker, CI/CD, monitoring, model cards |
| ml-tutor | learn, research, evaluate, notebook | Interactive ML education: study concepts, quiz prep, mock interviews, system design practice |
"Find papers about attention mechanisms" → ml-researcher
"Review this paper's methodology" → ml-researcher
"Turn this paper into a podcast" → ml-researcher
"What drove revenue growth last quarter?" → data-analyst
"Create a dashboard of our KPIs" → data-analyst
"Run an A/B test analysis on this experiment" → data-analyst
"I have a CSV, build me a model" → data-scientist
"Tune the hyperparameters on this model" → ml-engineer
"Build a RAG chatbot over my docs" → ai-engineer
"Deploy this model with Docker" → ml-ops
"Teach me about transformers" → ml-tutor
"Quiz me on backpropagation" → ml-tutor
"Extract the transcript from this lecture" → ml-researcher (media skill)
Each agent follows a strict protocol:
- ml-researcher: Scope → Search → Filter → Deep analysis → Review → Dataset discovery → Media → Synthesis → Prototype
- data-analyst: Question → Explore → Clean → Analyze → Visualize → Validate → Report
- data-scientist: Find data → Understand → Explore → Clean → Engineer → Train → Iterate → Report
- ml-engineer: Baseline → Features → Model selection → Tuning → Ablation → Final eval → Document
- ai-engineer: Requirements → Model selection → Prompt engineering → RAG/embeddings → Eval → Integration → Document
- ml-ops: Model audit → Serialization → Inference API → Containerize → CI/CD → Monitoring → Model card → Reproducibility package
- ml-tutor: Assess level → Navigate courses → Teach interactively → Check understanding → Challenge with tradeoffs → Track progress
mlx/
├── .claude-plugin/
│ └── plugin.json # Plugin manifest
├── skills/
│ ├── research/ # Paper search + dataset discovery + paper review
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── search.py # 7-source paper search
│ │ │ ├── fetch.py # Paper metadata by ID
│ │ │ ├── download.py # PDF download
│ │ │ ├── extract.py # PDF text extraction
│ │ │ ├── datasets.py # 5-source dataset search & download
│ │ │ ├── scientific_search.py # Concurrent multi-source search
│ │ │ └── analyze_document.py # Document analysis (PDF, DOCX, TXT)
│ │ └── references/
│ │ ├── sources.md # API endpoints & rate limits
│ │ └── api-reference.md # Full API documentation
│ ├── prototype/ # Paper → code conversion
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── main.py # Extraction + generation pipeline
│ │ │ ├── analyzers/ # Paper analysis modules
│ │ │ ├── extractors/ # Content extraction modules
│ │ │ └── generators/ # Code generation modules
│ │ ├── references/
│ │ │ ├── analysis-methodology.md
│ │ │ ├── extraction-patterns.md
│ │ │ └── generation-rules.md
│ │ └── assets/examples/ # Example files
│ ├── data-prep/ # EDA + cleaning + feature engineering
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── eda.py # Full EDA pipeline
│ │ │ ├── clean.py # Automated data cleaning
│ │ │ └── engineer_features.py # Auto feature transforms
│ │ └── references/
│ │ └── pipeline.md # EDA → Clean → Engineer pipeline
│ ├── analyze/ # Statistical & business analysis + QA validation
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── descriptive_stats.py
│ │ │ ├── hypothesis_test.py
│ │ │ ├── ab_test.py
│ │ │ ├── cohort_analysis.py
│ │ │ ├── rfm_segmentation.py
│ │ │ ├── trend_analysis.py
│ │ │ └── validate.py # Pre-delivery QA checks
│ │ └── references/
│ │ └── analysis-methods.md
│ ├── visualize/ # Charts, dashboards, data reports
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── chart_templates.py
│ │ │ └── format_number.py
│ │ └── references/
│ │ └── chart-selection.md
│ ├── train/ # Model training + experiment tracking
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ └── analyze_results.py
│ │ └── references/
│ │ └── model-selection.md
│ ├── evaluate/ # Multi-dimensional model evaluation
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── metrics.md
│ ├── notebook/ # Jupyter notebook management
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ └── assess.py # Notebook quality assessment
│ │ └── references/
│ │ └── best-practices.md
│ ├── media/ # YouTube extraction + NotebookLM content generation
│ │ ├── SKILL.md
│ │ ├── scripts/
│ │ │ ├── extract.py # YouTube metadata, transcript, comments, download
│ │ │ ├── auth.py # NotebookLM authentication
│ │ │ ├── generate.py # Generate podcast, video, quiz, etc.
│ │ │ └── manage.py # List/manage notebooks & artifacts
│ │ └── references/
│ │ └── formats.md # Generation types + extraction modes
│ ├── serve/ # Model serving & deployment
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── deployment-patterns.md
│ ├── context-engineering/ # LLM context window management
│ │ ├── SKILL.md
│ │ └── references/
│ │ └── patterns.md
│ ├── mcp-builder/ # MCP server development
│ │ ├── SKILL.md
│ │ ├── LICENSE.txt
│ │ ├── scripts/
│ │ │ ├── evaluation.py
│ │ │ ├── connections.py
│ │ │ ├── example_evaluation.xml
│ │ │ └── requirements.txt
│ │ └── references/
│ │ ├── mcp_best_practices.md
│ │ ├── python_mcp_server.md
│ │ ├── node_mcp_server.md
│ │ └── evaluation.md
│ └── learn/ # Interactive ML education
│ ├── SKILL.md
│ ├── courses/
│ │ ├── cs229/ # Stanford CS229 (17 chapters, 5 parts)
│ │ ├── applied-ml/ # UMich Applied ML (4 modules, slides, notebooks)
│ │ └── ml-engineering/ # ML Engineering (36 lessons, 9 modules)
│ └── references/ # Decision frameworks, learning path, papers
├── agents/
│ ├── ml-researcher.md # Research, media & prototyping agent
│ ├── data-analyst.md # Business analysis & visualization agent
│ ├── data-scientist.md # Full-pipeline data science agent
│ ├── ml-engineer.md # Model optimization agent
│ ├── ai-engineer.md # AI application builder agent
│ ├── ml-ops.md # Deployment & operations agent
│ └── ml-tutor.md # Interactive ML education agent
├── hooks/
│ ├── hooks.json # ML-aware pre/post tool hooks
│ └── scripts/ # Hook shell scripts
│ ├── session-context.sh
│ ├── compact-reinject.sh
│ ├── validate-ml-code.sh
│ ├── watch-training.sh
│ ├── save-experiment-state.sh
│ └── ml-error-advisor.sh
├── LICENSE # MIT License
└── .gitignore
MLX includes ML-aware hooks that run automatically:
- SessionStart: Scans project for ML state (models, datasets, results.tsv) and restores experiment context on compaction
- PreToolUse (Write/Edit): Validates training scripts for data leakage, random seed usage, and hardcoded paths
- PostToolUse (Bash): Captures training metrics from command output
- PostToolUseFailure (Bash): Suggests fixes for common ML errors (missing packages, CUDA issues)
- PreCompact: Saves experiment state before context compaction
- Zero cost: Every API and data source is free with no keys required
- Stdlib first: Core scripts use the Python stdlib (urllib, xml, json) — no pip dependencies for basic functionality
- Progressive complexity: Start with a slash command, scale to autonomous agent workflows
- Experiment discipline: One variable per experiment, validation-only decisions, mandatory results tracking
- No data leakage: Hooks enforce train/eval separation and random seed hygiene
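The stdlib-first principle in practice: an arXiv search needs nothing beyond urllib and xml. This is a minimal sketch of the pattern (function names are illustrative; the plugin's actual search script covers 7 sources):

```python
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def parse_titles(atom_xml):
    """Pull entry titles out of an arXiv Atom feed."""
    feed = ET.fromstring(atom_xml)
    return [e.findtext(f"{ATOM}title", "").strip()
            for e in feed.iter(f"{ATOM}entry")]

def search_arxiv(query, max_results=5):
    """Query the arXiv Atom API using only the Python stdlib."""
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
        {"search_query": f"all:{query}", "max_results": max_results}
    )
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_titles(resp.read())
```

Because the parsing is separated from the HTTP call, it can be tested offline against a canned feed — the same decomposition the core scripts can use without any pip installs.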
| Framework | Used in |
|---|---|
| scikit-learn | train, data-prep, analyze |
| XGBoost | train |
| LightGBM | train |
| PyTorch | train |
| pandas | data-prep, analyze |
| scipy | analyze (hypothesis testing) |
| matplotlib | visualize (static charts) |
| seaborn | visualize (statistical plots) |
| plotly | visualize (interactive dashboards) |
| polars | data-prep (alternative) |
| PySpark | data-prep (distributed) |
MLX uses a lightweight TSV-based experiment tracker — no MLflow server, no database, just a file.
id metric val_score test_score memory_mb status description
exp000 accuracy 0.8523 0.8401 4096 KEEP baseline
exp001 accuracy 0.8612 0.8498 4096 KEEP lr=0.001
exp002 accuracy 0.8590 - 4096 DISCARD lr=0.003 (overfit)
exp003 accuracy 0.8634 0.8521 4352 KEEP dropout=0.1
Status: KEEP (improved) | DISCARD (same or worse) | CRASH (error/OOM/NaN)
The ml-engineer agent runs autonomous experiment loops — 8-10 experiments/hour with automatic keep/discard decisions.
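The tracker above is just a tab-separated file, so reading and appending need only the csv module. A sketch of the pattern, using the column names from the sample (helper names are illustrative, not the plugin's API):

```python
import csv
from pathlib import Path

COLUMNS = ["id", "metric", "val_score", "test_score",
           "memory_mb", "status", "description"]

def log_experiment(path, row):
    """Append one experiment to the TSV tracker, writing a header on first use."""
    path = Path(path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow(row)

def kept_experiments(path):
    """Return only the rows the experiment loop decided to KEEP."""
    with open(path, newline="") as f:
        return [r for r in csv.DictReader(f, delimiter="\t")
                if r["status"] == "KEEP"]
```

No server, no database: `grep`, `cut`, or a two-line reader like this is the whole query layer.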
All rate limits are enforced automatically in the scripts.
| Source | Delay | Notes |
|---|---|---|
| arXiv | 3s | Max 200 results per query |
| Semantic Scholar | 4s | ~100 req/5min |
| Papers with Code | 3s | Max 50 results per page |
| JMLR | 3s per volume | Scrapes volume index pages |
| HuggingFace Datasets | none | Be reasonable |
| OpenML | 2s | Returns 412 on no results |
| UCI | 2s | 600+ datasets |
| Kaggle | 2s | Falls back to scraping if API requires auth |
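The per-source delays above can be enforced with a small limiter like this — a sketch of the general pattern, not the scripts' actual implementation:

```python
import time

class RateLimiter:
    """Enforce a per-source minimum delay between consecutive requests."""

    def __init__(self, delays):
        self.delays = delays        # source name -> seconds between calls
        self.last_call = {}         # source name -> monotonic timestamp

    def wait(self, source):
        """Block until it is polite to hit `source` again, then record the call."""
        delay = self.delays.get(source, 0)
        elapsed = time.monotonic() - self.last_call.get(source, float("-inf"))
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self.last_call[source] = time.monotonic()

# Delays taken from the table above
limiter = RateLimiter({"arxiv": 3, "semantic_scholar": 4, "openml": 2})
```

The first call per source goes through immediately; subsequent calls sleep only for whatever remains of the delay window.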
To submit MLX to the official Anthropic plugin marketplace:
- Claude.ai: claude.ai/settings/plugins/submit
- Console: platform.claude.com/plugins/submit
To contribute a new skill:

- Fork the repository
- Add your skill to skills/your-skill/SKILL.md
- If your skill needs scripts, add them to skills/your-skill/scripts/
- Add quick-reference docs to skills/your-skill/references/
- Update plugin.json if adding new keywords
- Submit a pull request
See the Claude Code plugin docs for the expected directory layout and plugins reference for the full manifest schema.
MIT License. See LICENSE for details.
Built for Claude Code by Damion Rashford