MLX

A full-lifecycle ML workbench for Claude Code — from paper to production in one plugin.

Quick Start · Skills · Agents · Datasets · Architecture · Contributing


MLX is a Claude Code plugin that gives your agent a complete machine learning toolkit: research papers across 7 academic sources, discover and download datasets from 5 free repositories, explore and clean data, engineer features, train models, run experiments, build AI applications with LLMs and RAG, deploy models to production, generate podcasts and content from papers, manage notebooks, extract YouTube video content, and learn ML interactively with 3 university-grade courses — all via 7 specialized agents and 13 skills.

Quick Start

# Add the marketplace, then install the plugin
/plugin marketplace add damionrashford/mlx
/plugin install mlx@damionrashford-mlx

Or install directly:

git clone https://github.com/damionrashford/mlx.git
claude --plugin-dir ./mlx

Prerequisites

Requirement Install
Python 3.10+ brew install python or apt install python3
pdftotext (optional, for PDF extraction) brew install poppler or apt install poppler-utils
notebooklm (optional, for podcast generation) pip install notebooklm
yt-dlp (optional, for YouTube extraction) pip install yt-dlp
youtube-transcript-api (optional, for transcripts) pip install youtube-transcript-api

Most features require no API keys or accounts. The media skill's content generation requires a Google account with NotebookLM access.
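To see which optional tools are already on your PATH, a minimal stdlib sketch (the tool names below are the optional prerequisites listed above; this is an illustrative check, not part of the plugin):

```python
import shutil

# Optional external tools used by the research and media skills
OPTIONAL_TOOLS = ["pdftotext", "yt-dlp"]

def check_tools(tools):
    """Return a dict mapping each tool name to whether it is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_tools(OPTIONAL_TOOLS).items():
        print(f"{tool}: {'found' if found else 'missing (optional)'}")
```

Missing tools only disable the features that depend on them; everything else works out of the box.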

Recommended Permissions

Plugin settings cannot auto-configure permissions. For the smoothest experience, add these to your user or project settings:

{
  "permissions": {
    "allow": [
      "Bash(python3 *)",
      "Bash(pip install *)",
      "Bash(which *)",
      "Read(*)",
      "Glob(*)"
    ]
  }
}

Skills

MLX ships 13 skills that cover the full ML and data lifecycle. Each is invocable as a slash command or triggered automatically by natural language.

Skill Command What it does
research /research transformer attention Search papers from 7 sources, find/download datasets from 5 sources, structured paper review
prototype /prototype ./paper.pdf Convert a research paper into a working code project (Python, TS, Rust, Go)
data-prep /data-prep data/train.csv EDA + cleaning + feature engineering: profiling, distributions, missing values, transforms, encodings
analyze /analyze data/sales.csv Statistical tests, A/B testing, cohort analysis, segmentation, KPIs, pre-delivery QA/validation
visualize /visualize data/metrics.csv Charts, dashboards, and reports with matplotlib, seaborn, or plotly
train /train data/features.csv Train, evaluate, and iterate on models with experiment tracking
evaluate /evaluate results.tsv Multi-dimensional model evaluation, LLM-as-judge, bias detection
notebook /notebook analysis.ipynb Clean, organize, document, and convert Jupyter notebooks
serve /serve model.joblib Deploy models: inference API, Docker, CI/CD, monitoring, model cards
context-engineering natural language Context window management, memory systems, multi-agent patterns for LLM apps
media /media paper.pdf YouTube extraction + NotebookLM content generation (podcasts, videos, quizzes, reports, slides)
mcp-builder natural language Build MCP servers to connect LLMs with external services
learn /learn transformers Interactive ML education with 3 courses (CS229, Applied ML, ML Engineering), 53+ lessons, quizzes, and interview prep

Lifecycle Flow

research → prototype → data-prep → train → evaluate → serve → notebook
   │          │            │          │                    │
   │  find    │  media     │  explore │  build & iterate   │  document
   │  papers  │  & content │  & prep  │  on models         │  results
   └──────────┴────────────┴──────────┴────────────────────┘
   media ──── extract YouTube content + generate podcasts/videos
   learn ──── study ML concepts interactively

Agent coverage:
  ml-researcher ── find papers, datasets, review, media, prototype
  data-analyst ─── data-prep, analyze, visualize, report
  data-scientist ─ full pipeline: data → trained model
  ml-engineer ──── optimize: features, tuning, ablations
  ai-engineer ──── LLM apps: RAG, prompts, agents, MCP servers
  ml-ops ────────── deploy: serialize, serve, Docker, monitor
  ml-tutor ──────── learn ML: courses, quizzes, interview prep

Paper Research

Search across 7 free academic sources — no API keys, no rate-limit hassle.

Source Search Fetch Download Best for
arXiv yes yes yes ML/AI preprints
Semantic Scholar yes yes Citations, open-access PDFs
Papers with Code yes yes Papers linked to GitHub repos
Hugging Face yes via arXiv Trending daily papers
JMLR yes yes yes Peer-reviewed ML journal
ACL Anthology by ID yes NLP conference papers
OpenScholar Q&A synthesis over 45M papers

# Search arXiv
/research transformer attention mechanisms

# Multi-source concurrent search
python3 scripts/scientific_search.py "BERT NLP" --max 10

# Download a paper
python3 scripts/download.py 2401.12345 --output ./papers

# Extract text from PDF
python3 scripts/extract.py ./papers/2401.12345.pdf --max-pages 20

Dataset Discovery

Search, inspect, and download ML datasets from 5 free sources — all without API keys.

Source Search Info Download Format Best for
HuggingFace yes yes yes Parquet NLP, vision, audio (100K+ datasets)
OpenML yes yes yes ARFF/CSV Tabular benchmarks (5K+ datasets)
UCI yes yes yes CSV/ZIP Classic ML datasets (600+)
Papers with Code yes yes links Datasets linked to papers
Kaggle yes CLI Competition & community (200K+)

# Search for datasets
/research search sentiment analysis datasets

# Or use the datasets script directly
python3 scripts/datasets.py search "image classification" --source huggingface --limit 5

# Inspect a dataset (columns, splits, size)
python3 scripts/datasets.py info imdb --source huggingface

# Download dataset files
python3 scripts/datasets.py download imdb --source huggingface --output ./datasets --split train

# Download from OpenML (auto-converts ARFF to CSV)
python3 scripts/datasets.py download 61 --source openml --output ./datasets
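The dataset scripts wrap free HTTP APIs. As an illustration of the approach (a sketch, not the plugin's actual datasets.py), a stdlib-only query against the Hugging Face Hub's public dataset search endpoint might look like:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://huggingface.co/api/datasets"

def build_search_url(query, limit=5):
    """Build a Hub search URL for datasets matching `query`."""
    params = urllib.parse.urlencode({"search": query, "limit": limit})
    return f"{API_BASE}?{params}"

def search_hf_datasets(query, limit=5):
    """Return the ids of up to `limit` matching datasets (requires network)."""
    with urllib.request.urlopen(build_search_url(query, limit), timeout=30) as resp:
        return [d["id"] for d in json.load(resp)]
```

No API key is needed for this endpoint, which is what makes keyless dataset discovery possible.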

Agents

MLX includes 7 specialized agents that orchestrate skills for complex workflows.

Agent Skills Used When to Use
ml-researcher research, prototype, media Find papers, discover datasets, review methodology, generate podcasts, extract YouTube content, prototype algorithms
data-analyst data-prep, analyze, visualize, evaluate, notebook Answer business questions: statistics, A/B tests, dashboards, KPIs, reports, QA validation
data-scientist research, data-prep, train, evaluate, notebook Full ML pipeline: find data, explore, clean, engineer features, model, evaluate
ml-engineer data-prep, train, evaluate, notebook Focused iteration: feature engineering, hyperparameter sweeps, ablations
ai-engineer research, prototype, evaluate, context-engineering, mcp-builder, notebook Build AI apps: LLM integration, RAG pipelines, prompt engineering, agent architectures
ml-ops train, serve, notebook Deploy models: serialization, serving code, Docker, CI/CD, monitoring, model cards
ml-tutor learn, research, evaluate, notebook Interactive ML education: study concepts, quiz prep, mock interviews, system design practice

Agent Routing

"Find papers about attention mechanisms"      → ml-researcher
"Review this paper's methodology"             → ml-researcher
"Turn this paper into a podcast"               → ml-researcher
"What drove revenue growth last quarter?"      → data-analyst
"Create a dashboard of our KPIs"              → data-analyst
"Run an A/B test analysis on this experiment"  → data-analyst
"I have a CSV, build me a model"              → data-scientist
"Tune the hyperparameters on this model"       → ml-engineer
"Build a RAG chatbot over my docs"             → ai-engineer
"Deploy this model with Docker"                → ml-ops
"Teach me about transformers"                  → ml-tutor
"Quiz me on backpropagation"                   → ml-tutor
"Extract the transcript from this lecture"     → ml-researcher (media skill)

Each agent follows a strict protocol:

  • ml-researcher: Scope → Search → Filter → Deep analysis → Review → Dataset discovery → Media → Synthesis → Prototype
  • data-analyst: Question → Explore → Clean → Analyze → Visualize → Validate → Report
  • data-scientist: Find data → Understand → Explore → Clean → Engineer → Train → Iterate → Report
  • ml-engineer: Baseline → Features → Model selection → Tuning → Ablation → Final eval → Document
  • ai-engineer: Requirements → Model selection → Prompt engineering → RAG/embeddings → Eval → Integration → Document
  • ml-ops: Model audit → Serialization → Inference API → Containerize → CI/CD → Monitoring → Model card → Reproducibility package
  • ml-tutor: Assess level → Navigate courses → Teach interactively → Check understanding → Challenge with tradeoffs → Track progress

Architecture

mlx/
├── .claude-plugin/
│   └── plugin.json              # Plugin manifest
├── skills/
│   ├── research/                # Paper search + dataset discovery + paper review
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── search.py        # 7-source paper search
│   │   │   ├── fetch.py         # Paper metadata by ID
│   │   │   ├── download.py      # PDF download
│   │   │   ├── extract.py       # PDF text extraction
│   │   │   ├── datasets.py      # 5-source dataset search & download
│   │   │   ├── scientific_search.py  # Concurrent multi-source search
│   │   │   └── analyze_document.py   # Document analysis (PDF, DOCX, TXT)
│   │   └── references/
│   │       ├── sources.md       # API endpoints & rate limits
│   │       └── api-reference.md # Full API documentation
│   ├── prototype/               # Paper → code conversion
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── main.py          # Extraction + generation pipeline
│   │   │   ├── analyzers/       # Paper analysis modules
│   │   │   ├── extractors/      # Content extraction modules
│   │   │   └── generators/      # Code generation modules
│   │   ├── references/
│   │   │   ├── analysis-methodology.md
│   │   │   ├── extraction-patterns.md
│   │   │   └── generation-rules.md
│   │   └── assets/examples/     # Example files
│   ├── data-prep/               # EDA + cleaning + feature engineering
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── eda.py           # Full EDA pipeline
│   │   │   ├── clean.py         # Automated data cleaning
│   │   │   └── engineer_features.py  # Auto feature transforms
│   │   └── references/
│   │       └── pipeline.md      # EDA → Clean → Engineer pipeline
│   ├── analyze/                 # Statistical & business analysis + QA validation
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── descriptive_stats.py
│   │   │   ├── hypothesis_test.py
│   │   │   ├── ab_test.py
│   │   │   ├── cohort_analysis.py
│   │   │   ├── rfm_segmentation.py
│   │   │   ├── trend_analysis.py
│   │   │   └── validate.py      # Pre-delivery QA checks
│   │   └── references/
│   │       └── analysis-methods.md
│   ├── visualize/               # Charts, dashboards, data reports
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── chart_templates.py
│   │   │   └── format_number.py
│   │   └── references/
│   │       └── chart-selection.md
│   ├── train/                   # Model training + experiment tracking
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   └── analyze_results.py
│   │   └── references/
│   │       └── model-selection.md
│   ├── evaluate/                # Multi-dimensional model evaluation
│   │   ├── SKILL.md
│   │   └── references/
│   │       └── metrics.md
│   ├── notebook/                # Jupyter notebook management
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   └── assess.py        # Notebook quality assessment
│   │   └── references/
│   │       └── best-practices.md
│   ├── media/                   # YouTube extraction + NotebookLM content generation
│   │   ├── SKILL.md
│   │   ├── scripts/
│   │   │   ├── extract.py       # YouTube metadata, transcript, comments, download
│   │   │   ├── auth.py          # NotebookLM authentication
│   │   │   ├── generate.py      # Generate podcast, video, quiz, etc.
│   │   │   └── manage.py        # List/manage notebooks & artifacts
│   │   └── references/
│   │       └── formats.md       # Generation types + extraction modes
│   ├── serve/                   # Model serving & deployment
│   │   ├── SKILL.md
│   │   └── references/
│   │       └── deployment-patterns.md
│   ├── context-engineering/     # LLM context window management
│   │   ├── SKILL.md
│   │   └── references/
│   │       └── patterns.md
│   ├── mcp-builder/             # MCP server development
│   │   ├── SKILL.md
│   │   ├── LICENSE.txt
│   │   ├── scripts/
│   │   │   ├── evaluation.py
│   │   │   ├── connections.py
│   │   │   ├── example_evaluation.xml
│   │   │   └── requirements.txt
│   │   └── references/
│   │       ├── mcp_best_practices.md
│   │       ├── python_mcp_server.md
│   │       ├── node_mcp_server.md
│   │       └── evaluation.md
│   └── learn/                   # Interactive ML education
│       ├── SKILL.md
│       ├── courses/
│       │   ├── cs229/           # Stanford CS229 (17 chapters, 5 parts)
│       │   ├── applied-ml/      # UMich Applied ML (4 modules, slides, notebooks)
│       │   └── ml-engineering/  # ML Engineering (36 lessons, 9 modules)
│       └── references/          # Decision frameworks, learning path, papers
├── agents/
│   ├── ml-researcher.md         # Research, media & prototyping agent
│   ├── data-analyst.md          # Business analysis & visualization agent
│   ├── data-scientist.md        # Full-pipeline data science agent
│   ├── ml-engineer.md           # Model optimization agent
│   ├── ai-engineer.md           # AI application builder agent
│   ├── ml-ops.md                # Deployment & operations agent
│   └── ml-tutor.md              # Interactive ML education agent
├── hooks/
│   ├── hooks.json               # ML-aware pre/post tool hooks
│   └── scripts/                 # Hook shell scripts
│       ├── session-context.sh
│       ├── compact-reinject.sh
│       ├── validate-ml-code.sh
│       ├── watch-training.sh
│       ├── save-experiment-state.sh
│       └── ml-error-advisor.sh
├── LICENSE                      # MIT License
└── .gitignore

Hooks

MLX includes ML-aware hooks that run automatically:

  • SessionStart: Scans project for ML state (models, datasets, results.tsv) and restores experiment context on compaction
  • PreToolUse (Write/Edit): Validates training scripts for data leakage, random seed usage, and hardcoded paths
  • PostToolUse (Bash): Captures training metrics from command output
  • PostToolUseFailure (Bash): Suggests fixes for common ML errors (missing packages, CUDA issues)
  • PreCompact: Saves experiment state before context compaction
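As an illustration of the kind of check the PreToolUse hook performs — a simplified Python sketch, not the actual validate-ml-code.sh logic, with patterns chosen for illustration:

```python
import re

# Patterns that suggest a random seed is being set (illustrative, not exhaustive)
SEED_PATTERNS = [
    r"random\.seed\(",
    r"np\.random\.seed\(",
    r"torch\.manual_seed\(",
    r"random_state\s*=",
]

def has_seed(source):
    """Return True if the training script appears to set a random seed."""
    return any(re.search(p, source) for p in SEED_PATTERNS)

def check_script(source):
    """Return a list of warnings for common reproducibility issues."""
    warnings = []
    if not has_seed(source):
        warnings.append("no random seed detected -- results may not be reproducible")
    if re.search(r"[\"'](/home/|/Users/|C:\\)", source):
        warnings.append("hardcoded absolute path detected")
    return warnings
```

A non-empty warning list can then be surfaced to the agent before the write goes through.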

Design Principles

  • Zero cost: Every API and data source is free with no keys required
  • Stdlib first: Core scripts use Python stdlib (urllib, xml, json) — no pip dependencies for basic functionality
  • Progressive complexity: Start with a slash command, scale to autonomous agent workflows
  • Experiment discipline: One variable per experiment, validation-only decisions, mandatory results tracking
  • No data leakage: Hooks enforce train/eval separation and random seed hygiene
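To illustrate the stdlib-first principle, a minimal arXiv search needs nothing beyond urllib and xml — a sketch of the approach, not the plugin's actual search.py:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def parse_titles(atom_xml):
    """Extract entry titles from an arXiv Atom API response."""
    root = ET.fromstring(atom_xml)
    return [e.findtext(f"{ATOM}title", "").strip() for e in root.iter(f"{ATOM}entry")]

def arxiv_search(query, max_results=5):
    """Query the arXiv API and return matching paper titles (requires network)."""
    params = urllib.parse.urlencode(
        {"search_query": f"all:{query}", "max_results": max_results}
    )
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_titles(resp.read())
```

Because the response is plain Atom XML, the whole pipeline runs with zero pip dependencies.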

Supported Frameworks

Framework Used in
scikit-learn train, data-prep, analyze
XGBoost train
LightGBM train
PyTorch train
pandas data-prep, analyze
scipy analyze (hypothesis testing)
matplotlib visualize (static charts)
seaborn visualize (statistical plots)
plotly visualize (interactive dashboards)
polars data-prep (alternative)
PySpark data-prep (distributed)

Experiment Tracking

MLX uses a lightweight TSV-based experiment tracker — no MLflow server, no database, just a file.

id        metric    val_score  test_score  memory_mb  status   description
exp000    accuracy  0.8523     0.8401      4096       KEEP     baseline
exp001    accuracy  0.8612     0.8498      4096       KEEP     lr=0.001
exp002    accuracy  0.8590     -           4096       DISCARD  lr=0.003 (overfit)
exp003    accuracy  0.8634     0.8521      4352       KEEP     dropout=0.1

Status: KEEP (improved) | DISCARD (same or worse) | CRASH (error/OOM/NaN)

The ml-engineer agent runs autonomous experiment loops — 8-10 experiments/hour with automatic keep/discard decisions.
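A tracker in this format can be read and appended with the csv module alone — a minimal sketch assuming the column layout shown above, not the plugin's actual tracking code:

```python
import csv
import os

FIELDS = ["id", "metric", "val_score", "test_score", "memory_mb", "status", "description"]

def log_experiment(path, row):
    """Append one experiment row to the TSV tracker, writing a header if the file is new."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow(row)

def best_kept(path):
    """Return the KEEP row with the highest validation score, or None."""
    with open(path, newline="") as f:
        rows = [r for r in csv.DictReader(f, delimiter="\t") if r["status"] == "KEEP"]
    return max(rows, key=lambda r: float(r["val_score"]), default=None)
```

Because it is just a file, the tracker diffs cleanly in git and needs no server to query.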

Rate Limits

All rate limits are enforced automatically in the scripts.

Source Delay Notes
arXiv 3s Max 200 results per query
Semantic Scholar 4s ~100 req/5min
Papers with Code 3s Max 50 results per page
JMLR 3s per volume Scrapes volume index pages
HuggingFace Datasets none Be reasonable
OpenML 2s Returns 412 on no results
UCI 2s 600+ datasets
Kaggle 2s Falls back to scraping if API requires auth
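The per-source delays above can be enforced with a tiny throttle — a sketch of the idea, not the scripts' actual implementation:

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests to one source."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.last_request = 0.0

    def wait(self):
        """Sleep just long enough to honor the configured delay, then record the time."""
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self.last_request = time.monotonic()

# One throttle per source, using the delays from the table above
THROTTLES = {"arxiv": Throttle(3), "semantic_scholar": Throttle(4), "openml": Throttle(2)}
```

Calling `THROTTLES["arxiv"].wait()` before each request keeps the scripts inside each source's limits without any manual pacing.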

Submit to Official Marketplace

To submit MLX to the official Anthropic plugin marketplace, follow the submission process described in the Claude Code plugin docs.

Contributing

  1. Fork the repository
  2. Add your skill to skills/your-skill/SKILL.md
  3. If your skill needs scripts, add them to skills/your-skill/scripts/
  4. Add quick-reference docs to skills/your-skill/references/
  5. Update plugin.json if adding new keywords
  6. Submit a pull request

See the Claude Code plugin docs for the expected directory layout and plugins reference for the full manifest schema.

License

MIT License. See LICENSE for details.


Built for Claude Code by Damion Rashford
