Protein fitness modeling using ProtTrans language models via Docker
An MCP (Model Context Protocol) server for protein fitness prediction with 4 core tools:
- Extract ProtTrans embeddings from protein sequences
- Calculate ProtBERT log-likelihood scores for mutations
- Train regression fitness models with 5-fold cross-validation
- Predict fitness for new protein sequences using trained models
The fastest way to get started is the pre-built Docker image, which is automatically published to GitHub Container Registry on every release.
# Pull the latest image
docker pull ghcr.io/macromnex/prottrans_mcp:latest
# Register with Claude Code (runs as current user to avoid permission issues)
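claude mcp add prottrans -- docker run -i --rm --user `id -u`:`id -g` --gpus all --ipc=host -v `pwd`:`pwd` ghcr.io/macromnex/prottrans_mcp:latest
Note: Run this from your project directory. `pwd` expands to the current working directory when you run the command, and only that directory is mounted into the container, so your data and output paths should live under it.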
Requirements:
- Docker with GPU support (`nvidia-docker` or Docker with the NVIDIA runtime)
- Claude Code installed
That's it! The ProtTrans MCP server is now available in Claude Code.
Alternatively, build the image yourself and register it with Claude Code. This is useful for customization or offline environments.
# Clone the repository
git clone https://github.com/MacromNex/prottrans_mcp.git
cd prottrans_mcp
# Build the Docker image
docker build -t prottrans_mcp:latest .
# Register with Claude Code (runs as current user to avoid permission issues)
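claude mcp add prottrans -- docker run -i --rm --user `id -u`:`id -g` --gpus all --ipc=host -v `pwd`:`pwd` prottrans_mcp:latest
Note: Run this from your project directory. `pwd` expands to the current working directory when you run the command, and only that directory is mounted into the container, so your data and output paths should live under it.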
Requirements:
- Docker with GPU support
- Claude Code installed
- Git (to clone the repository)
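Before registering the server with either option, a small shell sketch like the one below can report which of the required tools are on your PATH. It is an illustrative helper, not part of the repository: the function names `check_cmd` and `preflight` are hypothetical, and finding `nvidia-smi` on the host only suggests a working NVIDIA driver, not working GPU passthrough into Docker.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: reports whether the tools named in the
# requirement lists are on PATH. Finding nvidia-smi on the host is only a
# rough proxy for a working NVIDIA driver; it does not prove Docker GPU passthrough.

check_cmd() {
  # Print found/missing for one command; return non-zero if it is missing.
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

preflight() {
  # Check every tool (git is only needed when building the image yourself).
  local status=0 tool
  for tool in docker claude git nvidia-smi; do
    check_cmd "$tool" || status=1
  done
  return "$status"
}

preflight && echo "All tools found." || echo "Some tools are missing; see the messages above."
```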
About the Docker Flags:
- `-i`: interactive mode; keeps stdin open so Claude Code can communicate with the server
- `--rm`: automatically removes the container after it exits
- ``--user `id -u`:`id -g` ``: runs the container as your current user, so output files are owned by you (not root)
- `--gpus all`: grants access to all available GPUs
- `--ipc=host`: uses the host IPC namespace so PyTorch has enough shared memory
- `-v`: mounts your project directory at the same path inside the container, so the server can access your data
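Because the shell expands the backtick expressions before `claude mcp add` runs, the registered command stores your numeric user and group IDs and the absolute path of the directory you were in. The sketch below (a hypothetical helper, not part of this repository) previews that expanded command using `$(...)` and quoted variables, which behave like the backtick forms but also survive project paths that contain spaces.

```shell
#!/usr/bin/env bash
# Preview the docker command that `claude mcp add` would store.
# $(id -u), $(id -g) and $PWD are expanded here, before claude ever runs,
# so the stored command holds fixed numeric IDs and one absolute project path.
uid_gid="$(id -u):$(id -g)"
project_dir="$PWD"
image="ghcr.io/macromnex/prottrans_mcp:latest"

# Keep the arguments in an array so a path with spaces stays a single argument.
docker_args=(run -i --rm --user "$uid_gid" --gpus all --ipc=host
             -v "$project_dir:$project_dir" "$image")

# printf %q prints each argument the way the shell would need it quoted.
printf 'docker'
printf ' %q' "${docker_args[@]}"
printf '\n'

# To register exactly these arguments, uncomment:
# claude mcp add prottrans -- docker "${docker_args[@]}"
```

Uncommenting the last line registers the same arguments that were previewed; if you later work in a different directory, register again from there.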
After adding the MCP server, you can verify it's working:
# List registered MCP servers
claude mcp list
# You should see 'prottrans' in the output
In Claude Code, you can now use all 4 ProtTrans tools:
- prottrans_extract_embeddings
- prottrans_calculate_llh
- prottrans_train_fitness_model
- prottrans_predict_fitness
Detailed documentation: see detail.md for comprehensive guides on:
- Available MCP tools and parameters
- Local Python environment setup (alternative to Docker)
- Example workflows and use cases
- Data format requirements
- Troubleshooting
Once registered, you can use the ProtTrans tools directly in Claude Code. Here are some common workflows:
Can you help train a ProtTrans model for data at /path/to/example/ and save it to /path/to/results/prot-t5_fitness using the prottrans MCP server with ProtT5-XL model. Please create the embeddings first if not ready.
Can you help calculate ProtBERT likelihood for data at /path/to/data.csv with wild-type sequence at /path/to/wt.fasta using the prottrans MCP server?
I have protein variant data at /path/to/variants.csv with log_fitness column. Please:
1. Extract ProtT5-XL embeddings using prottrans_extract_embeddings
2. Train an SVR fitness model using prottrans_train_fitness_model with 5-fold CV
3. Report the mean Spearman correlation performance
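Before running these workflows it can help to confirm that the inputs look right. The sketch below only checks what this README mentions: that the variants CSV header contains a `log_fitness` column and that the wild-type file starts with a FASTA `>` header line. The helper names `has_column` and `is_fasta` are hypothetical, the check assumes an unquoted, comma-separated header, and the full data format requirements are in detail.md.

```shell
#!/usr/bin/env bash
# Hypothetical input checks; not part of the ProtTrans MCP server.

has_column() {
  # True if the first (header) line of a comma-separated file contains
  # the exact column name. Strips Windows line endings first.
  local csv="$1" col="$2"
  head -n 1 "$csv" | tr -d '\r' | tr ',' '\n' | grep -qxF "$col"
}

is_fasta() {
  # True if the first line is a FASTA header (starts with '>').
  head -n 1 "$1" | grep -q '^>'
}

# Paths from the example prompts; replace them with your own files.
csv=/path/to/variants.csv
wt=/path/to/wt.fasta

if [ -f "$csv" ] && has_column "$csv" log_fitness; then
  echo "OK: $csv has a log_fitness column"
else
  echo "Check $csv: missing file or no log_fitness column"
fi

if [ -f "$wt" ] && is_fasta "$wt"; then
  echo "OK: $wt looks like FASTA"
else
  echo "Check $wt: missing file or no '>' header line"
fi
```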
Docker not found?
docker --version  # Install Docker if missing
GPU not accessible?
- Ensure the NVIDIA Docker runtime (NVIDIA Container Toolkit) is installed
- Check with:
docker run --rm --gpus all ubuntu nvidia-smi
Claude Code not found?
# Install Claude Code
npm install -g @anthropic-ai/claude-code
Out of GPU memory?
- ProtT5-XL requires 8-16 GB of VRAM
- Use `device: "cpu"` for CPU inference (slower)
- Use ProtAlbert for lower memory requirements
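To see whether a GPU has enough free memory before choosing a model, a sketch like the following can parse `nvidia-smi` query output. The `check_vram` helper and its 8192 MiB default (roughly the 8 GB lower bound quoted above for ProtT5-XL) are assumptions for illustration; actual memory use also grows with sequence length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag GPUs whose free memory is below a threshold.
# Reads "index, free_MiB" lines, the format printed by:
#   nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
check_vram() {
  local min_mib="${1:-8192}" idx free
  while IFS=', ' read -r idx free; do
    [ -z "$idx" ] && continue
    # Skip values that are not plain integers (e.g. "[N/A]").
    case "$free" in
      ''|*[!0-9]*) echo "GPU $idx: free memory unknown ($free)"; continue ;;
    esac
    if [ "$free" -ge "$min_mib" ]; then
      echo "GPU $idx: $free MiB free (ok)"
    else
      echo "GPU $idx: $free MiB free (below $min_mib MiB; try device: \"cpu\" or ProtAlbert)"
    fi
  done
}

# Usage on a machine with NVIDIA drivers installed:
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | check_vram 8192
fi
```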
License: MIT. Based on ProtTrans by Elnaggar et al.