An analysis application that lets users simulate single-nucleotide variations (SNVs) in genes and predict whether each one is "likely benign" or "likely pathogenic", using the Evo2 LLM.
This project consists of:
- A modified Evo2 LLM tokenizer for efficient inference with SNVs.
- A Next.js frontend to display gene information, allow input of nucleotide changes, and view predictions.
- Deployment to a Modal container, bundling the Evo2 model with custom tweaks.
- View gene details and reference sequences.
- Introduce a single-nucleotide variation in the sequence.
- Predict the impact of that mutation using the Evo2 LLM.
- Classify the result as:
  - Likely Benign
  - Likely Pathogenic
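The workflow above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: `apply_snv`, `classify_delta`, and the `threshold` value are hypothetical, and the sketch assumes the model produces a delta score (variant likelihood minus reference likelihood) where more negative values suggest a more disruptive change.

```python
VALID_BASES = {"A", "C", "G", "T"}


def apply_snv(sequence: str, position: int, alt: str) -> str:
    """Return a copy of `sequence` with the base at 0-based `position` replaced by `alt`."""
    alt = alt.upper()
    if alt not in VALID_BASES:
        raise ValueError(f"Unsupported base: {alt!r}")
    if not 0 <= position < len(sequence):
        raise IndexError(f"Position {position} is outside the sequence")
    if sequence[position].upper() == alt:
        raise ValueError("Alternative base matches the reference base")
    return sequence[:position] + alt + sequence[position + 1:]


def classify_delta(delta_score: float, threshold: float = -0.001) -> str:
    """Map a delta score to a label.

    Assumption: scores at or below `threshold` are treated as likely pathogenic.
    The default threshold is a placeholder, not a calibrated value.
    """
    return "Likely Pathogenic" if delta_score <= threshold else "Likely Benign"


reference = "ATGGCC"
variant = apply_snv(reference, 2, "A")  # G -> A at index 2
print(variant)  # ATAGCC
```

In practice the threshold would need to be calibrated against variants with known labels, which is where the evaluation metrics come in.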
After cloning the original Evo2 repository, a small change was made to the tokenizer so that it runs in the Modal container (H100 GPUs are recommended):
    # File: vortex/model/tokenizer.py (under CharLevelTokenizer)
    def tokenize(self, text: str):
        return list(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))
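To see what this version of `tokenize` produces, the standalone sketch below applies the same expression to a short DNA string. It does not import Evo2; `tokenize_bytes` is a hypothetical free-function stand-in for the method.

```python
import numpy as np


def tokenize_bytes(text: str) -> list:
    # Same expression as the method: reinterpret the UTF-8 bytes as uint8 values.
    return list(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))


tokens = tokenize_bytes("ACGT")
print([int(t) for t in tokens])  # [65, 67, 71, 84]
```

Each nucleotide maps to its ASCII byte value, and the whole string is converted in one NumPy call instead of a per-character Python loop.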
The updated Evo2 directory is added to the Modal container in main.py with:

    .add_local_dir("evo2", remote_path="/evo2", ignore=["*.venv", "*.ipynb"], copy=True)
- Next.js with ShadCN – frontend UI for sequence interaction
- Python (with Modal) – model backend for Evo2 inference
- Evo2 LLM – large language model for protein/nucleotide analysis
- NumPy, Pandas, FastAPI, Matplotlib – backend utility libraries
- Scikit-learn – for evaluation metrics

- Clone and install dependencies (from their respective directories):

      cd backend/
      pip install -r requirements.txt

      cd frontend/
      npm install

- Modify the Evo2 tokenizer (already included if using this repo)
- Run locally (Modal / Backend):

      cd backend/
      modal init
      modal run main.py

- Start the Next.js frontend:

      cd frontend/
      npm run dev