An analysis application that simulates single-nucleotide variations in genes and predicts whether they are likely pathogenic, using the Evo2 model

alien-droid/evo2-variant-analysis


🧬 Gene Variant Pathogenicity Predictor

An analysis application that lets users simulate single-nucleotide variations (SNVs) in genes and predicts whether each variant is “likely benign” or “likely pathogenic”, using the Evo2 model.


🚀 Overview

This project consists of:

  • A modified Evo2 LLM tokenizer for efficient inference with SNVs.
  • A Next.js frontend to display gene information, allow input of nucleotide changes, and view predictions.
  • Deployment to a Modal container, bundling the Evo2 model with custom tweaks.

🧪 Functionality

  • View gene details and reference sequences.
  • Introduce a single-nucleotide variation in the sequence.
  • Predict the impact of that mutation using the Evo2 LLM.
  • Classify the result as:
    • Likely Benign
    • Likely Pathogenic

🛠️ Modifications

After cloning the original Evo2 repository, a small change was made to the tokenizer so that it runs in the Modal container (H100 GPUs are recommended):

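# File: vortex/model/tokenizer.py (under CharLevelTokenizer)
# The module needs `import numpy as np` at the top.

def tokenize(self, text: str):
    # Encode the text as UTF-8 and return each byte as a uint8 token ID.
    return list(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))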
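
To see what this tokenizer produces, here is a standalone sketch that lifts the same expression out of the class (the free function and the sample input are illustrative only). Each nucleotide character maps to its byte value, so the token IDs are the ASCII codes of the bases.

```python
import numpy as np


def tokenize(text: str) -> list:
    # Same expression as the patched method, without the class wrapper.
    return list(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))


tokens = tokenize("ACGT")
print([int(t) for t in tokens])  # [65, 67, 71, 84]
```

Because `np.frombuffer` reads the encoded bytes directly, a long sequence is converted in one vectorized step instead of a Python-level loop over characters.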

The updated Evo2 directory is then copied into the Modal container image, as done in main.py:

.add_local_dir("evo2", remote_path="/evo2", ignore=["*.venv", "*.ipynb"], copy=True)

🧱 Tech Stack

  • Next.js with ShadCN – frontend UI for sequence interaction

  • Python (with Modal) – model backend for Evo2 inference

  • Evo2 LLM – genomic language model for nucleotide sequence analysis

  • NumPy, Pandas, FastAPI, Matplotlib – backend utility libraries

  • Scikit-learn – for evaluation metrics

📦 Setup & Usage

  1. Clone the repository and install dependencies for the backend and the frontend:

    cd backend/
    pip install -r requirements.txt
    cd ../frontend/
    npm install
  2. Modify the Evo2 tokenizer as described under Modifications (already applied if you use the evo2 directory from this repo)

  3. Run the backend on Modal (from the repository root):

    cd backend/
    modal setup
    modal run main.py
  4. Start the Next.js frontend:

    cd frontend/
    npm run dev
