OliverT1/p-IgGen

p-IgGen

p-IgGen is a paired antibody auto-regressive language model. This package provides utility functions for generating and scoring antibody sequences using p-IgGen, with model weights stored on HuggingFace.

Features

  • Generate full-length antibody sequences.
  • Generate a heavy chain given a light chain and vice versa.
  • Generate full-length antibody sequences given an initial sequence.
  • Calculate log likelihoods of sequences.
  • VH and VL chains of generated sequences can optionally be separated using ANARCI.

Installation

We advise installing using a conda environment.

Prerequisites

  • Conda

Step-by-Step Setup

  1. Create a new conda environment:

    conda create -n my_env
    conda activate my_env
    conda install python=3.11 pip -y
  2. Install this repository:

    pip install git+https://github.com/OliverT1/p-IgGen.git
  3. Install optional ANARCI dependency (for --separate_chains option):

    conda install -c bioconda anarci
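
After installation, it can help to confirm that the command-line entry points are visible in the active environment. The following is a minimal sketch, not part of the package: it assumes the `piggen_generate` and `piggen_likelihood` commands named in this README are installed as executables on `PATH`, and that the bioconda ANARCI package provides an executable called `ANARCI`.

```python
import shutil


def find_tools(names=("piggen_generate", "piggen_likelihood", "ANARCI")):
    """Return a mapping of command name -> resolved path (None if not on PATH)."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # ANARCI is optional; it is only needed for --separate_chains.
        status = path if path else "not found on PATH"
        print(f"{name}: {status}")
```

A missing `ANARCI` entry only matters if you intend to use `--separate_chains`.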

Usage

Command Line Interface

Generate Sequences

To generate new antibody sequences, use the piggen_generate command:

piggen_generate --output_file output_sequences.txt --n_sequences 100 

By default, sequences are generated in the direction VH->VL, from C-term to N-term. Alternatively, they can be generated in reverse, VL->VH, from N-term to C-term, using the --backwards flag. This allows generation given a VH or VL sequence of any length.

Note:

  • If --backwards is used, the --initial_sequence should be provided in reverse, starting from the N-term of the VL chain.
  • If --heavy_chain_file or --light_chain_file is used, this inversion is handled automatically, and the VH and VL chains should be provided in the standard direction.
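
As a sketch of preparing an input for `--backwards` together with `--initial_sequence`, the snippet below reverses a sequence string. It assumes that "in reverse" means a plain character-by-character reversal of the amino-acid string; the helper name `reverse_sequence` and the example fragment are illustrative and not part of the package.

```python
def reverse_sequence(sequence: str) -> str:
    """Return the residues of `sequence` in reverse order, ignoring surrounding whitespace."""
    return sequence.strip()[::-1]


# Illustrative fragment written in the standard direction (hypothetical input).
fragment = "FGGGTKVEIK"
print(reverse_sequence(fragment))
```

The reversed string could then be passed as the value of `--initial_sequence`. This step is not needed when using `--heavy_chain_file` or `--light_chain_file`.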

Options:

  • --developable: Use the developable model.
  • --heavy_chain_file FILE: File containing heavy chain sequences to generate light chains from.
  • --light_chain_file FILE: File containing light chain sequences to generate heavy chains from.
  • --initial_sequence TEXT: Initial sequence to generate from.
  • --n_sequences INTEGER: Number of sequences to generate, per input sequence if applicable.
  • --top_p FLOAT: Top-p sampling value (default: 0.95).
  • --temp FLOAT: Temperature for generation (default: 1.2).
  • --bottom_n_percent FLOAT: Bottom n percent of sequences to discard based on likelihood (default: 5).
  • --backwards: Generate sequences in reverse.
  • --output_file FILE: File to save the generated sequences (required).
  • --separate_chains: Output VH and VL sequences separately, requires ANARCI.
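
To give a feel for what `--top_p` and `--temp` control, here is a generic sketch of temperature-scaled nucleus (top-p) sampling over a list of logits. It illustrates the general technique only; it is not p-IgGen's own sampling code, and the function name `top_p_sample` is hypothetical.

```python
import math
import random


def top_p_sample(logits, top_p=0.95, temp=1.2, rng=random):
    """Sample an index from `logits` using temperature scaling and top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalises the supplied weights, so no explicit renormalisation is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Higher temperatures flatten the distribution and lower `top_p` values restrict sampling to fewer high-probability tokens, so the two options trade off diversity against likelihood.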

Using --bottom_n_percent requires --n_sequences to be at least 100; otherwise this option is ignored.
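
The filtering step can be pictured as ranking sequences by log likelihood and dropping the lowest-scoring fraction. The sketch below is an assumption about the general idea, not the package's implementation; `drop_bottom_percent` is a hypothetical helper.

```python
def drop_bottom_percent(sequences, log_likelihoods, bottom_n_percent=5.0):
    """Return sequences ranked by descending log likelihood, minus the lowest-scoring fraction."""
    if len(sequences) != len(log_likelihoods):
        raise ValueError("sequences and log_likelihoods must have the same length")
    ranked = sorted(
        zip(sequences, log_likelihoods), key=lambda pair: pair[1], reverse=True
    )
    # Number of sequences to discard, rounded down.
    n_drop = int(len(ranked) * bottom_n_percent / 100)
    n_keep = len(ranked) - n_drop
    return [seq for seq, _ in ranked[:n_keep]]
```

In this sketch, a 5% cutoff applied to fewer than 20 sequences rounds down to zero discarded sequences, which shows why a percentile filter is only meaningful on larger batches.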

Calculate Log Likelihoods

To calculate the log likelihoods of sequences, use the piggen_likelihood command:

piggen_likelihood --sequence_file input_sequences.txt --output_file log_likelihoods.txt

Options:

  • --developable: Use the developable model.
  • --sequence_file FILE: The file containing sequences to calculate log likelihoods.
  • --batch_size INTEGER: Batch size for processing sequences.
  • --output_file FILE: File to save the log likelihoods.
  • --local: Load model from local path.
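
For orientation, the log likelihood of a sequence $x = (x_1, \dots, x_L)$ under an auto-regressive model factorises over positions as shown below. This is the standard definition; whether the command reports this sum or a length-normalised variant is not stated here, so treat the exact scaling as an assumption to check against the output.

```latex
\log p(x) = \sum_{i=1}^{L} \log p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```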

Examples

Generate Light Chains for Provided Heavy Chains:

piggen_generate --heavy_chain_file heavy_chains.txt --n_sequences 5 --top_p 0.95 --temp 1.25 --output_file generated_sequences.txt

Heavy chains should be separated by newlines. Here, five light chains will be generated for each heavy chain.
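
A newline-separated input file can be written from a list of sequences with a few lines of Python. The sketch below also checks that each sequence uses only the 20 standard amino-acid letters; that validation and the helper name `write_chain_file` are assumptions for illustration, since the characters the tool accepts are not specified here.

```python
from pathlib import Path

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def write_chain_file(sequences, path):
    """Write one sequence per line after checking each uses only standard residues."""
    cleaned = []
    for seq in sequences:
        seq = seq.strip().upper()
        invalid = set(seq) - AMINO_ACIDS
        if not seq or invalid:
            raise ValueError(
                f"invalid sequence {seq!r}: unexpected characters {sorted(invalid)}"
            )
        cleaned.append(seq)
    # One sequence per line, with a trailing newline at the end of the file.
    Path(path).write_text("\n".join(cleaned) + "\n")
    return len(cleaned)
```

The resulting file could then be passed to `--heavy_chain_file` or `--light_chain_file`.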

Calculate Log Likelihoods for Sequences:

piggen_likelihood --sequence_file sequences.txt --batch_size 2 --output_file log_likelihoods.txt
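
When calling the commands from a script rather than a shell, the arguments can be assembled programmatically. This is a minimal sketch using only the option names listed in this README; `build_command` and `run` are hypothetical helpers, and the sketch assumes the commands are on `PATH`.

```python
import subprocess


def build_command(program, **options):
    """Turn keyword options into an argument list, e.g. batch_size=2 -> ['--batch_size', '2']."""
    args = [program]
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not requested
        args.append(f"--{name}")
        if value is not True:  # boolean flags such as --developable take no value
            args.append(str(value))
    return args


def run(program, **options):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(program, **options), check=True)
```

For example, `run("piggen_likelihood", sequence_file="sequences.txt", batch_size=2, output_file="log_likelihoods.txt")` would issue the same command as the example above.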

About

p-IgGen: A Generative Paired Antibody Language Model
