DefensePredictor: A Machine Learning Model to Discover Novel Prokaryotic Immune Systems

Python package to run DefensePredictor, a machine-learning model that leverages embeddings from a protein language model, ESM2, to classify proteins as anti-phage defensive.

Installation

In a fresh conda or other virtual environment, run:

pip install defense_predictor
defense_predictor_download

The first command installs the Python package from PyPI and the second downloads the model weights. Once the weights are downloaded, you do not need to run the second command again.

Requirements

Requires Python >= 3.10

Usage

defense_predictor can be run from Python:

import defense_predictor as dfp

ncbi_feature_table = 'GCF_003333385.1_ASM333338v1_feature_table.txt'
ncbi_cds_from_genomic = 'GCF_003333385.1_ASM333338v1_cds_from_genomic.fna'
ncbi_protein_fasta = 'GCF_003333385.1_ASM333338v1_protein.faa'
output_df = dfp.run_defense_predictor(ncbi_feature_table=ncbi_feature_table,
                                      ncbi_cds_from_genomic=ncbi_cds_from_genomic,
                                      ncbi_protein_fasta=ncbi_protein_fasta)
output_df.head()

Or from the command line:

defense_predictor \
     --ncbi_feature_table GCF_003333385.1_ASM333338v1_feature_table.txt \
     --ncbi_cds_from_genomic GCF_003333385.1_ASM333338v1_cds_from_genomic.fna \
     --ncbi_protein_fasta GCF_003333385.1_ASM333338v1_protein.faa \
     --output GCF_003333385_defense_predictor_output.csv

defense_predictor outputs the predicted probability and log-odds of defense for each input protein. We recommend using a stringent log-odds cutoff of 7.2 to call a protein predicted defensive.
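As a rough sketch of applying this cutoff, the snippet below filters a list of records using only the standard library. The column name `log_odds`, the use of `>=` at the threshold, and the toy values are assumptions for illustration; check the actual column names in your output before reusing it. The conversion to probability assumes natural-log odds.

```python
import math

LOG_ODDS_CUTOFF = 7.2  # stringent cutoff recommended above


def log_odds_to_probability(log_odds: float) -> float:
    """Convert log-odds to a probability, assuming natural-log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def filter_defensive(rows, log_odds_column="log_odds", cutoff=LOG_ODDS_CUTOFF):
    """Keep records whose log-odds meet the cutoff.

    `rows` is any iterable of dict-like records, for example a
    csv.DictReader over the output CSV. `log_odds_column` is a
    hypothetical column name, not confirmed by the package.
    """
    return [row for row in rows if float(row[log_odds_column]) >= cutoff]


# Toy records for illustration only; these are not real predictions.
example_rows = [
    {"protein": "A", "log_odds": "9.1"},
    {"protein": "B", "log_odds": "-3.4"},
    {"protein": "C", "log_odds": "7.2"},
]
hits = filter_defensive(example_rows)
print([row["protein"] for row in hits])
print(round(log_odds_to_probability(LOG_ODDS_CUTOFF), 4))
```

Under the natural-log assumption, a log-odds of 7.2 corresponds to a predicted probability of roughly 0.9993, which is why the cutoff is described as stringent.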

To see an example, run the defense_predictor_example.ipynb notebook in Colab.

We recommend running defense_predictor on a computer with a CUDA-enabled GPU to maximize computational efficiency.
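One way to check whether a CUDA GPU is visible before starting a long run is sketched below. It assumes the model runs on PyTorch, which this README does not state explicitly; `describe_device` is a hypothetical helper name, and the function falls back to reporting "cpu" if PyTorch is not installed.

```python
import importlib.util


def describe_device() -> str:
    """Return 'cuda' if PyTorch is importable and sees a CUDA GPU, else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch  # imported lazily so the check works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_device())
```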

Inputs

Input files can be downloaded from the FTP page for any genome of interest, which is linked on its assembly page. Alternatively, input files can be generated from an unannotated nucleotide assembly using NCBI's Prokaryotic Genome Annotation Pipeline.
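Files on the NCBI FTP site are typically served gzip-compressed (with a .gz suffix), while the examples in Usage pass uncompressed file names. Assuming that is the case for your downloads, a minimal sketch for decompressing them follows; `gunzip` is a hypothetical helper, not part of the package.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress a .gz file next to itself and return the new path."""
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_suffix("")  # drops only the trailing .gz
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `gunzip('GCF_003333385.1_ASM333338v1_protein.faa.gz')` would write `GCF_003333385.1_ASM333338v1_protein.faa` alongside the compressed file.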

Alternatively, defense_predictor accepts inputs generated by Prokka using the arguments prokka_gff, prokka_ffn, and prokka_faa.
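Prokka writes its outputs into one directory with a shared prefix (`<prefix>.gff`, `<prefix>.ffn`, `<prefix>.faa`), so the three arguments can be assembled as sketched below. The helper `prokka_inputs`, the directory `prokka_out`, and the prefix `my_genome` are hypothetical, and the commented call assumes `run_defense_predictor` takes these arguments as keywords in the same way as the NCBI example in Usage.

```python
from pathlib import Path


def prokka_inputs(outdir, prefix):
    """Build the three Prokka-based arguments from an output directory and prefix."""
    base = Path(outdir)
    return {
        "prokka_gff": str(base / f"{prefix}.gff"),  # annotation
        "prokka_ffn": str(base / f"{prefix}.ffn"),  # nucleotide sequences of features
        "prokka_faa": str(base / f"{prefix}.faa"),  # protein sequences
    }


# Hypothetical directory and prefix, for illustration only.
kwargs = prokka_inputs("prokka_out", "my_genome")
print(kwargs)

# Assumed call, mirroring the NCBI example in Usage (not verified here):
# import defense_predictor as dfp
# output_df = dfp.run_defense_predictor(**kwargs)
```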
