DELEAT is a bioinformatic analysis pipeline for the design of large-scale genome deletions in bacterial genomes.
An article describing implementation details and a usage example has been published in BMC Bioinformatics. Please cite as: Solana, J., Garrote-Sánchez, E. & Gil, R. DELEAT: gene essentiality prediction and deletion design for bacterial genome reduction. BMC Bioinformatics 22, 444 (2021). https://doi.org/10.1186/s12859-021-04348-5
DELEAT uses a logistic regression classifier, trained on a selection of organisms from the Database of Essential Genes (DEG), to assign an essentiality score to each gene in the genome. It then uses these scores to determine non-essential regions.
Once deletions are designed, DELEAT provides reports and a circular genome plot to visualise the potential genome reduction.
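As a rough illustration of how a logistic regression classifier turns gene features into an essentiality score between 0 and 1, here is a minimal sketch. The feature names, weights and bias are hypothetical placeholders chosen for the example; they are not the features or the trained model that DELEAT actually uses.

```python
import math


def essentiality_score(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of features.

    `features` and `weights` are dicts keyed by feature name.
    All names and values used here are illustrative assumptions.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature vector for one gene (not DELEAT's real feature set).
gene_features = {"gc_content": 0.0, "gene_length": 0.0}
toy_weights = {"gc_content": 1.5, "gene_length": -0.8}

score = essentiality_score(gene_features, toy_weights, bias=0.0)
```

Scores close to 1 would indicate genes predicted to be essential, and scores close to 0 genes predicted to be dispensable.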
- Clone the repository:
git clone https://github.com/jime-sg/deleat.git deleat && cd deleat
- Create a Conda env from the deleat_env.txt file:
conda create --name deleat-v0.1 --file deleat_env.txt
This will install all dependencies.
- Add DELEAT to your PATH: edit your ~/.bashrc file to include
alias deleat="python /your/path/to/deleat/deleat-v0.1/deleat.py"
(change /your/path/to/ to the appropriate path).
- Clone the repository:
git clone https://github.com/jime-sg/deleat.git deleat && cd deleat
- Build the Docker image:
docker build -t deleat .
- Run the container with an interactive shell, mounting your GenBank annotation file for analysis:
docker run -it -v <genbank_file_path>:/home/genbank/<genbank_filename> deleat
First, activate the Conda env: conda activate deleat-v0.1
General usage: deleat <step name> <step arguments>
Steps (use deleat <step name> -h to see command-line usage for each step):
- predict-essentiality
- define-deletions
- revise-deletions
- summarise
- design-all-primers / design-primers
Get predicted essentiality scores for all genes in a bacterial genome.
Usage: deleat predict-essentiality -g <GenBank file> -o <output dir> [-p1 <ori>] [-p2 <ter>] -n <n_threads>
You only need to provide the GenBank annotation file of your organism of interest as input data. Essentiality scores will be calculated for all genes annotated with a unique locus_tag identifier. If you know the ori and ter replication coordinates, you should indicate those as well -- otherwise they will be inferred by the GC skew method.
Results are saved to a modified-I GenBank file (.gbm1), which includes essentiality scores. This file can be visualised with genome browsing tools such as Artemis, and scores can be edited manually if necessary.
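For readers unfamiliar with the GC skew method mentioned above, the following sketch shows the general idea: the cumulative GC skew, (G - C) / (G + C) summed over consecutive windows, typically reaches its minimum near ori and its maximum near ter. This is a simplified illustration, not DELEAT's own implementation; the function name and the window size are assumptions, and the returned coordinates are only approximate (window end positions).

```python
def infer_ori_ter(sequence, window=1000):
    """Estimate (ori, ter) positions from the cumulative GC skew.

    ori is taken at the minimum and ter at the maximum of the
    cumulative skew. Positions are the end coordinates of the windows
    where the extremes occur.
    """
    seq = sequence.upper()
    cumulative = 0.0
    best_min = (float("inf"), 0)
    best_max = (float("-inf"), 0)
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:  # skip windows without G or C to avoid dividing by zero
            cumulative += (g - c) / (g + c)
        end = start + len(chunk)
        if cumulative < best_min[0]:
            best_min = (cumulative, end)
        if cumulative > best_max[0]:
            best_max = (cumulative, end)
    return best_min[1], best_max[1]


# Toy sequence: C-rich up to position 3000, G-rich up to 7000, C-rich after.
toy_genome = "C" * 3000 + "G" * 4000 + "C" * 1000
ori, ter = infer_ori_ter(toy_genome, window=1000)
```

If you already know the replication coordinates of your organism, passing them with -p1 and -p2 avoids relying on this kind of estimate.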
Compute table of proposed deletions.
Usage: deleat define-deletions -g1 <modified-I GenBank file> -o <output dir> -l <min deletion length> -e <essentiality score threshold> [-m <non-coding margin>]
An essentiality threshold of about 0.75 is recommended, but feel free to adjust it for your specific case.
By default, a non-coding margin of 200 bp is retained around essential genes. You can change this setting if you know of shorter or longer cis-regulatory elements in your genome.
Results are saved to a modified-II GenBank file (.gbm2), which includes essentiality scores and proposed deletions, and a deletion table in CSV format.
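To make the roles of the three parameters concrete, here is a simplified sketch of how an essentiality threshold, a non-coding margin and a minimum deletion length could combine to yield candidate deletions. It assumes that genes scoring at or above the threshold are treated as essential, ignores genome circularity, and uses a hypothetical function name and toy data; DELEAT's actual procedure may differ.

```python
def propose_deletions(genes, threshold, min_length, margin=200):
    """Return candidate deletions as (start, end) tuples.

    `genes` is a list of (start, end, score) tuples with 1-based
    inclusive coordinates. Regions between consecutive essential genes
    are shrunk by `margin` bp on each side and kept only if they are
    at least `min_length` bp long.
    """
    essential = sorted((s, e) for s, e, score in genes if score >= threshold)
    deletions = []
    for (_, prev_end), (next_start, _) in zip(essential, essential[1:]):
        start = prev_end + margin + 1
        end = next_start - margin - 1
        if end - start + 1 >= min_length:
            deletions.append((start, end))
    return deletions


# Toy annotation: (start, end, essentiality score); values are made up.
toy_genes = [
    (1, 1000, 0.90),
    (1500, 2500, 0.20),
    (3000, 6000, 0.10),
    (8000, 9000, 0.95),
    (9100, 9500, 0.80),
]
candidates = propose_deletions(toy_genes, threshold=0.75, min_length=5000)
```

In this toy case the two low-scoring genes fall inside a single candidate deletion, while the short gap between the last two essential genes is discarded.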
Redefine deletion list after manual curation.
Usage: deleat revise-deletions -g2 <modified-II GenBank file> -t <modified table of proposed deletions> -o <output dir>
To curate the list of deletions manually, edit the deletion table produced by define-deletions (step 2). You can remove a deletion by either deleting the corresponding line or commenting it out with #. revise-deletions (step 3) reads the first three columns (deletion name and coordinates) and updates the output files accordingly.
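If you want to check a curated table before passing it back to DELEAT, a small script along these lines can help. It is only a sketch under stated assumptions: a comma-delimited file, a deletion name followed by start and end coordinates in the first three columns, and an optional header row that is skipped because its coordinates are not numeric. The function name and the example column names are hypothetical.

```python
import csv
import io


def read_curated_deletions(handle):
    """Read (name, start, end) from the first three columns of a CSV table.

    Lines starting with '#' are treated as commented out and skipped.
    Rows whose coordinates are not integers (e.g. a header) are skipped too.
    """
    deletions = []
    for row in csv.reader(handle):
        if len(row) < 3 or row[0].lstrip().startswith("#"):
            continue
        name, start, end = (field.strip() for field in row[:3])
        if not (start.isdigit() and end.isdigit()):
            continue
        deletions.append((name, int(start), int(end)))
    return deletions


# Toy table with one deletion commented out; contents are made up.
toy_table = io.StringIO(
    "name,start,end,length\n"
    "D1,100,5000,4901\n"
    "#D2,6000,9000,3001\n"
    "D3,12000,20000,8001\n"
)
kept = read_curated_deletions(toy_table)
```

With a real file, you would open it with `open(path, newline="")` and pass the file object instead of the `io.StringIO` buffer.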
Generate final reports about deletion design and genome reduction process.
Usage: deleat summarise -g3 <modified-II GenBank file> -o <output dir> [-p1 <ori>] [-p2 <ter>]
Design primers for large genome deletions by megapriming.
Usage:
- deleat design-all-primers -g3 <modified-II GenBank file> -o <output dir> -e <restriction enzyme> [-L <min length for homologous recombination>]
- deleat design-primers -g <genome sequence (FASTA)> -o <output dir> -n <deletion name> -d1 <del start> -d2 <del end> -e <restriction enzyme> [-L <min length for homologous recombination>]