Skip to content

This script is used to convert the output of ABRICATE tabulated output into a matrix samples with the genes as columns and the presence/absence of the gene as the values.

License

Notifications You must be signed in to change notification settings

ikergarcia1996/Gene2Tab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gene2Tab

Description

This script is used to convert the output of ABRICATE tabulated output into a matrix samples with the genes as columns and the presence/absence of the gene as the values.

The script can use as input a directory containing multiple .tab files or a single .tab file. The output will be a .csv file. Each file should contain the #File Sequence and GENE columns, the other ones are optional.

By default, the script will only consider genes with a coverage and identity of 90% or more. This can be changed with the --min_coverage and --min_identity flags.

Example:

File1.tab

#FILE	SEQUENCE	START	END	STRAND	GENE	COVERAGE	COVERAGE_MAP	GAPS	%COVERAGE	%IDENTITY	DATABASE	ACCESSION	PRODUCT	RESISTANCE
Isolate1	S1-length_513	53228	54445	+	ceoA	1-1218/1218	========/======	2/2	99.92	99.67	card	U97042:0-1218	ceoA is a periplasmic linker subunit of the CeoAB-OpcM efflux pump	aminoglycoside;fluoroquinolone
Isolate1	S1-length_513	54491	57575	+	ceoB	1-3084/3084	========/======	1/1	100	99.87	card	U97042:1263-4347	ceoB is a cytoplasmic membrane component of the CeoAB-OpcM efflux pump	aminoglycoside;fluoroquinolone
Isolate1	S1-length_513	57702	59240	+	opcM	1-1536/1536	========/======	5/5	99.93	99.16	card	U38944.1:0-1536	OpcM is an outer membrane factor protein found in Burkholderia cepacia. It is part of the CeoAB-OpcM complex.	aminoglycoside;fluoroquinolone
Isolate1	S1-length_233	145199	146378	+	amrA	24-1200/1200	========/======	15/23	97.25	80.84	card	BX571965.1:2152165-2150965	amrA is the efflux pump subunit of the AmrAB-OprM multidrug efflux complex. amrA corresponds to 1 locus in Pseudomonas aeruginosa PAO1 and 1 locus in Pseudomonas aeruginosa LESB58.	aminoglycoside
Isolate1	S1-length_233	146394	149503	+	amrB	1-3110/3132	========/======	2/2	99.27	88.17	card	BX571965.1:2150949-2147817	amrB is the membrane fusion protein of the AmrAB-OprM multidrug efflux complex.	aminoglycoside
Isolate1	S1-length_265	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem
Isolate2	S2-length_512	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem

File2.tab

#FILE	SEQUENCE	START	END	STRAND	GENE	COVERAGE	COVERAGE_MAP	GAPS	%COVERAGE	%IDENTITY	DATABASE	ACCESSION	PRODUCT	RESISTANCE
Isolate2	S2-length_512	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem

Will be converted to:

Sample,ceoA,ceoB,opcM,amrA,amrB,Burkholderia_pseudomallei_Omp38
BUR-BAB-IMI-102146,1,1,1,1,1,1
AA2,0,0,0,0,0,1

If the --transpose flag is used, the output will be:

Sample,Isolate1,Isolate2
ceoA,1,0
ceoB,1,0
opcM,1,0
amrA,1,0
amrB,1,0
Burkholderia_pseudomallei_Omp38,1,1

Installation

pip install gene2tab

Usage

See all the available options with:

gene2tab -h

Running the script. The input can be a single .tab file or a directory containing multiple .tab files. The output will be a .csv file.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 

You can also transpose the output with the --transpose flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --transpose

If your files use a different delimiter than tab, you can specify it with the --input_file_delimiter flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --input_file_delimiter ','

If you want a different delimiter in the output file, you can specify it with the --output_file_delimiter flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --output_delimiter ';'

License

Apache License 2.0

About

This script is used to convert the output of ABRICATE tabulated output into a matrix samples with the genes as columns and the presence/absence of the gene as the values.

Topics

Resources

License

Stars

Watchers

Forks

Languages