Extract genes from multi-sequence file

Charlotte Houldcroft & Krishna Kumar

This Python script extracts open reading frames (ORFs) from a multiple sequence alignment (MSA). It takes the MSA and removes all gap characters (-) from the reference sequence, which must be the first sequence in the file. It then matches the ungapped reference sequence to the reference coding sequence (CDS) for a given ORF and extracts that region from every sequence in the MSA. The result is a gene-by-gene 'chunked' MSA.
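
The key step is converting a position in the ungapped reference back to an alignment column, so that the same columns can be cut from every sequence. Below is a minimal, stdlib-only sketch of that idea. It is not the code in extract-orf.py. The helper names ungapped_to_columns and extract_orf are hypothetical. The sketch also assumes that "-" is the only gap character and that the CDS occurs verbatim (case-insensitively) in the ungapped reference.

```python
from typing import Dict, List

GAP = "-"  # assumed gap character, as described above


def ungapped_to_columns(aligned_ref: str) -> List[int]:
    """Map each ungapped reference position to its alignment column index."""
    return [col for col, base in enumerate(aligned_ref) if base != GAP]


def extract_orf(alignment: Dict[str, str], ref_name: str, cds: str) -> Dict[str, str]:
    """Slice every aligned sequence to the columns spanned by `cds` in the reference.

    Hypothetical helper: assumes `cds` occurs verbatim (case-insensitively)
    in the reference once gaps are removed; only the first match is used.
    """
    if not cds:
        raise ValueError("empty CDS")
    aligned_ref = alignment[ref_name]
    ungapped_ref = aligned_ref.replace(GAP, "")
    start = ungapped_ref.upper().find(cds.upper())
    if start == -1:
        raise ValueError("CDS not found in ungapped reference")
    end = start + len(cds) - 1  # inclusive, in ungapped coordinates

    # Translate ungapped coordinates back to alignment columns.
    cols = ungapped_to_columns(aligned_ref)
    first_col, last_col = cols[start], cols[end]

    # Keep every column between the first and last CDS base, including columns
    # where the reference has a gap (i.e. insertions in other sequences).
    return {name: seq[first_col:last_col + 1] for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"ref": "AT-GAA-CTG", "s1": "ATTGAACCTG", "s2": "A--GAA-CT-"}
    print(extract_orf(toy, "ref", "GAAC"))
```

For the toy alignment, the CDS "GAAC" occupies ungapped reference positions 2 to 5. These map to alignment columns 3 to 7, so the call returns {'ref': 'GAA-C', 's1': 'GAACC', 's2': 'GAA-C'}. The sketch keeps columns where the reference has a gap, because dropping them would discard insertions carried by other sequences.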

Installation instructions

pip3 install virtualenv
virtualenv env
source env/bin/activate
pip3 install -r requirements.txt

Run Python code to extract genes

python3 extract-orf.py multialigned_sequence.fasta genes.fasta
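
The command takes two FASTA files. This README does not document their exact layout. Going by the description above, the first is presumably the MSA (with the reference first) and the second the reference CDS sequences, but treat both as assumptions. The following minimal, stdlib-only sketch shows how such files could be read and written. parse_fasta and to_fasta are hypothetical names, not functions from extract-orf.py.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, preserving file order."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialise records as FASTA, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(f">{header}\n")
        out.extend(f"{seq[i:i + width]}\n" for i in range(0, len(seq), width))
    return "".join(out)


if __name__ == "__main__":
    msa = parse_fasta(">ref\nAT-GAA\n-CTG\n>s1\nATTGAACCTG\n")
    reference = msa[0]  # the reference must be the first record in the MSA
    print(reference)
    print(to_fasta(msa, width=5), end="")
```

Sequences are kept as-is, including '-' gap characters, so the same parser handles both aligned and unaligned FASTA input.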

wadhamite/virus-align

Folders and files

Latest commit

History

Repository files navigation

Extract genes from multi-sequence file

Installation instructions

Run Python code to extract genes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages