Code to help analyse multiple sequence alignments

wadhamite/virus-align

Extract genes from multi-sequence file

Charlotte Houldcroft & Krishna Kumar

The Python script extracts open reading frames (ORFs) from a multiple sequence alignment (MSA). It takes an MSA and removes all gap characters (-) from the reference sequence, which must be the first sequence in the file. It then matches the ungapped reference sequence against the reference CDS for a given ORF and extracts the corresponding region from every sequence in the MSA, producing a gene-by-gene 'chunked' MSA.
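The approach described above can be sketched in plain Python. This is an illustrative sketch only, not the code in extract-orf.py: the function names `ungapped_to_column_map` and `extract_orf` are hypothetical, the alignment is assumed to be already loaded as a list of (name, sequence) pairs with the reference first, and the CDS is assumed to occur exactly (case-insensitively) in the ungapped reference.

```python
def ungapped_to_column_map(aligned_ref):
    """Map each position in the ungapped reference to its alignment column."""
    return [col for col, base in enumerate(aligned_ref) if base != "-"]


def extract_orf(alignment, cds):
    """Slice the alignment columns that span `cds` in the reference.

    alignment: list of (name, aligned_sequence); the reference must be first.
    cds: ungapped reference coding sequence for one ORF (hypothetical input).
    """
    if not cds:
        raise ValueError("CDS must not be empty")

    _, aligned_ref = alignment[0]
    columns = ungapped_to_column_map(aligned_ref)

    # Remove gaps from the reference so the CDS can be located by exact match.
    ungapped_ref = "".join(aligned_ref[col] for col in columns)
    start = ungapped_ref.upper().find(cds.upper())
    if start == -1:
        raise ValueError("CDS not found in the ungapped reference")
    end = start + len(cds) - 1  # last ungapped position covered by the CDS

    # Translate ungapped coordinates back to alignment columns.
    col_start = columns[start]
    col_end = columns[end] + 1  # exclusive slice bound

    # Cut the same column range out of every sequence in the MSA.
    return [(name, seq[col_start:col_end]) for name, seq in alignment]


# Toy data, made up for illustration.
example_alignment = [
    ("ref", "AT-GCC--TAAGG"),
    ("s1", "ATTGCCAATAAGG"),
    ("s2", "AT-GACT-TAAGG"),
]
example_chunk = extract_orf(example_alignment, "GCCTAA")
for name, seq in example_chunk:
    print(name, seq)
# ref GCC--TAA
# s1 GCCAATAA
# s2 GACT-TAA
```

Keeping the column map separate from the slicing step means gaps inside the ORF (insertions in other sequences relative to the reference) are preserved in the extracted chunk, so the output remains a valid alignment.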

Installation instructions

pip3 install virtualenv
virtualenv env
source env/bin/activate
pip3 install -r requirements.txt

Run Python code to extract genes

python3 extract-orf.py multialigned_sequence.fasta genes.fasta 