Find all instances of an amino acid sequence motif in protein structures and parse their internal coordinates.
-
Install dependencies
-
Download the protein structures from the PDB
-
Write chain query file (optional)
- A list specifying which chains to search, and which protein structure files to search in.
This file is a long-format CSV file containing the protein entry IDs, the path to each
protein file relative to the root directory that contains all the protein structure files, and
the chain identifiers for each protein file, e.g.
Protein ID, File Path, Chain
1i6w, 1i6w.pdb.ent.gz, A
1i6w, 1i6w.pdb.ent.gz, B
1gci, 1gci.pdb.gz, A
...
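A chain query file in the layout described above can also be generated programmatically. The following is a minimal sketch, not part of motif-conformations.py: the helper name write_chain_query and the output name chain_query.csv are hypothetical. It also assumes that the script accepts standard comma-separated rows without a space after each comma, which this README does not confirm.

```python
import csv

# Hypothetical helper: the function name and output file name are
# illustrative and are not part of motif-conformations.py.
def write_chain_query(rows, out_path):
    """Write (protein_id, file_path, chain) tuples as a long-format CSV."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row copied from the example in this README.
        writer.writerow(["Protein ID", "File Path", "Chain"])
        writer.writerows(rows)

# One row per chain; a structure with two chains appears twice.
rows = [
    ("1i6w", "1i6w.pdb.ent.gz", "A"),
    ("1i6w", "1i6w.pdb.ent.gz", "B"),
    ("1gci", "1gci.pdb.gz", "A"),
]
write_chain_query(rows, "chain_query.csv")
```

The resulting path would then be passed to the script with -q.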
Run motif-conformations.py from the command line:
./motif-conformations.py [-h] [-q QUERY_LIST] [-g] motif structure_directory {pdb,cif} output_file
- run ./motif-conformations.py -h or ./motif-conformations.py --help for command line usage help.
- use -q to specify the path to the chain query file described above. If you do not specify a file, the program will attempt to parse all files in structure_directory and seek motif in all the protein chains it finds.
- use the -g flag if the protein structure files are gzipped.
- motif is the amino acid sequence to seek. Note that the program currently only seeks exact matches, and will not interpret promotif or regex motifs.
- structure_directory is the path of the root directory that contains all the protein structure files.
- choose cif if the protein structure files are in the PDBx/mmCIF (.cif) format.
- choose pdb if the protein structure files are in the .pdb format.
- output_file is the name of the output file that the program will write.
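To illustrate what an exact-match search means for the motif argument, here is a minimal sketch in plain Python. It is not taken from motif-conformations.py: the function name find_motif and the example sequence are made up, and reporting overlapping occurrences is an assumption about the desired behaviour.

```python
# Illustrative sketch only; find_motif is a hypothetical name and this is
# not the implementation used by motif-conformations.py.
def find_motif(sequence, motif):
    """Return the 0-based start index of every exact occurrence of motif.

    Overlapping occurrences are reported (an assumption of this sketch).
    """
    if not motif:
        return []
    starts = []
    pos = sequence.find(motif)
    while pos != -1:
        starts.append(pos)
        # Advance by one residue so overlapping matches are also found.
        pos = sequence.find(motif, pos + 1)
    return starts

chain_sequence = "MGAGAGSTGAG"  # made-up one-letter amino acid sequence
print(find_motif(chain_sequence, "GAG"))  # prints [1, 3, 8]
print(find_motif(chain_sequence, "G.G"))  # prints [] because "." is literal
```

Because the comparison is a literal substring search, a pattern such as G.G matches nothing unless the sequence itself contains a period.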
Read the API Documentation to take a deeper look at the code.