Comparison of molecular structure parsers (PDB, mmCif) available in Biopython
Python 2.7 or 3, BioPython, texttable, and the PDB and mmCif files for your structure(s).
BioParserCompare compares the output (header info) of the PDB.PDBParser and PDB.MMCIFDict modules.
It creates a report for each single structure and, if run in test mode, a general report of average parsing times and missing values.
You can use it to quickly find where particular data is located in the structures returned by PDBParser and MMCIFDict.
You can also use it to view the data from your structure files in a human-friendly form.
PDB.MMCIFParser does not support the header info. In general, its result does not contain much information, and it is advised to use PDB.MMCIFDict instead.
You can still use read_files.read_mmmcif_file(struct_name) to get the MMCIFParser result :)
- PDB parsing time, mmCif parsing time (MMCIFParser + MMCIFDict)
- Table comparing the values available in both structures
- List of fields from PDB header not found in Cif (mostly not mapped)
- Average PDB parsing time, average mmCif parsing time (MMCIFParser + MMCIFDict)
- List of fields not found for any of the structures
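The general report items above (average times and fields missing everywhere) can be sketched as a small aggregation step. This is only an illustration, not the code used by read_files.py: the function name `summarize`, the keys 'pdb_time', 'cif_time' and 'missing', and the structure names are all hypothetical, and "not found for any of the structures" is read here as "missing in every structure".

```python
def summarize(results):
    """Summarize per-structure results into general report values.

    `results` maps a structure name to a dict with hypothetical keys:
    'pdb_time' and 'cif_time' (seconds) and 'missing' (iterable of field names).
    """
    if not results:
        raise ValueError('no structures to summarize')
    count = float(len(results))  # float keeps the division correct on Python 2.7
    avg_pdb = sum(r['pdb_time'] for r in results.values()) / count
    avg_cif = sum(r['cif_time'] for r in results.values()) / count
    # A field is listed only if it was missing for every structure.
    missing_sets = [set(r['missing']) for r in results.values()]
    missing_everywhere = set.intersection(*missing_sets)
    return avg_pdb, avg_cif, sorted(missing_everywhere)


# Made-up example data, not real measurements.
example = {
    '1abc': {'pdb_time': 0.2, 'cif_time': 0.6, 'missing': {'journal', 'source'}},
    '2xyz': {'pdb_time': 0.4, 'cif_time': 1.0, 'missing': {'journal'}},
}
avg_pdb, avg_cif, missing = summarize(example)
print('average PDB time: {0:.2f} s, average mmCif time: {1:.2f} s'.format(avg_pdb, avg_cif))
print('missing everywhere: {0}'.format(missing))
```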
Some fields that have non-deterministic values were hard to map, and some are not mapped yet. You can always contribute by adding your mappings to the pdb2cif_header_reference.py file!
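To give an idea of what a header mapping involves, here is a minimal sketch that pairs PDB header keys with mmCif dictionary keys and reports which PDB fields have no mapping. The mapping shown and the names `PDB_TO_CIF` and `compare_headers` are assumptions made for illustration; the actual layout of pdb2cif_header_reference.py may differ. Plain dictionaries stand in for the parser results.

```python
# Illustrative mapping only: PDB header key -> mmCif dictionary key.
PDB_TO_CIF = {
    'name': '_struct.title',
    'keywords': '_struct_keywords.text',
    'structure_method': '_exptl.method',
    'resolution': '_refine.ls_d_res_high',
}


def compare_headers(pdb_header, cif_dict, mapping=PDB_TO_CIF):
    """Return (rows, unmapped).

    rows: list of (pdb_key, pdb_value, cif_value) for mapped keys;
          cif_value is None when the mmCif key is absent.
    unmapped: PDB header keys that have no entry in the mapping.
    """
    rows, unmapped = [], []
    for key in sorted(pdb_header):
        cif_key = mapping.get(key)
        if cif_key is None:
            unmapped.append(key)
        else:
            rows.append((key, pdb_header[key], cif_dict.get(cif_key)))
    return rows, unmapped


# Made-up header values standing in for real parser output.
pdb_header = {'name': 'example protein', 'resolution': 2.0, 'journal': 'to be published'}
cif_dict = {'_struct.title': 'EXAMPLE PROTEIN', '_refine.ls_d_res_high': '2.00'}
rows, unmapped = compare_headers(pdb_header, cif_dict)
```

Note that mapped values can still differ in type or case between the two formats (a float versus a string above), which is part of why some fields are hard to map.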
You can download both mmCif and PDB files at:
http://www.rcsb.org/pdb/home/home.do#Subcategory-download_structures
I have not yet found a way to download mmCif files automatically.
You can download PDB files in your code using:
from Bio.PDB import PDBList
pdbl = PDBList()
pdbl.retrieve_pdb_file('1FAT')
Find more at: http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
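One possible way to fetch mmCif files from code is to request them by URL. The sketch below assumes that RCSB serves files under https://files.rcsb.org/download/<ID>.<ext> and that files in 'data/' are named after the lower-case structure ID; both are assumptions, and the function names are hypothetical. Recent Biopython releases also document a file_format argument for PDBList.retrieve_pdb_file, which may be worth checking first.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2.7

# Assumed URL pattern for RCSB file downloads.
RCSB_DOWNLOAD = 'https://files.rcsb.org/download/{pdb_id}.{ext}'


def rcsb_url(pdb_id, ext='cif'):
    """Build the download URL for a structure ID and file extension."""
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper(), ext=ext)


def download_structure(pdb_id, ext='cif', directory='data'):
    """Download one structure file into `directory` and return its path."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    # Assumed naming scheme: <lower-case id>.<ext>
    target = os.path.join(directory, '{0}.{1}'.format(pdb_id.lower(), ext))
    urlretrieve(rcsb_url(pdb_id, ext), target)
    return target
```

Calling download_structure('1FAT') would then try to save the mmCif file as data/1fat.cif, and download_structure('1FAT', ext='pdb') the PDB file.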
Download this repo and install BioPython and texttable :)
Import the BioParserCompare read_files.py and analyse a single structure:
You must add your structure's PDB and mmCif files to the 'data/' directory!
import read_files as bpc
my_struct = 'abcd'
bpc.analize_structure(my_struct)
It will produce the report for a single structure under 'reports/'.
You must add your structure's PDB and mmCif files to the 'data/' directory!
List your structure names in the test_structures.txt file (some are there already, just remove them if not needed). Then run read_files.py.
It will create a report for every structure in 'reports/' and a main report in the main BioParserCompare directory.
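Before a test run it can help to check that every listed structure has both files in place. The following sketch assumes one structure name per line in the list file and files named <name>.pdb and <name>.cif in the data directory; the naming scheme and both function names are assumptions, not taken from read_files.py.

```python
import os


def read_structure_names(list_path):
    """Return the non-empty, stripped lines of a structure list file."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_files(names, data_dir='data', extensions=('.pdb', '.cif')):
    """Return the expected file paths that do not exist in data_dir."""
    missing = []
    for name in names:
        for ext in extensions:
            path = os.path.join(data_dir, name + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, missing_files(read_structure_names('test_structures.txt')) returns an empty list when all files are present.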
Run pick_structures.py; it will pick a few structures from the 'author.idx' file. Then run read_files.py; it will create a report for every structure in 'reports/' and a main report in the main BioParserCompare directory. All needed files are in the repo.
You can edit the picking algorithm if you want. You must add any new PDB and mmCif files to the 'data/' directory!
You must add your structure's PDB and mmCif files to the 'data/' directory!
import read_files as bpc
my_struct = 'abcd'
pdb_struct, pdb_parsing_time = bpc.read_pdb_file(my_struct)
cif_struct, cif_dict, cif_parsing_time = bpc.read_mmmcif_file(my_struct)
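The read functions above return the parsed object together with its parsing time. A helper of the following kind is one way such a pair could be produced; `timed` is a hypothetical name, and this is not necessarily how read_files.py measures time. It uses timeit.default_timer so that it works on both Python 2.7 and 3.

```python
from timeit import default_timer


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    elapsed = default_timer() - start
    return result, elapsed


# Any callable can be timed; sorted() stands in for a parser call here.
result, seconds = timed(sorted, [3, 1, 2])
print('took {0:.6f} s'.format(seconds))
```

With Biopython installed, the same wrapper could be applied to a parser call such as PDBParser().get_structure(name, path).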
Olga Borgula (nnnnodahlia@gmail.com)