Here is all the code used in our paper published at PEARC18:
Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study
ACM Digital Library Link: https://dl.acm.org/citation.cfm?id=3229279
distance.py and hbond.py are from MDTraj; these are the only two MDTraj files I modified
mdtraj.py is the script for running MDTraj
optSeqCode is the optimized sequential code
spark.ipynb is the latest Spark code, which uses the optimized code
originalSeqCode.py is the original sequential code, before optimization