HPC and BigDataPipeline

This repository contains all the code used in our paper published at PEARC18:

Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study

File Overview

distance.py and hbond.py are taken from MDTraj; they are the only two MDTraj files I modified

mdtraj.py runs the analysis with MDTraj

optSeqCode.py is the optimized sequential code

spark.ipynb is the latest Spark code, which uses the optimized code

originalSeqCode.py is the original sequential code, before optimization
