Goals

Generate a MetAtlas dataset containing the minimized structure and energy of each molecule in its neutral, protonated, and deprotonated states.
Correlate dataset to structural features through machine learning.
- I want to use the Harvard autoencoder. It is based on sentence recognition, but each "word" represents a single character of the SMILES string. I want to look at defining chemical groups as words instead.
Try to predict protonation/deprotonation sites and energies of molecules outside the dataset.
Predict fragmentation spectra
1. Generate energy costs for fragmenting bonds in molecules. We'll need to do a subset as this is combinatorial.
2. Correlate autoencoded structures (with and without protonation information) to experimentally available mass spectra.
3. From a structure outside the test set, predict a mass spectrum.
4. If we can do this, we can start to enumerate predicted mass spectra and we can train a neural net to assign experimental spectra.
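
To get a feel for why only a subset of bond fragmentations is tractable (step 1 above), the number of ways to break up to a given number of bonds can be counted directly. This is a rough sketch in Python 3: the function name and the bond count of 30 are illustrative assumptions, not values taken from the dataset.

```python
from math import comb  # requires Python 3.8+


def n_fragmentation_patterns(n_bonds, max_broken):
    """Count the ways to choose between 1 and max_broken bonds to break."""
    return sum(comb(n_bonds, k) for k in range(1, max_broken + 1))


# Hypothetical molecule with 30 bonds: the count grows quickly with the
# number of bonds allowed to break at once.
for max_broken in (1, 2, 3, 4):
    print(max_broken, n_fragmentation_patterns(30, max_broken))
```

Allowing every bond to break (max_broken equal to n_bonds) gives 2**n_bonds - 1 patterns, which is why restricting the calculation to a subset is unavoidable.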

Getting started

A conda environment can be created from the provided env.yaml file. It pulls in a lot of dependencies because of heavyweight modules such as openbabel and psi4. Packages are installed primarily through conda, with pip as the fallback where no conda package exists.

Requirements

Orca v3
Fireworks
Python modules: mendeleev, pybel
Database config file

Generating data

A pre-populated CSV file containing the MetAtlas database is included (metatlas_inchi_inchikey.csv), with molecules stored in InChI format.
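
A minimal sketch of reading such a file with the standard library is shown here. The column names inchi and inchikey are assumptions inferred from the file name, and the two sample rows stand in for the real file. Note that InChI strings can contain commas, so those fields must be quoted in the CSV.

```python
import csv
import io

# Stand-in for metatlas_inchi_inchikey.csv; the header names are assumed.
SAMPLE = '''inchi,inchikey
InChI=1S/CH4/h1H4,VNWKTOKETHGBQD-UHFFFAOYSA-N
"InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",LFQSCWFLJHTTHZ-UHFFFAOYSA-N
'''


def load_molecules(handle):
    """Return {inchikey: inchi} from a CSV file object with those two columns."""
    return {row["inchikey"]: row["inchi"] for row in csv.DictReader(handle)}


molecules = load_molecules(io.StringIO(SAMPLE))
for key, inchi in molecules.items():
    print(key, inchi)
```

For the real file, the same function could be called on open("metatlas_inchi_inchikey.csv", newline=""), provided the header matches the assumed names.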

The Fireworks framework (developed by Jain et al., here at LBL) is used as an interface between the Slurm queueing system on NERSC and the MongoDB database that is used to catalog dataset information.

There is currently a MongoDB instance managed and running on NERSC; I have the credentials and can share them with whoever needs them.

Defining Fireworks Tasks

To create a Fireworks task that runs a simulation of some kind, define a new class that derives from FiretaskBase and implements a run_task method. metatlas.py contains all of the Firetasks that have been written thus far and can be used as a source of examples.
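
As a rough illustration of that pattern, the sketch below defines a toy task. SmilesLengthTask, its smiles parameter, and the n_chars field are hypothetical and are not among the tasks in metatlas.py. The except branch only provides minimal stand-ins so the sketch can be run on a machine without Fireworks installed; it is not part of the Fireworks API.

```python
try:
    from fireworks import FiretaskBase, FWAction
except ImportError:
    # Minimal stand-ins (assumption: only used when Fireworks is missing).
    class FiretaskBase(dict):
        pass

    class FWAction(object):
        def __init__(self, stored_data=None):
            self.stored_data = stored_data or {}


class SmilesLengthTask(FiretaskBase):
    """Hypothetical Firetask that records the length of a SMILES string."""

    # Name under which Fireworks serializes and looks up the task.
    _fw_name = "SmilesLengthTask"
    # Parameters that must be supplied when the task is constructed.
    required_params = ["smiles"]

    def run_task(self, fw_spec):
        smiles = self["smiles"]
        # A real task would run a calculation here; this one only counts
        # characters so the structure of a Firetask stays visible.
        return FWAction(stored_data={"smiles": smiles, "n_chars": len(smiles)})


task = SmilesLengthTask(smiles="CCO")
action = task.run_task({})
print(action.stored_data)
```

Whatever is returned in stored_data is saved by Fireworks alongside the launch record, which is one way for results to end up in the database.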

Using the newly-defined Fireworks Tasks

For the conda-installed fireworks package to find the Firetasks defined in metatlas.py, you must create a soft link in the package's user_objects directory:

cd /path/to/anaconda/lib/python2.7/site-packages/fireworks/user_objects
ln -s /path/to/MetAtlas/metatlas.py

Once a new Task has been defined, the next step is to get it into the Fireworks MongoDB database via the queueing system. This requires a simple script that reads in a new set of molecules (likely in SMILES format), converts them to an input that can be fed into a Task, and launches them into the queue. main.py has a simple example of doing this.
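
The shape of such a script might look like the sketch below. It is not the code in main.py: the helper names, the one-molecule-per-line input layout, and the single smiles key in the spec are all assumptions. The submit function imports Fireworks lazily, so the parsing half can be tried without a database.

```python
import io


def read_smiles(handle):
    """Yield the first whitespace-separated token of each non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield line.split()[0]


def build_specs(smiles_iter):
    """Turn SMILES strings into plain dicts usable as Firework specs."""
    return [{"smiles": smiles} for smiles in smiles_iter]


def submit(specs, task, launchpad):
    """Wrap each spec in a Firework around `task` and add it to the LaunchPad."""
    from fireworks import Firework  # needs Fireworks installed

    for spec in specs:
        launchpad.add_wf(Firework(task, spec=spec, name=spec["smiles"]))


# Stand-in for a SMILES file; anything after the first token is ignored.
sample = io.StringIO("CCO ethanol\n\nc1ccccc1 benzene\n")
specs = build_specs(read_smiles(sample))
print(specs)
```

With a configured database, submit(specs, some_task, LaunchPad.auto_load()) would add one Firework per molecule, where some_task is an instance of one of the Firetasks you defined.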

Running all the calculations on Edison

Turns out this is kinda non-trivial and I've clearly forgotten how to do it. So let me refresh my memory and keep track of what's going on so this doesn't happen again.

Remember, you must create the same soft link on the NERSC system so that it can find your newly made tasks; see section 'Using the newly-defined...' above.

Actually submitting jobs to the queue

From the MOM node of a compute facility with a queueing system, I basically just left a 'qlaunch' instance running that keeps the queue topped up with m jobs. Make sure you run this from a scratch directory so as not to overload the home disk!

qlaunch rapidfire -m 50 --nlaunches infinite
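
The top-up behaviour described above can be pictured with a small simulation. This is only a sketch of the idea, not how qlaunch is implemented; the function names and the numbers of finished jobs per polling cycle are made up for illustration.

```python
def top_up(n_in_queue, max_in_queue):
    """Number of new jobs needed to bring the queue back up to max_in_queue."""
    return max(0, max_in_queue - n_in_queue)


def simulate(max_in_queue, finished_per_cycle):
    """Return how many jobs are submitted at each polling cycle.

    finished_per_cycle lists how many jobs left the queue before each poll.
    """
    in_queue = 0
    submitted = []
    for finished in finished_per_cycle:
        in_queue = max(0, in_queue - finished)
        new_jobs = top_up(in_queue, max_in_queue)
        in_queue += new_jobs
        submitted.append(new_jobs)
    return submitted


# With -m 50: fill the queue, then replace only what has left it.
print(simulate(50, [0, 10, 0, 50]))
```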

wadejong/MetAtlas

Folders and files

Latest commit

History

Repository files navigation

Goals

Getting started

Requirements

Generating data

Defining Fireworks Tasks

Using the newly-defined Fireworks Tasks

Running all the calculations on Edison

Actually submitting jobs to the queue

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages