Nupack-serve is a wrapper for NUPACK: a software
suite for the analysis and design of nucleic acid structures, devices, and
systems (Wolfe et al. 2017).
This wrapper modifies some of NUPACK's functions in order to:
- Incoporate NUPACK within computational pipelines
- Enhance the machine-readability of results
- Restrict computation to CPU-bound operations
Nupack-serve was initially created to port NUPACK's mfe, complexes and concentrations within the Triplexer pipeline. However, the C functions written to remove I/O bound operation and provide provenance medatada and JSON output can be adopted by the remaining NUPACK utilities.
NUPACK's functionalities rely on the reading/writing of files on the hard
drive. This design principle is handy when results need to be overviewed by
users. However, when placed within the context of computational pipelines, I/O
operations can exert a negative impact on the performances of the underlying
system.
For the same reason, when results are provided for human-only consultation, it
becomes harder to reuse them in an automated manner. Additional programs need
to be written to parse the generated output files, and this might become an
additional source of errors.
Nupack-serve is a working proof-of-concept to address these limitations, and
bring NUPACK functionalities to reproducible bioinformatics pipelines.
This package modifies only three of the many operations provided by NUPACK:
mfe, complexes, and concentrations; which str used within the context of
the Triplexer pipeline to compute
the minimum free energy of secondary structures, the equilibrium base-pairing
properties for each complex in a test tube, and the equilibrium concentration
for each complex.
Following, we illustrate the benefits of this solution using NUPACK's mfe as an example.
Mfe computes the minimum free energy (MFE) secondary structure(s), of
sequence(s) S over the ensemble of the complex C.
For multiple strands, mfe's command line invocation should specify the
-multi
flag, and provide an input file of the form FILE.in
with the
following information:
- The number of distinct strand species
- The sequence for each distinct strand species (each on a separate line)
- As many integers as the range of 1 to # of S sequences, representing the strand ordering in the complex C.
So, to compute the MFE of the complex resulting from the binding of human
target gene E2F1's binding
site with the two putatively cooperating miRNAs
hsa-miR-205
and hsa-miR-342-3p,
we have to provide the following test.in
file:
3
ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu
uccuucauuccaccggagucug
ucucacacagaaaucgcacccgu
1 2 3
and call mfe with the following command line invocation:
$ mfe -multi test
NUPACK stores the result of mfe's call in a file called test.mfe
, which
contains the following lines:
% NUPACK 3.2.2
% Program: mfe
% Start time: Thu Apr 23 16:32:20 2020
% Command: mfe -multi test
% Sequence: ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu+uccuucauuccaccggagucug+ucucacacagaaaucgcacccgu
% v(pi): 1
% Parameters: RNA, 1995
% Dangles setting: 1
% Temperature (C): 37.0
% Sodium concentration: 1.0000 M
% Magnesium concentration: 0.0000 M
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
99
-50.563
.((((.((((.(.(((((((((((((......)))))..((((((((((((((.+.)))))))))).))))......+.))))))))..).)))).)))).
2 98
3 97
4 96
5 95
7 93
8 92
9 91
10 90
12 88
14 85
15 84
16 83
17 82
18 81
19 80
20 79
21 78
22 37
23 36
24 35
25 34
26 33
40 70
41 69
42 68
43 67
44 65
45 64
46 63
47 62
48 61
49 60
50 59
51 58
52 57
53 56
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
This flow of execution presents limitations that we address using nupack-serve:
If we want to compute one million MFE calculations, we end up having to provide
one million .in
files, and obtain another million .mfe
files as result.
This behavior heavily impacts on performances.
For this reason, we force mfe to avoid I/O operations on disk.
$ mfe -multi
Requesting input manually.
Enter number of strands: 3 <--- automated input
Enter sequence for strand type 1:
ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu <--- automated input
Enter sequence for strand type 2:
uccuucauuccaccggagucug <--- automated input
Enter sequence for strand type 3:
ucucacacagaaaucgcacccgu <--- automated input
Enter strand permutation (e.g. 1 2 4 3 2):
1 2 3 <--- input
As humans, we are able to immediately tell which parts of the resulting
.mfe
file are reserved to report the MFE calculation, and which to the
metadata of the whole operation. However to retrieve specific details of each
run, e.g. NUPACK's version, the resulting complex's sequence, or the
simulation's temperature, we would need to write dedicated parsers to run against
each .mfe
file. This represents an additional source for errors.
For this reason, we instruct mfe to return results and provenance metadata in the
JSON format.
{
"version": "3.2.2",
"command": "mfe -multi",
"cutoff": 0.001,
"sequence": "ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu+uccuucauuccaccggagucug+ucucacacagaaaucgcacccgu",
"dangles": "1",
"temperature (C)": 37.0,
"concentration Na (M)": 1.0000,
"concentration Mg (M)": 0.0000,
"pseudoknots": false,
"sequence length (nt)": 99,
"minimum free energy (Kcal/mol)": -50.563,
"dot-bracket": ".((((.((((.(.(((((((((((((......)))))..((((((((((((((..)))))))))).)))).......))))))))..).)))).))))",
"pairs": [
(2,98),(3,97),(4,96),(5,95),(7,93),(8,92),(9,91),(10,90),(12,88),(14,85),(15,84),(16,83),(17,82),(18,81),(19,80),(20,79),(21,78),(22,37),(23,36),(24,35),(25,34),(26,33),(40,70),(41,69),(42,68),(43,67),(44,65),(45,64),(46,63),(47,62),(48,61),(49,60),(50,59),(51,58),(52,57),(53,56)
]
}
NUPACK has a list of software dependencies and requires users to compile its
source code. While this doesn't represent a problem for the computer savvy
users who have full control over their software installations, many do not fall
within this category.
For this reason, we provide NUPACK's functions within a docker container.
The only requirement is Docker, which can be installed in different ways depending on the underlying operative system:
- Unix users should follow the Docker installation for Linux
- MacOS 10.13+ users should follow the Docker installation for Mac
- Windows 10+ users, should follow the Docker installation for Windows
- For legacy systems, users can rely on the Docker Toolbox.
To run nupack-serve, you need to run its docker container. Type:
docker run -d -p 9000:8000 quay.io/bagnacan/nupack-serve
The nupack-serve app is now listening on your host's port 9000.
You can test its docker container by trying one of the provided examples.
In your browser, type:
localhost:9000
You should obtain something like this:
{
"usage":"Nupack-serve is a wrapper for NUPACK: a software suite for the analysis and design of nucleic acid structures, devices, and systems (Wolfe et al. 2017).",
"homepage":"https://github.com/sbi-rostock/nupack-serve",
"license":"NUPACK Software License Agreement" <-- here cut for brevity
}
You can now:
- Incoporate NUPACK within computational pipelines
- Enhance the machine-readability of results (JSON)
- Avoid read/writes to disk
Nupack-serve wraps NUPACK's mfe, complexes and concentrations in a python application which returns:
- NUPACK's license agreement
- The exit status of the subprocess which executed the called NUPACK function
- Results and provenance metadata in the JSON format
We provide sample executions of nupack-serve mfe, nupack-serve complexes,
and nupack-serve concentrations.
Examples refer to the physico-chemical properties of the RNA triplex formed by:
- Human target gene E2F1
- Human miRNA hsa-miR-205
- Human miRNA hsa-miR-342-3p
Sample nupack-serve operations are available at
localhost:9000/example/{OPERATION}
.
To test nupack-serve mfe, open your browser and type:
localhost:9000/example/mfe
You should obtain something like this:
{
"license":"NUPACK Software License Agreement", <-- here cut for brevity
"status":0, <-- exit status of the subprocess that executed mfe in the container
"result":{ <-- mfe result
"version":"3.2.2",
"command":"/usr/local/bin/mfe -multi",
"cutoff":0.001,
"sequence":"ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu+uccuucauuccaccggagucug+ucucacacagaaaucgcacccgu",
"dangles":"1",
"temperature (C)":37.0,
"concentration Na (M)":1.0,
"concentration Mg (M)":0.0,
"pseudoknots":false,
"sequence length (nt)":99,
"minimum free energy (Kcal/mol)":-50.563,
"dot-bracket":".((((.((((.(.(((((((((((((......)))))..((((((((((((((..)))))))))).)))).......))))))))..).)))).))))",
"pairs":[
[2,98],[3,97],[4,96],[5,95],[7,93],[8,92],[9,91],[10,90],[12,88],[14,85],[15,84],[16,83],[17,82],[18,81],[19,80],[20,79],[21,78],[22,37],[23,36],[24,35],[25,34],[26,33],[40,70],[41,69],[42,68],[43,67],[44,65],[45,64],[46,63],[47,62],[48,61],[49,60],[50,59],[51,58],[52,57],[53,56]
]
}
}
Change operation to test nupack-serve complexes and nupack-serve concentrations.
Nupack-serve operations are available at
localhost:9000/{OPERATION}
.
To test nupack-serve mfe, open a file test.py
and write:
#!/usr/bin/env python3
import json
import requests
if __name__ == "__main__":
parameters = {
"number of sequences": "3",
"target sequence": "ccgggggugaaugugugugagcaugugugugugcauguaccggggaaugaaggu",
"mir1 sequence": "uccuucauuccaccggagucug",
"mir2 sequence": "ucucacacagaaaucgcacccgu",
"permutations": ["1 2 3"]
}
# request nupack-mfe with the defined parameterization
response = requests.get("http://localhost:9000/mfe", data=json.dumps(parameters))
# pretty print
if response.status_code == 200:
j = response.content.decode('utf8').replace("'", '"')
data = json.loads(j)
print(json.dumps(data, indent=4, sort_keys=True))
Then, launch it with:
$ ./test.py
You will obtain the same result as the one you obtain by pointing your browser
to localhost:9000/example/mfe
.
Change operation and parameterization to test nupack-serve complexes and nupack-serve concentrations.