-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2 from isaacsas/surrogate_tutorial
Surrogate tutorial
- Loading branch information
Showing
10 changed files
with
269 additions
and
16 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,12 @@ | ||
[deps] | ||
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" | ||
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" | ||
JLD = "4138dd39-2aa7-5051-a626-17a0bb65d9c8" | ||
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" | ||
SPRFittingPaper2023 = "f49ade76-f424-4ced-b837-3ff4885edccf" | ||
|
||
[compat] | ||
CSV = "0.10.14" | ||
Documenter = "1.0" | ||
Plots = "1.40.1" | ||
SPRFittingPaper2023 = "0.1.0" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
36 changes: 36 additions & 0 deletions
36
docs/src/parallel_surrogate_files/make_surrogate_metadata.jl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
using SPRFitting, JLD | ||
|
||
# directory these scripts and generated surrogate data will be in | ||
BASEDIR = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow" | ||
mkpath(BASEDIR) | ||
|
||
# within the directory the file to save the metadata within | ||
metadatafile = joinpath(BASEDIR, "surrogate_metadata.jld") | ||
|
||
# first we set biophysical parameters for the forward simulations | ||
# that build the surrogate. | ||
logkon_range = (-5.0, 2.0) | ||
logkoff_range = (-4.0, -1.0) | ||
logkonb_range = (-3.0, 1.5) | ||
reach_range = (2.0, 35.0) | ||
|
||
# now we set the numerical parameters, see SimParams for others | ||
tstop = 600.0 # simulation end time | ||
tsavelen = 601 # number of times to save | ||
tstop_AtoB = 150.0 # time to turn off the surrogate | ||
|
||
# size of the surrogate in each coordinate: nkon,nkoff,nkonb,nreach,tsavelen | ||
surrogate_size = (42,30,30,30,tsavelen) | ||
|
||
########################## END INPUT ############################# | ||
|
||
# make the output directory if not already present | ||
DIR = dirname(metadatafile) | ||
mkpath(DIR) | ||
|
||
surpars = SurrogateParams(; logkon_range, logkoff_range, logkonb_range, reach_range) | ||
tsave = collect(range(0.0, tstop, tsavelen)) | ||
simpars = SimParams(; tstop, tstop_AtoB, tsave) | ||
|
||
# save the surrogate | ||
save_surrogate_metadata(metadatafile, surrogate_size, surpars, simpars) |
13 changes: 13 additions & 0 deletions
13
docs/src/parallel_surrogate_files/merge_surrogate_slices.jl
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
using SPRFitting, JLD | ||
|
||
# the file in which you saved the metadata | ||
METADATAFILE = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_metadata.jld" | ||
|
||
# the basename of the files with the output from each job | ||
OUTFILEBASENAME = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_slice" | ||
|
||
nfiles = 900 # number of files with the given basename | ||
overwrite = false # whether to overwrite an existing output file | ||
|
||
# note this saves the data to a file named $OUTFILEBASENAME_merged.jld | ||
merge_surrogate_slices(METADATAFILE, OUTFILEBASENAME, nfiles; force=overwrite) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
#! /bin/bash | ||
|
||
# the path to where you have installed SPRFittingPaper2023 | ||
# In Julia you can get this via calling pkgdir(SPRFittingPaper2023) | ||
PKGPATH="/project/fpkmc3d/.julia_sai/dev/SPRFitting" | ||
|
||
# The output name for the metadata file you created | ||
METADATAFILE="/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_metadata.jld" | ||
|
||
# the base name to append to each surrogate portion (i.e. what each cluster CPU builds) | ||
OUTFILEBASENAME="/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_slice" | ||
|
||
# the julia script we run to build one subset of the surrogate | ||
JULIASCRIPT="${PKGPATH}/examples/make_surrogate_parallel.jl" | ||
|
||
# the total number of parameter sets we iterate over, in this case | ||
# 42*30*30*30 | ||
NUMPARAMSETS=1134000 | ||
|
||
# the number of independent jobs to create, each job handles NUMPARAMSETS / NCPUs parameter sets | ||
NCPUS=900 | ||
|
||
let IDXINCREMENT=$NUMPARAMSETS/$NCPUS | ||
let dn=$IDXINCREMENT-1 | ||
let idx=1 | ||
for (( n=1;n<=$NUMPARAMSETS;n+=$IDXINCREMENT)); do | ||
let last=$n+$dn | ||
qsub run_sim.sh "$JULIASCRIPT" "$PKGPATH" "$METADATAFILE" "${OUTFILEBASENAME}_${idx}.jld" $n $last | ||
let idx+=1 | ||
done |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
#!/bin/bash -l | ||
|
||
# Set SCC project | ||
#$ -P fpkmc3d | ||
|
||
# Specify hard time limit for the job. | ||
# The job will be aborted if it runs longer than this time. | ||
# The default time is 12 hours | ||
#$ -l h_rt=4:00:00 | ||
|
||
# Send an email when the job finishes or if it is aborted (by default no email is sent). | ||
#$ -m a | ||
|
||
# Give job a name | ||
#$ -N build_highlow_surrogate | ||
|
||
# Combine output and error files into a single file | ||
#$ -j y | ||
|
||
# Keep track of information related to the current job | ||
echo "==========================================================" | ||
echo "Start date : $(date)" | ||
echo "Job name : $JOB_NAME" | ||
echo "Job ID : $JOB_ID $SGE_TASK_ID" | ||
echo "==========================================================" | ||
|
||
|
||
/project/fpkmc3d/julia/bin/julia $1 $2 $3 $4 $5 $6 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
# Surrogate Model Construction | ||
In this tutorial we will illustrate how to construct a (small) surrogate in | ||
serial on any computer, and our workflow for building large surrogates on | ||
clusters (specific to the Grid Engine queuing system based cluster we use). | ||
|
||
## Setup | ||
We begin by installing the packages we need in a clean environment: | ||
```julia | ||
using Pkg | ||
|
||
# create a new environment for the forward model simulation | ||
Pkg.activate("fwd_sim_env") | ||
Pkg.add(url="https://github.com/isaacsas/SPRFittingPaper2023.jl.git") | ||
Pkg.add("JLD") | ||
``` | ||
|
||
## Serial Surrogate Construction | ||
We first demonstrate how to build a (small) surrogate in serial (i.e. on a | ||
single CPU core). We start by loading the needed packages: | ||
```@example serialsur | ||
using SPRFittingPaper2023, JLD | ||
# import a useful but non-exported function: | ||
using SPRFittingPaper2023 | ||
``` | ||
|
||
We next define the parameter ranges we wish to tabulate the surrogate over. | ||
Reaction rates are specified via a range in **`log10`** space, i.e. | ||
`log10(kon)`, `log10(koff)`, and `log10(konb)`. The reach is specified in linear | ||
space: | ||
```@example serialsur | ||
logkon_range = (-3.0, 2.0) | ||
logkoff_range = (-4.0, -1.0) | ||
logkonb_range = (-3.0, 1.5) | ||
reach_range = (2.0, 35.0) # in nm | ||
surpars = SurrogateParams(; logkon_range, logkoff_range, logkonb_range, reach_range) | ||
``` | ||
We collect these parameters into a [`SurrogateParams`](@ref) structure. Note | ||
that here we will build the surrogate with a fixed internal antibody | ||
concentration of `1.0` nM and antigen concentration of | ||
```@example serialsur | ||
SPRFittingPaper2023.DEFAULT_SIM_ANTIGENCONCEN | ||
``` | ||
in units of μM. This is feasible to do because the antibody concentration only | ||
arises in product with $k_{\text{on}}$, so when fitting we can (internally) | ||
interpret the surrogate `logkon` values as representing $\log_{10}(k_{\text{on}} | ||
[\text{Ab}])$. Similarly, as explained in the methods section of [1], using a | ||
fixed antigen concentration is not problematic as we can analytically transform | ||
any fit reach value using the internal antigen concentration to a physical reach | ||
value corresponding to the true experimental antigen concentration. | ||
|
||
Next we specify temporal information for the surrogate: | ||
```@example serialsur | ||
tstop = 600.0 # simulation end time | ||
tsavelen = 601 # number of time points to save (must be integer currently) | ||
tstop_AtoB = 150.0 # time to remove free antibodies | ||
tsave = collect(range(0.0, tstop, tsavelen)) # times to save at | ||
simpars = SimParams(; tstop, tstop_AtoB, tsave) | ||
``` | ||
We collect these parameters into a [`SimParams`](@ref) object. Note that it | ||
contains many other parameters for which we typically just use the default value | ||
when building a surrogate (for example, by default distributing antigen | ||
particles uniformly within a cube). The default number of antigen is | ||
```@example serialsur | ||
simpars.N | ||
``` | ||
|
||
Finally, we specify how many points to tabulate over for each of the four | ||
parameters. Parameters are spaced uniformly in `log10` space for the rates and | ||
linear space for the reach: | ||
```@example serialsur | ||
# here the order is number of points for | ||
# [logkon, logkoff, logkonb, reach, time] | ||
surrogate_size = (3, 3, 3, 3, tsavelen) | ||
``` | ||
Note, this is a very small surrogate, which we would not use in any practical | ||
fitting assay. More typical values are given in our manuscript (often 30-50 | ||
points per parameter). | ||
|
||
We are now ready to build and save the surrogate | ||
```@example serialsur | ||
outfile = tempname() # just use a temporary file name | ||
save_surrogate(outfile, surrogate_size, surpars, simpars) | ||
``` | ||
|
||
Note that the surrogate by default saves curves that correspond to the | ||
```math | ||
\frac{\text{average number of bound antibodies}}{\text{ the number of antigen in the system}}. | ||
``` | ||
|
||
## Surrogate Format | ||
The surrogate is stored in a Julia [JLD](https://github.com/JuliaIO/JLD.jl) file. We can see the raw data in the surrogate via the JLD `load` | ||
command: | ||
```@example serialsur | ||
surdata = load(outfile) | ||
``` | ||
To access a given field we can say | ||
```@example serialsur | ||
surdata["tstop"] | ||
``` | ||
In particular, `surdata["FirstMoment"]` will correspond to the table of solution | ||
curves. | ||
|
||
To load the surrogate for use in fitting we instead use | ||
```@example serialsur | ||
sur = Surrogate(outfile) | ||
``` | ||
The use of the loaded surrogate for fitting will be illustrated in the next tutorial. | ||
|
||
|
||
## Parallel Surrogate Construction | ||
Below we explain our workflow for constructing the surrogate via parallel | ||
simulations using the Grid Engine queuing system. The basic workflow and scripts | ||
we link were designed for this system, but should be adaptable to other queue-based | ||
clusters. | ||
|
||
The basic approach is | ||
1. Save a metadata file with all information needed to build the surrogate. | ||
2. Construct pieces of the surrogate as independent single-core jobs on the cluster. | ||
3. Merge the pieces of the surrogate back together into a single complete surrogate. | ||
|
||
The surrogate used in [1] can be downloaded from | ||
[here](https://doi.org/10.6084/m9.figshare.26936854). Below we give the basic | ||
scripts and commands used in its construction (note constructing such a | ||
surrogate generally requires in total 500-2000 hours of cpu time on the Boston | ||
University cluster). | ||
|
||
Our workflow is | ||
1. Edit | ||
["make\_surrogate\_metadata.jl"](./parallel_surrogate_files/make_surrogate_metadata.jl) | ||
for your system and then run it in Julia (via | ||
`include("make_surrogate_metadata.jl")`). | ||
2. Edit the bash scripts ["queue_job.sh"](./parallel_surrogate_files/queue_job.sh) | ||
and ["run_sim.sh"](./parallel_surrogate_files/run_sim.sh) for your system as | ||
appropriate and run queue_job.sh to submit the parallel surrogate jobs. | ||
3. After all jobs finish, edit | ||
["merge\_surrogate\_slices.jl"](./parallel_surrogate_files/merge_surrogate_slices.jl) | ||
as appropriate and run it in Julia via `include("merge_surrogate_slices.jl")` to merge the output from each cluster job into one complete surrogate. | ||
|
||
The linked scripts should correspond to those used to construct the surrogate in [1]. | ||
|
||
## Bibliography | ||
1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek, | ||
*Analysis of emergent bivalent antibody binding identifies the molecular | ||
reach as a critical determinant of SARS-CoV-2 neutralisation potency*, in | ||
review, [available on bioRxiv](https://www.biorxiv.org/content/10.1101/2023.09.06.556503v2) (2024). |