Skip to content

Commit

Permalink
Merge pull request #2 from isaacsas/surrogate_tutorial
Browse files Browse the repository at this point in the history
Surrogate tutorial
  • Loading branch information
isaacsas authored Sep 9, 2024
2 parents 779d538 + 4eaceb7 commit 16dfdda
Show file tree
Hide file tree
Showing 10 changed files with 269 additions and 16 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CI.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ jobs:
- Core
version:
- '1'
- '1.6'
- '1.7'
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
Expand Down
3 changes: 3 additions & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
JLD = "4138dd39-2aa7-5051-a626-17a0bb65d9c8"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
SPRFittingPaper2023 = "f49ade76-f424-4ced-b837-3ff4885edccf"

[compat]
CSV = "0.10.14"
Documenter = "1.0"
Plots = "1.40.1"
SPRFittingPaper2023 = "0.1.0"
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ makedocs(
pages = Any[
"Home" => "index.md",
"Forward Model" => "forward_simulation.md",
"Surrogate Construction" => "surrogate_construction.md",
"API" => "SPRFitting_api.md"
],
warnonly = [:missing_docs]
Expand Down
20 changes: 11 additions & 9 deletions docs/src/forward_simulation.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
# Forward Model Simulation
In this tutorial we will show how to run forward model simulations given a set of parameters and plot the amount of bound antibodies.

We begin by installing the packages we need in a clean environment:
## Tutorial Setup
Let's first install the packages we need in a clean environment:
```julia
using Pkg

Expand All @@ -12,7 +13,8 @@ Pkg.add("Plots")
Pkg.add("CSV""
```
We now load the needed packages:
## Running Forward Simulations
We begin by loading the needed packages:
```@example fwdsim
using SPRFittingPaper2023, Plots, CSV
Expand All @@ -30,13 +32,13 @@ a cube, our standard model for bivalent SPR experiments. We first specify the
biophysical parameters to use in our simulations and collect them in a
[`BioPhysParams`](@ref) structure:
```@example fwdsim
antigenconcen = 13.8
antibodyconcens = [25.0, 100.0]
kon = 5.286e-05
koff = 0.040868575
konb = 0.7801815
reach = 31.89884387
CP = 128.569235 # coefficient of proprotionality to fit the SPR data
antigenconcen = 13.8 # assumed in μM
antibodyconcens = [25.0, 100.0] # assumed in nM
kon = 5.286e-05 # assumed in 1 / (nM * sec)
koff = 0.040868575 # assumed in 1 / sec
konb = 0.7801815 # assumed in 1 / sec
reach = 31.89884387 # assumed in nm
CP = 128.569235 # coefficient of proportionality to fit the SPR data
biopars = BioPhysParams(; kon, koff, konb, reach, antigenconcen, CP,
antibodyconcen = antibodyconcens[1])
```
Expand Down
6 changes: 0 additions & 6 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,6 @@ You can use the package manager or "Pkg.add" to add any other needed packages to
that environment.



## Running in parallel
1. Save the metadata for the surrogate via modifying and running `examples/make_surrogate_metadata.jl`.
2. Modify and run `examples/make_slices.sh` to batch the surrogate data construction for your cluster.
3. Modify and run `examples/merge_surrogate_slices.jl` to merge the various data files back into the final surrogate.

## Bibliography
1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek,
*Analysis of emergent bivalent antibody binding identifies the molecular
Expand Down
36 changes: 36 additions & 0 deletions docs/src/parallel_surrogate_files/make_surrogate_metadata.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
using SPRFitting, JLD

# directory these scripts and generated surrogate data will be in
BASEDIR = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow"
mkpath(BASEDIR)

# within the directory the file to save the metadata within
metadatafile = joinpath(BASEDIR, "surrogate_metadata.jld")

# first we set biophysical parameters for the forward simulations
# that build the surrogate.
logkon_range = (-5.0, 2.0)
logkoff_range = (-4.0, -1.0)
logkonb_range = (-3.0, 1.5)
reach_range = (2.0, 35.0)

# now we set the numerical parameters, see SimParams for others
tstop = 600.0 # simulation end time
tsavelen = 601 # number of times to save
tstop_AtoB = 150.0 # time to turn off the surrogate

# size of the surrogate in each coordinate: nkon,nkoff,nkonb,nreach,tsavelen
surrogate_size = (42,30,30,30,tsavelen)

########################## END INPUT #############################

# make the output directory if not already present
DIR = dirname(metadatafile)
mkpath(DIR)

surpars = SurrogateParams(; logkon_range, logkoff_range, logkonb_range, reach_range)
tsave = collect(range(0.0, tstop, tsavelen))
simpars = SimParams(; tstop, tstop_AtoB, tsave)

# save the surrogate
save_surrogate_metadata(metadatafile, surrogate_size, surpars, simpars)
13 changes: 13 additions & 0 deletions docs/src/parallel_surrogate_files/merge_surrogate_slices.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
using SPRFitting, JLD

# the file in which you saved the metadata
METADATAFILE = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_metadata.jld"

# the basename of the files with the output from each job
OUTFILEBASENAME = "/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_slice"

nfiles = 900 # number of files with the given basename
overwrite = false # whether to overwrite an existing output file

# note this saves the data to a file named $OUTFILEBASENAME_merged.jld
merge_surrogate_slices(METADATAFILE, OUTFILEBASENAME, nfiles; force=overwrite)
30 changes: 30 additions & 0 deletions docs/src/parallel_surrogate_files/queue_job.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#! /bin/bash

# the path to where you have installed SPRFittingPaper2023
# In Julia you can get this via calling pkgdir(SPRFittingPaper2023)
PKGPATH="/project/fpkmc3d/.julia_sai/dev/SPRFitting"

# The output name for the metadata file you created
METADATAFILE="/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_metadata.jld"

# the base name to append to each surrogate portion (i.e. what each cluster CPU builds)
OUTFILEBASENAME="/project/fpkmc3d/surrogates/rebuild_lut_highandlow/surrogate_slice"

# the julia script we run to build one subset of the surrogate
JULIASCRIPT="${PKGPATH}/examples/make_surrogate_parallel.jl"

# the total number of parameter sets we iterate over, in this case
# 42*30*30*30
NUMPARAMSETS=1134000

# the number of independent jobs to create, each job handles NUMPARAMSETS / NCPUs parameter sets
NCPUS=900

let IDXINCREMENT=$NUMPARAMSETS/$NCPUS
let dn=$IDXINCREMENT-1
let idx=1
for (( n=1;n<=$NUMPARAMSETS;n+=$IDXINCREMENT)); do
let last=$n+$dn
qsub run_sim.sh "$JULIASCRIPT" "$PKGPATH" "$METADATAFILE" "${OUTFILEBASENAME}_${idx}.jld" $n $last
let idx+=1
done
28 changes: 28 additions & 0 deletions docs/src/parallel_surrogate_files/run_sim.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#!/bin/bash -l

# Set SCC project
#$ -P fpkmc3d

# Specify hard time limit for the job.
# The job will be aborted if it runs longer than this time.
# The default time is 12 hours
#$ -l h_rt=4:00:00

# Send an email when the job finishes or if it is aborted (by default no email is sent).
#$ -m a

# Give job a name
#$ -N build_highlow_surrogate

# Combine output and error files into a single file
#$ -j y

# Keep track of information related to the current job
echo "=========================================================="
echo "Start date : $(date)"
echo "Job name : $JOB_NAME"
echo "Job ID : $JOB_ID $SGE_TASK_ID"
echo "=========================================================="


/project/fpkmc3d/julia/bin/julia $1 $2 $3 $4 $5 $6
146 changes: 146 additions & 0 deletions docs/src/surrogate_construction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
# Surrogate Model Construction
In this tutorial we will illustrate how to construct a (small) surrogate in
serial on any computer, and our workflow for building large surrogates on
clusters (specific to the Grid Engine queuing system based cluster we use).

## Setup
We begin by installing the packages we need in a clean environment:
```julia
using Pkg

# create a new environment for the forward model simulation
Pkg.activate("fwd_sim_env")
Pkg.add(url="https://github.com/isaacsas/SPRFittingPaper2023.jl.git")
Pkg.add("JLD")
```

## Serial Surrogate Construction
We first demonstrate how to build a (small) surrogate in serial (i.e. on a
single CPU core). We start by loading the needed packages:
```@example serialsur
using SPRFittingPaper2023, JLD
# import a useful but non-exported function:
using SPRFittingPaper2023
```

We next define the parameter ranges we wish to tabulate the surrogate over.
Reaction rates are specified via a range in **`log10`** space, i.e.
`log10(kon)`, `log10(koff)`, and `log10(konb)`. The reach is specified in linear
space:
```@example serialsur
logkon_range = (-3.0, 2.0)
logkoff_range = (-4.0, -1.0)
logkonb_range = (-3.0, 1.5)
reach_range = (2.0, 35.0) # in nm
surpars = SurrogateParams(; logkon_range, logkoff_range, logkonb_range, reach_range)
```
We collect these parameters into a [`SurrogateParams`](@ref) structure. Note
that here we will build the surrogate with a fixed internal antibody
concentration of `1.0` nM and antigen concentration of
```@example serialsur
SPRFittingPaper2023.DEFAULT_SIM_ANTIGENCONCEN
```
in units of μM. This is feasible to do because the antibody concentration only
arises in product with $k_{\text{on}}$, so when fitting we can (internally)
interpret the surrogate `logkon` values as representing $\log_{10}(k_{\text{on}}
[\text{Ab}])$. Similarly, as explained in the methods section of [1], using a
fixed antigen concentration is not problematic as we can analytically transform
any fit reach value using the internal antigen concentration to a physical reach
value corresponding to the true experimental antigen concentration.

Next we specify temporal information for the surrogate:
```@example serialsur
tstop = 600.0 # simulation end time
tsavelen = 601 # number of time points to save (must be integer currently)
tstop_AtoB = 150.0 # time to remove free antibodies
tsave = collect(range(0.0, tstop, tsavelen)) # times to save at
simpars = SimParams(; tstop, tstop_AtoB, tsave)
```
We collect these parameters into a [`SimParams`](@ref) object. Note that it
contains many other parameters for which we typically just use the default value
when building a surrogate (for example, by default distributing antigen
particles uniformly within a cube). The default number of antigen is
```@example serialsur
simpars.N
```

Finally, we specify how many points to tabulate over for each of the four
parameters. Parameters are spaced uniformly in `log10` space for the rates and
linear space for the reach:
```@example serialsur
# here the order is number of points for
# [logkon, logkoff, logkonb, reach, time]
surrogate_size = (3, 3, 3, 3, tsavelen)
```
Note, this is a very small surrogate, which we would not use in any practical
fitting assay. More typical values are given in our manuscript (often 30-50
points per parameter).

We are now ready to build and save the surrogate
```@example serialsur
outfile = tempname() # just use a temporary file name
save_surrogate(outfile, surrogate_size, surpars, simpars)
```

Note that the surrogate by default saves curves that correspond to the
```math
\frac{\text{average number of bound antibodies}}{\text{ the number of antigen in the system}}.
```

## Surrogate Format
The surrogate is stored in a Julia [JLD](https://github.com/JuliaIO/JLD.jl) file. We can see the raw data in the surrogate via the JLD `load`
command:
```@example serialsur
surdata = load(outfile)
```
To access a given field we can say
```@example serialsur
surdata["tstop"]
```
In particular, `surdata["FirstMoment"]` will correspond to the table of solution
curves.

To load the surrogate for use in fitting we instead use
```@example serialsur
sur = Surrogate(outfile)
```
The use of the loaded surrogate for fitting will be illustrated in the next tutorial.


## Parallel Surrogate Construction
Below we explain our workflow for constructing the surrogate via parallel
simulations using the Grid Engine queuing system. The basic workflow and scripts
we link were designed for this system, but should be adaptable to other queue-based
clusters.

The basic approach is
1. Save a metadata file with all information needed to build the surrogate.
2. Construct pieces of the surrogate as independent single-core jobs on the cluster.
3. Merge the pieces of the surrogate back together into a single complete surrogate.

The surrogate used in [1] can be downloaded from
[here](https://doi.org/10.6084/m9.figshare.26936854). Below we give the basic
scripts and commands used in its construction (note constructing such a
surrogate generally requires in total 500-2000 hours of cpu time on the Boston
University cluster).

Our workflow is
1. Edit
["make\_surrogate\_metadata.jl"](./parallel_surrogate_files/make_surrogate_metadata.jl)
for your system and then run it in Julia (via
`include("make_surrogate_metadata.jl")`).
2. Edit the bash scripts ["queue_job.sh"](./parallel_surrogate_files/queue_job.sh)
and ["run_sim.sh"](./parallel_surrogate_files/run_sim.sh) for your system as
appropriate and run queue_job.sh to submit the parallel surrogate jobs.
3. After all jobs finish, edit
["merge\_surrogate\_slices.jl"](./parallel_surrogate_files/merge_surrogate_slices.jl)
as appropriate and run it in Julia via `include("merge_surrogate_slices.jl")` to merge the output from each cluster job into one complete surrogate.

The linked scripts should correspond to those used to construct the surrogate in [1].

## Bibliography
1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek,
*Analysis of emergent bivalent antibody binding identifies the molecular
reach as a critical determinant of SARS-CoV-2 neutralisation potency*, in
review, [available on bioRxiv](https://www.biorxiv.org/content/10.1101/2023.09.06.556503v2) (2024).

0 comments on commit 16dfdda

Please sign in to comment.