finish fitting tutorial
isaacsas committed Sep 10, 2024
1 parent 5a240de commit b667ff0
Showing 11 changed files with 3,261 additions and 2 deletions.
16 changes: 14 additions & 2 deletions docs/src/fitting.md
@@ -90,13 +90,25 @@ curvefile = joinpath(OUTDIR, "parameters.xlsx")
savefit(optsol, aligneddat, surrogate, simpars, curvefile)
```
For this example the file [here](./fitting_data/parameters.xlsx_fit.xlsx) shows
the resulting Excel spreadsheet. Note that the second sheet within it shows the
parameter estimates.

## General Workflow
A more detailed workflow that processes multiple SPR inputs, includes monovalent
fits, and systematically writes output files for each fit can be downloaded
[here](./fitting_workflow/Fitting%20Examples.zip). This file contains three sub-folders.

1. [Experiments](./fitting_workflow/Fitting%20Examples/Experiments/) contains a
set of CSVs corresponding to processed SPR experiments for fitting.
2. [Code](./fitting_workflow/Fitting%20Examples/Code/) contains a
[readme](./fitting_workflow/Fitting%20Examples/Code/readme.md) file with
instructions on how to use/modify the
[ParameterFitting_Example.jl](./fitting_workflow/Fitting%20Examples/Code/ParameterFitting_Example.jl)
script to fit a collection of experiments.
3. [Surrogates](./fitting_workflow/Fitting%20Examples/Surrogates/) is where you
should place the downloaded surrogate from the manuscript, which is available
[here](https://doi.org/10.6084/m9.figshare.26936854) (or whatever surrogate
you wish to use).

## Bibliography
1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek,
Binary file added docs/src/fitting_workflow/Fitting Examples.zip
Binary file not shown.
@@ -0,0 +1,204 @@
# LIBRARIES
using SPRFitting # Custom library for SPR fitting procedures
using Plots
using DataFrames, XLSX, CSV, DelimitedFiles


########################################################################################################################
########################################################################################################################
######################## INPUT ########################

experiment_name = "2024_Example_SPR_curves" #Folder name containing all the SPR files
# all files within the folder will be fit.
println(experiment_name)

# the algorithm obtains the antigen concentration from the file name, and the antibody concentration from the header in the SPR file.
# it is therefore crucial to stick to the file naming standard.
# filename: Data_FC[1-4]_[Date]_Protein[No]_[AB name]_Ligand-[AG name]-[AG conc]_aligned.csv
# Example filename: Data_FC2_260224_Protein01_FD-11A_Ligand-RBD-4.44_aligned.csv

# Surrogate
# Surrogate file name without extension (LUT = look-up table)
LUTfilename = "surrogate_high_and_low"


# the algorithm obtains the lower and upper bounds for the bivalent model parameters from the surrogate.
logCP_optrange = (1.0, 5.0) # the CP range is not specified in the LUT and must be set separately
optpar_ranges = [logCP_optrange]
# Parameter bounds can also be set manually, if you want to restrict the parameter search space
# optpar_ranges = [(0,1), (1,2), (2,3), (3,4), (4,5)] #[(p1_min,p1_max),(p2_min,p2_max),(p3_min,p3_max),(Lmin,Lmax),(CP_min,CP_max)]


# Fitting
nfits = 100 # how many fits to run, the fit with the lowest fitness score is selected.
nsims = 100 # number of simulations to use when plotting
save_curves = true
visualise = true


# monovalent fitting, set mono_optimiser=nothing if not desired
# a monovalent model is fit once to determine the quality of the bivalent model fit
lb = [-8.0, -8.0, -8.0] # lower bounds on parameters in log space (kon,koff,CP)
ub = [8.0, 8.0, 8.0] # upper bounds on parameters in log space (kon,koff,CP)
mono_optimiser = default_mono_optimiser(lb, ub; solverkwargs = (abstol = 1e-8, reltol = 1e-8))

########################################################################################################################
########################################################################################################################

############ FUNCTIONS ############


function ensure_dir(dir)
    # creates the path if it doesn't exist
    if !isdir(dir)
        mkpath(dir)
    end
end

function run_fits(nfits, surrogate, aligneddat, optpar_ranges)
    # function to run the fit nfits times and find the best fit parameters
    # always runs one fit first, then loops over the remaining nfits - 1 fits

    # Arrays to store fitness scores and parameters
    fitness = zeros(nfits)
    physparams = zeros(nfits, 5)

    # Perform the first fit
    bbopt_output, best_pars = fit_spr_data(surrogate, aligneddat, optpar_ranges)
    fitness[1] = bbopt_output.minimum
    physparams[1, :] = best_pars

    # Iterate over remaining fits, keeping the one with the lowest fitness score
    for i in 2:nfits
        bbopt_output_new, best_pars_new = fit_spr_data(surrogate, aligneddat, optpar_ranges)
        fitness[i] = bbopt_output_new.minimum
        physparams[i, :] = best_pars_new
        if bbopt_output_new.minimum < bbopt_output.minimum
            bbopt_output = bbopt_output_new
            best_pars = best_pars_new
        end
    end
    return fitness, physparams, bbopt_output, best_pars
end


function saveParams_all_fits(params_array, fitness, filename)
    # this saves the returned parameters from all fits, in case you want to see the variability in returned parameters
    SavePara = zeros(Float64, length(fitness), 6)
    SavePara[:, 1] .= fitness
    SavePara[:, 2:end] = params_array

    # Create a DataFrame with the columns
    df = DataFrame(SavePara, :auto)

    Headers = [:Fitness, :kon, :koff, :konb, :Reach, :CP]
    # Rename the header of each column
    rename!(df, Headers)

    # Write the DataFrame to an xlsx file (OUTDIR is the global output directory set below)
    XLSX.writetable(joinpath(OUTDIR, filename * "_Params.xlsx"), collect(eachcol(df)), names(df), overwrite=true)
end

function savebestParams(results_paramlist, filenames, expname)
    # this function creates a data file containing the best parameters for all SPR files in the experiment folder
    df = DataFrame(fnames = filenames,
                   fitness = results_paramlist[:, 1],
                   kon = results_paramlist[:, 2],
                   koff = results_paramlist[:, 3],
                   konb = results_paramlist[:, 4],
                   reach = results_paramlist[:, 5],
                   Cp = results_paramlist[:, 6]
                   )
    CSV.write(joinpath(OUTDIR, expname * "_BestParams.csv"), df)
end

############ RUN SCRIPT ############

# Directories
BASEDIR = splitdir(dirname(@__FILE__))[1]
EXPDIR = joinpath(BASEDIR,"Experiments", experiment_name)
RAWDIR = joinpath(EXPDIR, "Aligned")
OUTDIR = joinpath(EXPDIR, "Fitted")

#creating output directory
mkpath(OUTDIR)

#load surrogate
LUT_file = joinpath(BASEDIR, "Surrogates", LUTfilename * ".jld")
surrogate = Surrogate(LUT_file)
sps = surrogate.surpars

# Loop through files and do the fitting
not_hidden(fname::String) = fname[1] != '.' #excludes hidden files starting with .
SPRfiles = filter(not_hidden, readdir(RAWDIR))
println(SPRfiles)

resultcollection = zeros(length(SPRfiles), 6) # array collecting best parameters from all SPR files

for (fx, file) in enumerate(SPRfiles)
    println("\n#####################\n", "Fitting file: ", fx, "/", length(SPRfiles), "\n", "File: ", file, "\n")

    # files not starting with "Data_" are skipped; their rows in resultcollection stay zero
    if occursin(r"^Data_", file)
        filename = replace(replace(file, r"^Data_" => ""), r"_aligned\.csv$" => "")
        println("Filename: ", filename, "\n")

        fname = joinpath(RAWDIR, file)
        aligneddat = get_aligned_data(fname)

        println("\nRunning ", nfits, " fits")

        fitness, physparams, bbopt_output, best_pars = run_fits(nfits, surrogate, aligneddat, optpar_ranges)

        println("Best fit is: ")
        @show best_pars

        # save best result for all files
        resultcollection[fx, 1] = bbopt_output.minimum
        resultcollection[fx, 2:end] = best_pars

        # for use with outputting so we don't modify the surrogate's parameters
        simpars = deepcopy(surrogate.simpars)
        simpars.nsims = nsims

        # if we want to include a monovalent fit
        if mono_optimiser !== nothing
            # use the bivalent fits as our guess for the monovalent fit
            # [kon, koff, CP]
            u₀ = log10.([best_pars[1], best_pars[2], best_pars[end]])
            monofit = monovalent_fit_spr_data(mono_optimiser, aligneddat, simpars.tstop_AtoB, u₀)
        else
            monofit = nothing
        end

        if visualise
            print("saving plot...")
            figfile = joinpath(OUTDIR, filename * "_fit_curves.png")
            visualisefit(bbopt_output, aligneddat, surrogate, simpars, figfile)
            if monofit !== nothing
                figfile = joinpath(OUTDIR, filename * "_fit_curves_monovalent.png")
                visualisefit(monofit, aligneddat, simpars.tstop_AtoB, figfile)
            end
            println("done")
        end
        if save_curves
            print("saving spreadsheet and parameters...")
            curvefile = joinpath(OUTDIR, filename)
            savefit(bbopt_output, aligneddat, surrogate, simpars, curvefile; monofit)
            saveParams_all_fits(physparams, fitness, filename)
            println("done")
        end
    end
end

#saving the best parameters of all SPR files in a final data table
savebestParams(resultcollection, SPRfiles, experiment_name)

70 changes: 70 additions & 0 deletions docs/src/fitting_workflow/Fitting Examples/Code/readme.md
@@ -0,0 +1,70 @@
# Script for running the particle model fit

This code automates fitting Surface Plasmon Resonance (SPR) data with the bivalent particle-based model.
It reads SPR experiment files, fits the data to extract kinetic parameters, and saves the results. The fitting process uses a surrogate model stored as a lookup table (LUT) and can optionally fit a monovalent model for comparison.
The code is intended for batch processing, so that multiple SPR experiments can be analyzed efficiently.

## Requirements

- Programming Language: Julia
- Libraries: SPRFitting, Plots, DataFrames, XLSX, CSV, DelimitedFiles

## Input Files

The script processes aligned SPR curves. The columns are: time, response of concentration 1, time, response of concentration 2, ...
The column headers have to be: time, [concentration 1], time, [concentration 2], ...
The data has been aligned to the start of dissociation, cut to 600 sec, and a few seconds at the start and end of the injection have been removed to avoid injection artefacts.
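
The column layout described above can be checked before fitting. The following is a minimal sketch in plain Julia; the helper name `check_header` is hypothetical (it is not part of SPRFitting), and it assumes the concentration headers are plain numbers.

```julia
# Hypothetical helper (not part of SPRFitting): checks that a CSV header line
# alternates between "time" and a numeric concentration label.
function check_header(headerline::AbstractString)
    cols = strip.(split(headerline, ','))
    # (time, response) pairs require an even number of columns
    isodd(length(cols)) && return false
    for (i, col) in enumerate(cols)
        if isodd(i)
            # odd columns must be named "time"
            lowercase(col) == "time" || return false
        else
            # even columns are assumed to hold a numeric concentration
            tryparse(Float64, col) === nothing && return false
        end
    end
    return true
end

header_ok = check_header("time,4.44,time,13.3")
```

A file whose header fails this check would likely need to be re-exported before it is placed in the Aligned folder.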

### File Naming Convention:
The algorithm expects files to follow a specific naming format to extract concentration details:
Data_FC[1-4]_[Date]_Protein[No]_[AB name]_Ligand-[AG name]-[AG conc]_aligned.csv.
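
As an illustration of this convention, the sketch below splits a file name into its fields with a regular expression. The function `parse_spr_filename` and the field names are assumptions made for this example; SPRFitting's own parsing may differ.

```julia
# Hypothetical parser for the naming convention above (not an SPRFitting function).
function parse_spr_filename(fname::AbstractString)
    re = r"^Data_FC([1-4])_([^_]+)_Protein(\d+)_(.+)_Ligand-(.+)-([0-9.]+)_aligned\.csv$"
    m = match(re, fname)
    m === nothing && return nothing   # name does not follow the convention
    return (flowcell = parse(Int, m.captures[1]),
            date = String(m.captures[2]),
            protein = String(m.captures[3]),
            antibody = String(m.captures[4]),
            antigen = String(m.captures[5]),
            antigen_conc = parse(Float64, m.captures[6]))
end

# Example file name taken from the comments in ParameterFitting_Example.jl
info = parse_spr_filename("Data_FC2_260224_Protein01_FD-11A_Ligand-RBD-4.44_aligned.csv")
```

Returning `nothing` for a non-matching name makes it easy to list offending files before starting a long batch run.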

All files need to be stored in the following directory hierarchy:
Experiments/[experiment name]/Aligned/
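
The script builds the input path as Experiments/[experiment name]/Aligned/ relative to the folder above Code. A small pre-flight check along these lines can catch a misplaced folder early; `list_aligned_files` is a hypothetical helper written for this example, not part of the script.

```julia
# Hypothetical pre-flight check (not part of SPRFitting): returns the aligned
# CSV files that follow the naming convention, or errors if the folder is missing.
function list_aligned_files(basedir::AbstractString, experiment_name::AbstractString)
    aligned = joinpath(basedir, "Experiments", experiment_name, "Aligned")
    isdir(aligned) || error("Missing directory: $aligned")
    # hidden files such as .DS_Store are excluded by the prefix test
    return filter(f -> startswith(f, "Data_") && endswith(f, "_aligned.csv"), readdir(aligned))
end
```

Calling `list_aligned_files(BASEDIR, experiment_name)` before the fitting loop shows which files would be picked up.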


## User input

### Experiment identifiers
- Experiment Name (experiment_name): The name of the folder containing SPR files to be processed. All files in this folder will be analyzed.

### Fitting Configuration:
- Surrogate Model (LUTfilename): The name of the surrogate file (LUT) used for fitting. The file should be located in the "Surrogates" folder.
- Parameter Bounds (logCP_optrange and optpar_ranges) (optional): Define the search space for the parameters in log scale. The algorithm uses these bounds to optimize the fitting parameters. The algorithm can automatically extract the parameter bounds used for the surrogate model and use them for fitting.
- nfits: Number of fitting iterations to perform. The fit with the lowest fitness score is selected.
- nsims: Number of simulations to generate when visualizing fits.
- save_curves: Boolean flag whether to save the fitted curves and parameter spreadsheets.
- visualise: Boolean flag whether to save plots of the fits as PNG files.
- mono_optimiser: Optional monovalent model fitting configuration.
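
The role of nfits can be illustrated with a stand-in optimiser. In the sketch below, `mock_fit` is a hypothetical placeholder that returns a random fitness score and five parameters; the real script calls `fit_spr_data` instead. Only the selection logic is meant to carry over.

```julia
using Random

# Stand-in for a single stochastic fit (hypothetical, for illustration only).
mock_fit(rng) = (minimum = rand(rng), pars = rand(rng, 5))

function best_of(nfits::Integer, rng::AbstractRNG)
    results = [mock_fit(rng) for _ in 1:nfits]
    fitness = [r.minimum for r in results]
    _, ibest = findmin(fitness)   # index of the lowest fitness score
    return results[ibest], fitness
end

best, fitness = best_of(10, MersenneTwister(1))
```

Keeping the full `fitness` vector, as the script does, allows the spread across repeated fits to be inspected afterwards.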

## Output

The algorithm creates a new folder "Fitted", where all output files are stored.
The following files are created:

### Best Fit:
[filename]_fit.xlsx
For each SPR file, the code saves the fitted curves (both for the bivalent and monovalent model fit) (Sheet 1) and the best-fitting parameters (Sheet 2) in an Excel file.
Parameters saved include fitness score, kon, koff, konb, reach, and CP.

### All Parameters:
[filename]_Params.xlsx
For each SPR file, an Excel file containing the returned parameters from all nfits fitting iterations is saved. This can be used for quality control.
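
One possible quality-control check is to look at how much each parameter varies across the repeated fits. The sketch below uses made-up placeholder numbers rather than real fit output, with columns ordered as in the spreadsheet (kon, koff, konb, reach, CP). Summarising in log10 space is an assumption made here because the parameters can span several orders of magnitude.

```julia
using Statistics

# Placeholder values for illustration only: rows are fits,
# columns are (kon, koff, konb, reach, CP).
params = [1.0e-4 0.010 1.0 30.0 100.0;
          2.0e-4 0.020 1.2 32.0 120.0;
          1.5e-4 0.015 0.9 31.0 110.0]

logp = log10.(params)
spread = vec(std(logp; dims = 1))   # one standard deviation per parameter
```

A large spread in one column would suggest that the corresponding parameter is poorly constrained by that data set.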

### Visualizations:
[filename]_fit_curves.png / [filename]_fit_curves_monovalent.png
If enabled, the code saves visualizations of the fitted curves as PNG files.
Both bivalent and monovalent fits (if applicable) are visualized.

### Best Fit Parameters:
[experiment_name]_BestParams.csv
A CSV file is created that compiles the best-fitting parameters from all SPR files analyzed.


## How to Run

1. Set experiment_name to the name of the folder containing your SPR files.
2. Specify the LUTfilename for the surrogate model.
3. Adjust the fitting configuration (e.g., nfits, nsims) as needed.
4. Ensure your SPR files follow the required naming convention.
5. Run the script from the terminal: julia ParameterFitting_Example.jl