finish fitting tutorial

isaacsas · Sep 10, 2024 · b667ff0
1 parent 5a240de
commit b667ff0
Showing 11 changed files with 3,261 additions and 2 deletions.
diff --git a/docs/src/fitting.md b/docs/src/fitting.md
@@ -90,13 +90,25 @@ curvefile = joinpath(OUTDIR, "parameters.xlsx")
 savefit(optsol, aligneddat, surrogate, simpars, curvefile)
 ```
 For this example the file [here](./fitting_data/parameters.xlsx_fit.xlsx) shows
-the resulting spreadsheet. Note that the second sheet within it shows the
+the resulting Excel spreadsheet. Note that the second sheet within it shows the
 parameter estimates.
 
 ## General Workflow 
 A more detailed workflow that processes multiple SPR inputs, includes monovalent
-fits, and systematically writes the output is presented in GIVE_LOCATION.
+fits, and systematically writes output files for each fit can be downloaded
+[here](./fitting_workflow/Fitting%20Examples.zip). This file contains three sub-folders. 
 
+1. [Experiments](./fitting_workflow/Fitting%20Examples/Experiments/) contains a
+   set of CSVs corresponding to processed SPR experiments for fitting.
+2. [Code](./fitting_workflow/Fitting%20Examples/Code/) contains a
+   [readme](./fitting_workflow/Fitting%20Examples/Code/readme.md) file with
+   instructions on how to use/modify the
+   [ParameterFitting_Example.jl](./fitting_workflow/Fitting%20Examples/Code/ParameterFitting_Example.jl)
+   script to fit a collection of experiments.
+3. [Surrogates](./fitting_workflow/Fitting%20Examples/Surrogates/) is where you
+   should place the downloaded surrogate from the manuscript, which is available
+   [here](https://doi.org/10.6084/m9.figshare.26936854) (or whatever surrogate
+   you wish to use).
 
 ## Bibliography
 1. A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek,

diff --git a/docs/src/fitting_workflow/Fitting Examples.zip b/docs/src/fitting_workflow/Fitting Examples.zip
diff --git a/docs/src/fitting_workflow/Fitting Examples/Code/ParameterFitting_Example.jl b/docs/src/fitting_workflow/Fitting Examples/Code/ParameterFitting_Example.jl
@@ -0,0 +1,204 @@
+# LIBRARIES
+using SPRFitting                    # Custom library for SPR fitting procedures
+using Plots
+using DataFrames, XLSX, CSV, DelimitedFiles
+
+
+########################################################################################################################
+########################################################################################################################
+######################## INPUT ######################## 
+
+experiment_name =  "2024_Example_SPR_curves" #Folder name containing all the SPR files
+# all files within the folder will be fit. 
+println(experiment_name)
+
+# the algorithm obtains antigen concentration from file name, and antibody concentration from header in SPR file. 
+# it is therefore crucial to stick to the file naming standard.
+# filename: Data_FC[1-4]_[Date]_Protein[No]_[AB name]_Ligand-[AG name]-[AG conc]_aligned.csv
+# Example filename: Data_FC2_260224_Protein01_FD-11A_Ligand-RBD-4.44_aligned.csv
+
+# Surrogate
+# Surrogate file name without extension (LUT = lookup table); ".jld" is appended when loading
+LUTfilename = "surrogate_high_and_low"
+
+
+# the algorithm obtains lower and upper bounds for the bivalent model parameters from the surrogate (LUT).
+logCP_optrange   = (1.0, 5.0) # the CP range is not specified in the LUT and must be set separately
+optpar_ranges = [logCP_optrange]
+# Parameter bounds can also be set manually, if you want to restrict the parameter search space
+# optpar_ranges = [(0,1), (1,2), (2,3), (3,4), (4,5)] #[(p1_min,p1_max),(p2_min,p2_max),(p3_min,p3_max),(Lmin,Lmax),(CP_min,CP_max)]
+
+
+# Fitting 
+nfits       = 100   # how many fits to run, the fit with the lowest fitness score is selected.
+nsims       = 100  # number of simulations to use when plotting
+save_curves = true
+visualise   = true
+
+
+# monovalent fitting, set mono_optimiser=nothing if not desired
+# a monovalent model is fit once to determine the quality of the bivalent model fit
+lb = [-8.0, -8.0, -8.0]   # lower bounds on parameters in log space (kon,koff,CP)
+ub = [8.0, 8.0, 8.0]      # upper bounds on parameters in log space (kon,koff,CP)
+mono_optimiser = default_mono_optimiser(lb, ub; solverkwargs = (abstol = 1e-8, reltol = 1e-8))
+
+########################################################################################################################
+########################################################################################################################
+
+############ FUNCTIONS ############
+
+
+function ensure_dir(dir)
+    # creates the path if it doesn't exist
+    if !isdir(dir)
+        mkpath(dir)
+    end
+end
+
+function run_fits(nfits, surrogate, aligneddat, optpar_ranges)
+    # run nfits independent fits and return the one with the lowest fitness score
+    # the first fit initialises the running best; the loop then covers fits 2 to nfits
+
+    # Arrays to store fitness scores and parameters
+    fitness = zeros(nfits)
+    physparams = zeros(nfits, 5)
+
+    # Perform the first fit
+    bbopt_output, best_pars = fit_spr_data(surrogate, aligneddat, optpar_ranges)
+    fitness[1] = bbopt_output.minimum
+    physparams[1,:] = best_pars
+
+    # Iterate over remaining fits
+    for i in 2:nfits
+        bbopt_output_new, best_pars_new = fit_spr_data(surrogate, aligneddat, optpar_ranges)
+        fitness[i] = bbopt_output_new.minimum
+        physparams[i,:] = best_pars_new
+        if bbopt_output_new.minimum < bbopt_output.minimum
+            bbopt_output = bbopt_output_new
+            best_pars = best_pars_new
+        end
+    end
+    return fitness, physparams, bbopt_output, best_pars
+end
+
+
+function saveParams_all_fits(params_array, fitness, filename)
+    # this saves the returned parameters from all fits, in case you want to see the variability in returned parameters
+    SavePara = zeros(Float64,length(fitness),6)
+    SavePara[:,1] .= fitness
+    SavePara[:,2:end] = params_array
+
+    # Create a DataFrame with the columns
+    df = DataFrame(SavePara,:auto)
+
+    Headers = [Symbol("Fitness"), 
+               Symbol("kon"), 
+               Symbol("koff"),
+               Symbol("konb"),
+               Symbol("Reach"),
+               Symbol("CP")]
+    # Rename the column headers
+    rename!(df,Headers)
+
+    # Write the DataFrame to an xlsx file
+    XLSX.writetable(joinpath(OUTDIR, filename*"_Params.xlsx"),collect(eachcol(df)),names(df), overwrite=true)
+end    
+
+function savebestParams(results_paramlist, filenames, expname)
+    # this function creates a data file containing the best parameters for all SPR files in the experiment folder
+    df = DataFrame(fnames = filenames, 
+               fitness = results_paramlist[:,1],
+               kon = results_paramlist[:,2],
+               koff = results_paramlist[:,3],
+               konb = results_paramlist[:,4],
+               reach = results_paramlist[:,5],
+               Cp = results_paramlist[:,6]
+               )
+    CSV.write(joinpath(OUTDIR, expname*"_BestParams.csv"), df)
+end
+
+############ RUN SCRIPT ############
+
+# Directories 
+BASEDIR = splitdir(dirname(@__FILE__))[1]
+EXPDIR = joinpath(BASEDIR,"Experiments", experiment_name)
+RAWDIR = joinpath(EXPDIR, "Aligned") 
+OUTDIR = joinpath(EXPDIR, "Fitted")
+
+#creating output directory
+mkpath(OUTDIR)
+
+#load surrogate
+LUT_file = joinpath(BASEDIR,"Surrogates/" * LUTfilename * ".jld")
+surrogate = Surrogate(LUT_file)
+sps = surrogate.surpars
+
+# Loop through files and do the fitting 
+not_hidden(fname::String) = fname[1] != '.' #excludes hidden files starting with .
+SPRfiles = filter(not_hidden, readdir(RAWDIR))
+println(SPRfiles)
+
+resultcollection = zeros(length(SPRfiles), 6) # array collecting the best parameters from all SPR files
+
+for (fx, file) in enumerate(SPRfiles)
+    println("\n#####################\n", "Fitting file: ", fx, "/", length(SPRfiles), "\n", "File: ", file, "\n")
+
+    if occursin(r"^Data_", file)
+        # strip the "Data_" prefix and "_aligned.csv" suffix to get the base name for output files
+        filename = replace(replace(file, r"^Data_" => ""), r"_aligned\.csv$" => "")
+        println("Filename: ", filename, "\n")
+
+        fname = joinpath(RAWDIR, file)
+        aligneddat = get_aligned_data(fname)
+
+        println("\nRunning ", nfits, " fits")
+
+        fitness, physparams, bbopt_output, best_pars = run_fits(nfits, surrogate, aligneddat, optpar_ranges)
+
+        println("Best fit is: ")
+        @show best_pars
+
+        # save best result for all files
+        resultcollection[fx,1] = bbopt_output.minimum
+        resultcollection[fx,2:end] = best_pars
+
+        # for use with outputting so we don't modify the surrogate's parameters
+        simpars = deepcopy(surrogate.simpars)
+        simpars.nsims = nsims
+
+        # if we want to include a monovalent fit
+        if mono_optimiser !== nothing
+            # use the bivalent fits as our guess for the monovalent fit
+            # [kon, koff, CP]
+            u₀ = log10.( [best_pars[1], best_pars[2], best_pars[end]] )
+            monofit = monovalent_fit_spr_data(mono_optimiser, aligneddat, simpars.tstop_AtoB, u₀)
+        else
+            monofit = nothing
+        end
+
+
+
+        if visualise
+            print("saving plot...")
+            figfile = joinpath(OUTDIR, filename * "_fit_curves.png")
+            visualisefit(bbopt_output, aligneddat, surrogate, simpars, figfile)
+            if monofit !== nothing
+                figfile = joinpath(OUTDIR, filename * "_fit_curves_monovalent.png")
+                visualisefit(monofit, aligneddat, simpars.tstop_AtoB, figfile)
+            end
+            println("done")
+        end
+        if save_curves
+            print("saving spreadsheet and parameters...")
+            curvefile = joinpath(OUTDIR, filename)
+            savefit(bbopt_output, aligneddat, surrogate, simpars, curvefile; monofit)
+            saveParams_all_fits(physparams, fitness, filename)
+            println("done")
+        end
+
+
+    end
+end
+
+#saving the best parameters of all SPR files in a final data table
+savebestParams(resultcollection, SPRfiles, experiment_name)
+
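Note on `run_fits`: the function keeps a running best while looping over the fits. Because it already stores every fitness score, the best run can also be picked afterwards with `argmin`. The following is only a minimal sketch of that pattern. `toy_fit` and `best_of` are hypothetical stand-ins written for illustration; they are not part of SPRFitting and do not call `fit_spr_data`.

```julia
using Random

# Hypothetical stand-in for one stochastic fit (NOT the SPRFitting API).
# Returns a fitness score and a vector of 5 parameters, like one optimiser run.
function toy_fit(rng)
    pars = rand(rng, 5)
    fitness = sum(abs2, pars .- 0.5)
    return fitness, pars
end

# Run nfits independent fits, store all of them, and report the index of the best.
function best_of(nfits; rng = Random.default_rng())
    fitness = zeros(nfits)
    params = zeros(nfits, 5)
    for i in 1:nfits
        fitness[i], params[i, :] = toy_fit(rng)
    end
    best = argmin(fitness)   # index of the lowest fitness score
    return fitness, params, best
end

fitness, params, best = best_of(20; rng = MersenneTwister(1))
println("best run: ", best, " with fitness ", fitness[best])
```

Storing all runs first keeps the selection step separate from the fitting loop, and the full `fitness` and `params` arrays remain available for checking the variability between runs.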
diff --git a/docs/src/fitting_workflow/Fitting Examples/Code/readme.md b/docs/src/fitting_workflow/Fitting Examples/Code/readme.md
@@ -0,0 +1,70 @@
+# Script for running the particle model fit
+
+This code automates the process of fitting Surface Plasmon Resonance (SPR) data using the bivalent Particle-based model. 
+It reads SPR experimental files, fits the data to extract kinetic parameters, and saves the results. The fitting process uses a surrogate model from a lookup table (LUT) and allows for the comparison of a monovalent model if desired. 
+The code is intended for batch processing, enabling the analysis of multiple SPR experiments efficiently.
+
+## Requirements
+
+- Programming Language: Julia
+- Libraries: SPRFitting, Plots, DataFrames, XLSX, CSV, DelimitedFiles
+
+## Input Files
+
+The script processes aligned SPR curves. The columns are: time, response at concentration 1, time, response at concentration 2, ...
+The column headers have to be: time, [concentration 1], time, [concentration 2], ...
+The data have been aligned to the start of dissociation and cut to 600 sec, and a few seconds at the start and end of injection have been removed to eliminate injection artefacts.
+
+### File Naming Convention:
+The algorithm expects files to follow a specific naming format to extract concentration details:
+Data_FC[1-4]_[Date]_Protein[No]_[AB name]_Ligand-[AG name]-[AG conc]_aligned.csv.
+
+All files need to be stored in the following directory hierarchy:
+Experiments/[experiment name]/Aligned/
+
+
+## User input
+
+### Experiment identifiers
+- Experiment Name (experiment_name): The name of the folder containing SPR files to be processed. All files in this folder will be analyzed.
+
+### Fitting Configuration:
+- Surrogate Model (LUTfilename): The name of the surrogate file (LUT) used for fitting. The file should be located in the "Surrogates" folder.
+- Parameter Bounds (logCP_optrange and optpar_ranges) (optional): Define the search space for the parameters in log scale. The algorithm uses these bounds to optimize the fitting parameters. The algorithm can automatically extract the parameter bounds used for the surrogate model and use them for fitting. 
+- nfits: Number of fitting iterations to perform. The fit with the lowest fitness score is selected.
+- nsims: Number of simulations to generate when visualizing fits.
+- save_curves: Boolean flag whether to save the fitted curves.
+- visualise: Boolean flag whether to visualize the fitting process.
+- mono_optimiser: Optional monovalent model fitting configuration.
+
+## Output
+
+The algorithm creates a new folder "Fitted", where all output files are stored. 
+The following files are created:
+
+### Best Fit:
+[filename]_fit.xlsx
+For each SPR file, the code saves the fitted curves (both for the bivalent and monovalent model fit) (Sheet 1) and the best-fitting parameters (Sheet 2) in an Excel file.
+Parameters saved include fitness score, kon, koff, konb, reach, and CP.
+
+### All Parameters:
+[filename]_Params.xlsx
+For each SPR file, an Excel file containing the returned parameters from all nfits fitting iterations is saved. This can be used for quality control.
+
+### Visualizations:
+[filename]_fit_curves.png / [filename]_fit_curves_monovalent.png
+If enabled, the code saves visualizations of the fitted curves as PNG files.
+Both bivalent and monovalent fits (if applicable) are visualized.
+
+### Best Fit Parameters:
+[experiment_name]_BestParams.csv
+A CSV file is created that compiles the best-fitting parameters from all SPR files analyzed.
+
+
+## How to Run
+
+1. Set the experiment_name to the folder containing your SPR files.
+2. Specify the LUTfilename for the surrogate model.
+3. Adjust the fitting configuration (e.g., nfits, nsims) as needed.
+4. Ensure your SPR files follow the required naming convention.
+5. Run the script through the terminal: julia ParameterFitting_Example.jl
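
Note on the file naming convention: since the README states that concentration details are extracted from the file name, it can help to check names before starting a batch run. The following is a minimal sketch using only Julia's built-in regex support. `SPR_NAME` and `parse_spr_filename` are hypothetical names introduced here for illustration, and the pattern is an assumption derived from the naming format given above; it is not necessarily how SPRFitting parses file names internally.

```julia
# Hypothetical pattern for:
# Data_FC[1-4]_[Date]_Protein[No]_[AB name]_Ligand-[AG name]-[AG conc]_aligned.csv
const SPR_NAME = r"^Data_FC(?<fc>[1-4])_(?<date>[^_]+)_Protein(?<no>[^_]+)_(?<ab>[^_]+)_Ligand-(?<ag>.+)-(?<conc>[0-9.]+)_aligned\.csv$"

# Hypothetical helper: returns a named tuple of the fields, or nothing if the
# name does not follow the convention.
function parse_spr_filename(fname::AbstractString)
    m = match(SPR_NAME, fname)
    m === nothing && return nothing
    return (flowcell = parse(Int, m[:fc]),
            date = m[:date],
            protein = m[:no],
            antibody = m[:ab],
            antigen = m[:ag],
            antigen_conc = parse(Float64, m[:conc]))
end

info = parse_spr_filename("Data_FC2_260224_Protein01_FD-11A_Ligand-RBD-4.44_aligned.csv")
println(info)
```

A file for which `parse_spr_filename` returns `nothing` can be renamed or moved out of the Aligned folder before running the script.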