4. Fitting
Spinmob provides a nonlinear least-squares fitter based on the lmfit package. It is capable of fitting arbitrary curves to an arbitrary number of data sets simultaneously while (optionally) providing an interactive visualization of the data, functions, and residuals. It also allows you to non-destructively pre-process the data (trim, coarsen), fix parameters, constrain parameters, and more.
The primary goal of this object, however, is to lower the barrier between your data and a research-grade fitting engine.
In []: import spinmob as s
First let's make a fitter object:
In []: f = s.data.fitter()
Let's inspect the object by typing its name:
In []: f
Out[]:
FITTER SETTINGS
autoplot True
coarsen [1]
first_figure 0
fpoints [1000]
plot_all_data [False]
plot_bg [True]
plot_errors [True]
plot_fit [True]
plot_guess [True]
plot_guess_zoom [False]
scale_eydata [1.0]
silent False
style_bg [{'marker': '', 'color': 'k', 'ls': '-'}]
style_data [{'marker': 'o', 'color': 'b', 'ls': '', 'mec': 'b'}]
style_fit [{'marker': '', 'color': 'r', 'ls': '-'}]
style_guess [{'marker': '', 'color': '0.25', 'ls': '-'}]
subtract_bg [False]
xlabel [None]
xmax [None]
xmin [None]
xscale ['linear']
ylabel [None]
ymax [None]
ymin [None]
yscale ['linear']
INPUT PARAMETERS
NO FIT RESULTS
Look at all those settings! You can change any of them using the `set()` function, e.g., `f.set(ymin=32, plot_guess=False)`, and retrieve their values with dictionary-like indexing, e.g., `f['ymin']`. These settings affect how the data is processed and how it is visualized. As a shortcut, calling `f()` is identical to calling `f.set()`, so the same could be achieved with `f(ymin=32, plot_guess=False)`.
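Putting these together in one quick sketch:

f.set(ymin=32, plot_guess=False)   # change some settings
f(ymin=32, plot_guess=False)       # identical shortcut
print(f['ymin'])                   # dictionary-like retrieval -> 32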
Let's now specify a function we will use to fit some data:
In []: f.set_functions('a*x+cos(b*x)+c', 'a,b=2,c')
Here, the first argument is the function (in this case a string), and the second is a comma-delimited list of fit parameters (also a string), with optional guess values. Function strings are exposed to all of numpy (e.g., `cos` and `sqrt`), and you can include your own dictionary of globals via `set_functions()`'s `g` argument (or simply as additional keyword arguments), i.e.,
In []: f.set_functions('a*x + cos(b*x) + c', 'a,b=2,c')
In []: f.set_functions('a*x + pants(b*x) + c', 'a,b=2,c', pants=numpy.cos)
In []: f.set_functions('a*x + pants(b*x) + c', 'a,b=2,c', g={'pants':numpy.cos})
are all equivalent (provided you have imported `numpy`!). This is especially useful if you want to define your own arbitrarily complicated function with, e.g., `for` loops or something fancy-pants inside. The first argument can be a single string function (as shown above) or an actual python function, provided its first argument is the "x" variable and the subsequent arguments are the fit parameters. Furthermore, you can specify a mixed list of such functions for simultaneously fitting multiple data sets.
Typing `f` again, we see the fitter's `INPUT PARAMETERS` state has changed to reflect the parameters we supplied:
INPUT PARAMETERS
a = 1, vary=1 min=-INF max=INF expr=None
b = 2, vary=1 min=-INF max=INF expr=None
c = 1, vary=1 min=-INF max=INF expr=None
NO FIT RESULTS
Notice the default guess value is always `1`. You can set guess values later using `set()` as above, e.g., `f.set(c=12.2)`, or `f(c=12.2)` as a shortcut. Finally, you can also specify constant parameters and/or background functions (discussed below, or see `help(f.set_functions)`).
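For example, to refine the guess for `c` and then hold it constant (the `f.fix()` call also appears in the command list at the end of this page; its exact signature is my assumption, so check `help(f.fix)`):

f.set(c=12.2)   # update the guess value for c
f(c=12.2)       # same thing, shorthand
f.fix('c')      # hold c constant during the fit (assumed signature; see help(f.fix))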
Next, we need to supply some data to the engine. For example,
In []: f.set_data([1,2,3,4,5], [1,2,1,4,3], 0.7)
will set the xdata, ydata, and y-uncertainty. All three arguments can also be numpy arrays (perhaps databox columns?) or lists of numpy arrays (one per function). By default (`autoplot=True`), this command should pop up a plot of the data (blue) and guess (gray):
The way (or even whether!) these things are plotted can be changed using `f.set()` as described above. Note that for multiple data sets, if you only supply one xdata set, it will assume the same xdata for each ydata set, and vice versa. You can also "get away" with things like specifying single numbers or `None`. Try it and see what happens!
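For example, here is a quick sketch (the arrays are made up) of a few equally valid ways to feed it data:

import numpy as np

x = np.linspace(1, 5, 5)                     # numpy arrays work anywhere lists do
f.set_data(x, np.array([1,2,1,4,3]), 0.7)
f.set_data([1,2,3,4,5], [1,2,1,4,3], None)   # no eydata: the fitter guesses (see warning below!)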
Now that we have a function and data, we can fun the engine. That's a typo I decided to keep.
In []: f.fit()
This will perform the least-squares minimization, updating the plots and residuals:
You can see the results of the fit on the plot, or by again inspecting the `f` object as we did above; note the last category has changed:
FIT RESULTS (reduced chi^2 = 2.9 +/- 1.0, 2 DOF)
a = 0.33 +/- 0.39
b = 1.47 +/- 0.23
c = 1.2 +/- 1.3
There is the result! Or, at least some key information printed out. The reduced chi^2 in this case is 2.9, which is 1.9 standard deviations above the expected value of 1.
But... You do know the most likely value of reduced chi^2 for 2 degrees of freedom (DOF) is actually zero, right? You didn't? Well, you're certainly not alone (almost anywhere and at any level of science!). Here is the probability distribution for 2 DOF:
Not very strongly peaked at 1, is it? So is this a "good" fit? What does "good" mean? But I rant. The point is, the number after the `+/-` is in fact the standard deviation of the reduced chi^2 distribution, which always has a mean of 1, but which only starts to look like a Gaussian centered on 1 when DOF ~ 100.
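If you want to see this distribution for yourself, here is a minimal sketch using `scipy.stats` (not spinmob; the variable names are mine):

import numpy as np
from scipy.stats import chi2

dof = 2
r   = np.linspace(0, 5, 500)       # reduced chi^2 axis
pdf = dof * chi2.pdf(dof*r, dof)   # change of variables r = chi^2/dof; for 2 DOF this peaks at r=0

print(np.sqrt(2.0/dof))            # standard deviation of reduced chi^2 is sqrt(2/DOF) -> 1.0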
Importantly, the full fit results reside in `f.results`, which is a MinimizerResult object spit out by `lmfit.minimize()`. You can read more about these in the lmfit documentation, hopefully marveling at the comparative lack of effort we just put in, but I will list a few important properties to remember:
- `f.results.chisqr` and `f.results.redchi` are the chi^2 and reduced chi^2 values.
- `f.results.params` (shortcut `f.p_fit`) is a Parameters object just like `f.p_in`, but with fit values and errors, e.g., `f.p_fit['a'].value` is the value `0.3325...`, `f.p_fit['a'].stderr` is the standard error `0.393...`, and `f.p_fit['a'].correl` is a dictionary of correlations with the other parameters. You can also use these as numbers, e.g., `f.p_fit['a']+f.p_fit['b']` produces `1.80...`
- `f.covar` is the covariance matrix.
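For example, after the fit above you could pull out the numbers like so:

print(f.results.redchi)                          # 2.9...
print(f.p_fit['a'].value, f.p_fit['a'].stderr)   # 0.3325..., 0.393...
print(f.p_fit['a'].correl)                       # correlations with 'b' and 'c'
print(f.covar)                                   # covariance matrix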
Note that if you do not specify the y error bars (eydata) the fitter will make a completely arbitrary guess based on the max and min values of the supplied data. As a result, the reduced chi^2 will be meaningless, the parameter errors / correlations will be totally wrong, and your eyes will eventually cease to smile when your mouth tries to, because you are hollow inside. You should always estimate and justify your errors, because I am grading your reports. However, if you are not my student and you just want a dirty estimate of the error in the fit parameters, you can cheat a bit using f.autoscale_eydata_and_fit() a few times until your reduced chi^2 is 1. BE CAREFUL interpreting this. It takes a lot of effort to justify, basically amounting to estimating the error by sampling the data (which contains the science you are trying to fit!) itself, and assumes that your curve perfectly describes the data and that the fluctuations are totally random, Gaussian, and uncorrelated. Good luck explaining this on a final report, much less how all of your reduced chi^2 values happen to be exactly 1.
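For the record, that dirty cheat looks something like this (a sketch; a few repetitions are usually enough for the rescaling to converge):

for n in range(4):
    f.autoscale_eydata_and_fit()   # rescale eydata and refit until reduced chi^2 -> 1
print(f.results.redchi)            # should now be very close to 1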
You can also specify "background functions" by setting the parameter `bg` just as you do for `f`. For example:
In []: f = s.data.fitter()
In []: f.set_functions(f='a/(1.0+(x-x0)**2/w**2) + b + c*x',
                       p='a=1, x0=1, w=1, b=0, c=0',
                       bg='b+c*x')
In []: f.set_data([1,2,3,4,5,6,7], [1,2,3.5,2,1,0,-1], 0.3).fit()
will produce this:
Notice both the guess and fit functions now display the specified background function, for reference. The background function is purely for visualization and has no effect on the fit. Specifying `bg` essentially amounts to specifying what part of the fit function is due to background.
If you like, you can specify that the background function be subtracted from the data prior to plotting by setting `f(subtract_bg=True)`. The result is then:
Notice the fit background was subtracted from everything, including the guess (which is why the guess is skewed despite the guess `c` being `0`). Prior to the fit, however, the guess background will be subtracted from everything.
Sometimes it is difficult to express the fit function model as a string. Instead of inputting a string, you can also give the fitter an arbitrarily complicated python function. For example, suppose we have the function:
import numpy as np

def loud_cos(x, a, b, c):
    """
    Loud cosine with an offset, 'a*x*cos(b*x)+c'.
    """
    # Make some noise (or do crazy calculations)
    for n in range(10): print("Yay!")

    # Return some cosine
    return a * x * np.cos(b*x) + c
You can then set the `f` parameter of the fitter to this function directly:
f = s.data.fitter()
f.set_functions(f=loud_cos, p='a,b,c')
Note this technique requires that the function take the "x" data as its first argument and the fit parameters as the rest, and that the user-specified parameter list match the order of the parameters in the function. Alternatively (my preferred method), you can use a globally defined function within the usual function string, e.g.:
f = s.data.fitter()
f.set_functions(f='d+y(x,a,b,c)**2', p='d,b,c,a', y=loud_cos)
In this case, you must also specify what `y` means (the final keyword argument) so the fitter knows what these objects actually are. Here I used `y` in the function string, and specified that `y` is `loud_cos`, so the fitter will know what to do with `y` when it sees it. You can send as many special objects as you like using additional keyword arguments. (For older versions of spinmob, specify a "globals dictionary" as the keyword argument `g=dict(y=loud_cos)`.)
You can also specify multiple data sets and a function to fit each using the same set of parameters. You accomplish this by sending a list of functions and a list of data sets, for example:
f = s.data.fitter()
f.set_functions(['a*sin(x)+b', 'a+b*x+c**2'], p='a,b,c')
f.set_data([[1,2,3,4,5], [1,2,3,4,5]], [[1,3,1,4,1], [1,5,1,5,1]], 1.2)
In this case, a figure for each data set should pop up, and you can then proceed as normal. The default arguments for `s.data.fitter()` and `f.set_data()` have another example of this.
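You can then run the fit as usual; the shared parameters each get a single best-fit value across all data sets:

f.fit()                    # one minimization over both data sets
print(f.p_fit['a'].value)  # a single best-fit 'a' shared by both curves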
These are some of the more commonly used functions when fitting (assuming your fitter object is called `f`):
- `f.set()` or `f()` - allows you to change one or more parameters or settings
- `f.ginput()` - click the specified figure to get x,y values (great for guessing!)
- `f.trim()` - keeps only the currently visible data for fitting (use the zoom button on the graph!)
- `f.untrim()` - removes the trim.
- `f.zoom()` - zooms out or in (trimming)
- `f.fix()` - fixes one or more of the parameters
- `s.tweaks.ubertidy()` - auto-formats the figure for Inkscape (use the save button on the graph and save in 'svg' format)
- `f(a=11, b=2, plot_guess=False)` - changes some guess values and settings (disabling the guess plot)
- `f(coarsen=3, subtract_bg=[True, False])` - coarsens the data (by averaging every 4 data points) prior to fitting (`coarsen=0` means "no coarsening"), and also subtracts the background function from only the first data set.
Note when changing a setting, specifying a list allows you to change settings for each plot, and specifying just a value will change the setting for all plots.
Most of the settings are self-explanatory (just try them!), but my favorites are
- coarsen - how much to bin/average the data prior to fitting (0 means "no coarsening")
- xmin, xmax - x-range over which we should actually fit the data
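For example:

f(coarsen=2, xmin=2, xmax=8)   # bin the data and restrict the x-range of the fit
f.fit()                        # re-run the fit with the new settings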
Enjoy!
Up next: 5. Settings, Dialogs, Printing