
Overview

Spinmob provides a nonlinear least-squares fitter based on the lmfit package. It is capable of fitting arbitrary curves to an arbitrary number of data sets simultaneously while (optionally) providing an interactive visualization of the data, functions, and residuals. It also allows you to non-destructively pre-process the data (trim, coarsen), fix parameters, constrain parameters, and more.

The primary goal of this object, however, is to lower the barrier between your data and a research-grade fitting engine.

Basic Workflow

1. Import spinmob

In []: import spinmob as s

2. Create a fitter object, and define fit functions

First let's make a fitter object:

In []: f = s.data.fitter()

Let's inspect the object by typing its name:

In []: f
Out[]:

FITTER SETTINGS
  autoplot        True
  coarsen         [1]
  first_figure    0
  fpoints         [1000]
  plot_all_data   [False]
  plot_bg         [True]
  plot_errors     [True]
  plot_fit        [True]
  plot_guess      [True]
  plot_guess_zoom [False]
  scale_eydata    [1.0]
  silent          False
  style_bg        [{'marker': '', 'color': 'k', 'ls': '-'}]
  style_data      [{'marker': 'o', 'color': 'b', 'ls': '', 'mec': 'b'}]
  style_fit       [{'marker': '', 'color': 'r', 'ls': '-'}]
  style_guess     [{'marker': '', 'color': '0.25', 'ls': '-'}]
  subtract_bg     [False]
  xlabel          [None]
  xmax            [None]
  xmin            [None]
  xscale          ['linear']
  ylabel          [None]
  ymax            [None]
  ymin            [None]
  yscale          ['linear']

INPUT PARAMETERS

NO FIT RESULTS

Look at all those settings! You can change any of them using the set() function, e.g., f.set(ymin=32, plot_guess=False), and retrieve their values with dictionary-like indexing, e.g., f['ymin']. These settings affect how the data is processed and how it is visualized. As a shortcut, calling f() is identical to calling f.set(), so the same could be achieved with f(ymin=32, plot_guess=False).
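
For example, all of the following work on the fitter above:

In []: f.set(ymin=32, plot_guess=False)  # change a couple of settings
In []: f(ymin=32, plot_guess=False)      # identical shortcut
In []: f['ymin']                         # dictionary-like retrieval
Out[]: 32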

Let's now specify a function we will use to fit some data:

In []: f.set_functions('a*x+cos(b*x)+c', 'a,b=2,c')

Here, the first argument is the function (in this case a string), the second is a comma-delimited list of fit parameters (also a string), with optional guess values. Function strings are exposed to all of numpy (e.g. cos and sqrt), and you can include your own dictionary of globals via set_functions()'s g argument (or simply as additional keyword arguments), i.e.,

In []: f.set_functions('a*x + cos(b*x)   + c', 'a,b=2,c')
In []: f.set_functions('a*x + pants(b*x) + c', 'a,b=2,c', pants=numpy.cos)
In []: f.set_functions('a*x + pants(b*x) + c', 'a,b=2,c', g={'pants':numpy.cos})

are all equivalent (provided you have imported numpy!). This is especially useful if you want to define your own arbitrarily complicated function with, e.g., for loops or something fancy-pants inside. The first argument can be a single function string (as shown above) or an actual python function, provided its first argument is the "x" variable and the subsequent arguments are the fit parameters. Furthermore, you can specify a mixed list of such functions for simultaneously fitting multiple data sets.

Typing f again, we see the fitter's INPUT PARAMETERS state has changed to reflect the parameters we supplied:

INPUT PARAMETERS
  a = 1, vary=True min=-INF max=INF expr=None
  b = 2, vary=True min=-INF max=INF expr=None
  c = 1, vary=True min=-INF max=INF expr=None

NO FIT RESULTS

Notice the default guess value is always 1. You can set guess values later using set() as above, e.g., f.set(c=12.2), or f(c=12.2) as a shortcut. You can also specify constant parameters and/or background functions (discussed below, or see help(f.set_functions)). Finally, notice the extra information listed with each parameter: vary is whether the parameter is allowed to vary, min and max are simple bounds on the parameter, and expr is a functional constraint. All of this is stored in the automatically generated Parameters object f.p_in, and you can adjust each parameter directly with commands such as f.p_in['a'].min = -12.2. See the link (or just play with it!) to learn more. expr is particularly powerful, because it allows you to constrain the fit parameters in very interesting ways.
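
For example, here is a minimal sketch of adjusting f.p_in directly (these attribute names are standard lmfit):

In []: f.p_in['a'].min  = -12.2    # simple bound: a cannot go below -12.2
In []: f.p_in['b'].vary = False    # hold b fixed during the fit
In []: f.p_in['c'].expr = '2*b'    # functional constraint: c is always twice b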

3. Give the fitter some data.

Next we need to supply some data to the engine. For example

In []: f.set_data([1,2,3,4,5], [1,2,1,4,3], 0.7)

will set the xdata, ydata, and y-uncertainty. All three arguments can also be numpy arrays (perhaps databox columns?) or lists of numpy arrays (one per function). By default (autoplot=True), this command should pop up a plot of the data (blue) and guess (gray):

The way (or even whether!) these things are plotted can be changed using f.set() as described above. Note that for multiple data sets, if you only supply one xdata set, the fitter will assume the same xdata for each ydata set, and vice versa. You can also "get away" with things like specifying single numbers or None. Try it and see what happens!
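
For example, with two fit functions defined, a single xdata set will be shared by both ydata sets (a minimal sketch; multiple data sets are covered in more detail below):

In []: f2 = s.data.fitter()
In []: f2.set_functions(['a*x+b', 'a*x**2+b'], 'a,b')   # two functions, shared parameters
In []: f2.set_data([1,2,3,4,5], [[1,2,1,4,3],[2,1,3,2,4]], 0.7)   # one xdata set for both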

4. Fit.

Now that we have a function and data, we can fun the engine. That's a typo I decided to keep.

In []: f.fit()

This will perform the least-squares minimization, updating the plots and residuals:

You can see the results of the fit on the plot or by again inspecting the f object as we did in step 1, and note the last category has changed:

FIT RESULTS (reduced chi^2 = 2.9 +/- 1.0, 2 DOF)
  a = 0.33 +/- 0.39
  b = 1.47 +/- 0.23
  c = 1.2 +/- 1.3

There is the result! Or, at least some key information printed out. The reduced chi^2 in this case is 2.9, which is 1.9 standard deviations above the expected value of 1.

But... You do know the most likely value of reduced chi^2 for 2 degrees of freedom (DOF) is actually zero, right? You didn't? Well, you're certainly not alone (almost anywhere and at any level of science!). Here is the probability distribution for 2 DOF:

Not very strongly peaked at 1, is it? So is this a "good" fit? What does "good" mean? But I rant. The point is, the number after the +/- is in fact the standard deviation of the reduced chi^2 distribution, which always has a mean of 1, but which only starts to look like a Gaussian centered on 1 when DOF ~ 100.
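
For reference, the standard deviation of the reduced chi^2 distribution with DOF degrees of freedom is sqrt(2/DOF), which is exactly the 1.0 quoted above for 2 DOF. You can verify this numerically (a quick side check using scipy, not part of spinmob):

In []: from scipy import stats
In []: samples = stats.chi2(2).rvs(100000)/2   # samples of reduced chi^2 for 2 DOF
In []: print(samples.mean(), samples.std())    # both ~1.0, as promised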

Importantly, the full fit results reside in f.results, which is a MinimizerResult object spit out by lmfit.minimize(). You can read more about these at those links, hopefully marveling at the comparative lack of effort we just put in, but I will list a few important properties to remember:

  • f.results.chisqr and f.results.redchi are the chi^2 and reduced chi^2 values.
  • f.results.params (identical to f.p_fit) is a Parameters object just like f.p_in, but with fit values and errors, e.g. f.p_fit['a'].value is the value 0.3325..., f.p_fit['a'].stderr is the standard error 0.393..., and f.p_fit.correl is a dictionary of correlations with the other parameters. You can also use these as numbers, e.g., f.p_fit['a']+f.p_fit['b'] produces 1.80...
  • f.results.covar is the covariance matrix.
  • Fitter also has some shortcut functions, such as f.get_fit_parameters() (returns an ordered list of parameters), f.get_fit_values() (returns a numpy array of values), and f.get_fit_standard_errors() (returns a numpy array of standard errors); see the snippet below.
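
For example, continuing with the fit above (the approximate numbers in the comments come from this example's output):

In []: f.results.redchi        # reduced chi^2, ~2.9 here
In []: f.p_fit['a'].value      # best-fit a, ~0.33
In []: f.p_fit['a'].stderr     # standard error on a, ~0.39
In []: f.get_fit_values()      # numpy array of all best-fit values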

Note that if you do not specify the y error bars (eydata), the fitter will make a completely arbitrary guess based on the max and min values of the supplied data. As a result, the reduced chi^2 will be meaningless, the parameter errors / correlations will be totally wrong, and your eyes will eventually cease to smile when your mouth tries to, because you are hollow inside. You should always estimate and justify your errors, because I am grading your reports. However, if you are not my student and you just want a dirty estimate of the error in the fit parameters, you can cheat a bit by running f.autoscale_eydata_and_fit() a few times until your reduced chi^2 is 1. BE CAREFUL interpreting this. It takes a lot of effort to justify, basically amounting to estimating the error by sampling the data itself (which contains the science you are trying to fit!), and it assumes that your curve perfectly describes the data and that the fluctuations are totally random, Gaussian, and uncorrelated. Good luck explaining this in a final report, much less how all of your reduced chi^2 values happen to be exactly 1.
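
If you do go this route, the usage is simple (with all of the above caveats firmly in mind):

In []: for n in range(4): f.autoscale_eydata_and_fit()  # repeat until reduced chi^2 settles at 1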

Background Functions

You can also specify "background functions" by setting the parameter bg just as you do for f. For example:

In []: f = s.data.fitter()
In []: f.set_functions(f='a/(1.0+(x-x0)**2/w**2) + b + c*x',
                       p='a=1, x0=1, w=1, b=0, c=0',
                       bg='b+c*x')
In []: f.set_data([1,2,3,4,5,6,7], [1,2,3.5,2,1,0,-1], 0.3).fit()

will produce this:

Notice both the guess and fit functions now display the specified background function, for reference. The background function is purely for visualization and has no effect on the fit. Specifying bg essentially amounts to specifying what part of the fit function is due to background.

If you like, you can specify that the background function be subtracted from the data prior to plotting by setting f(subtract_bg=True). The result is then:

Notice the fit background was subtracted from everything, including the guess (which is why the guess is skewed despite the guess c being 0). Prior to the fit, however, the guess background will be subtracted from everything.

Advanced Function Fitting

Sometimes it is difficult to squeeze the fit model into a string. Instead of inputting a string to the fitter, you can also input an arbitrarily complicated python function. For example, suppose we have the function:

import numpy as np

def loud_cos(x, a, b, c):
    """
    Loud cosine with an offset, 'a*x*cos(b*x)+c'.
    """

    # Make some noise (or do crazy calculations)
    for n in range(10): print("Yay!")

    # Return some cosine
    return a * x * np.cos(b*x) + c

You can then set the f parameter of the fitter to this function directly:

f = s.data.fitter()
f.set_functions(f=loud_cos, p='a,b,c')

Note this technique requires that the function takes the "x" data as the first argument and the fit parameters as the rest, and that the user-specified parameter list matches the order of the parameters in the function. Alternatively (my preferred method), you can use a globally defined function within the usual function string, e.g.:

f = s.data.fitter()
f.set_functions(f='d+y(x,a,b,c)**2', p='d,b,c,a', y=loud_cos)

In this case, you must also specify what y means (the final argument) so the fitter knows what these objects actually are. Here I used y in the function string and specified that y is loud_cos, so the fitter will know what to do with y when it sees it. You can send as many special objects as you like using additional keyword arguments. (For older versions of spinmob, specify a "globals dictionary" as the keyword argument g=dict(y=loud_cos).)

Multiple Data Sets

You can also specify multiple data sets and a function to fit each using the same set of parameters. You accomplish this by sending a list of functions and a list of data sets, for example:

f = s.data.fitter()
f.set_functions(['a*sin(x)+b', 'a+b*x+c**2'], p='a,b,c')
f.set_data([[1,2,3,4,5], [1,2,3,4,5]], [[1,3,1,4,1], [1,5,1,5,1]], 1.2)

In this case, a figure for each data set should pop up, and you can then proceed as normal. The default arguments for s.data.fitter() and f.set_data() provide another example of this.
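
From here, a single call fits both curves simultaneously with the shared parameters:

f.fit()                # one minimization over both data sets at once
print(f.p_fit['a'])    # a single shared best-fit value of a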

Handy Functions

These are some of the more commonly used functions when fitting (assumes your fitter object is called f).

  • f.set() or f() - allows you to change one or more parameters or settings
  • f.ginput() - click the specified figure to get x,y values (great for guessing!)
  • f.trim() - keeps only the currently visible data for fitting (use the zoom button on the graph!)
  • f.untrim() - removes trim.
  • f.zoom() - zooms out or in (trimming)
  • f.fix() - fix one or more of the parameters
  • s.tweaks.ubertidy() - auto-formats figure for Inkscape (use the save button on the graph and save in 'svg' format)
  • f(a=11, b=2, plot_guess=False) - change some guess values and settings (disabling the guess plot)
  • f(coarsen=3, subtract_bg=[True, False]) - coarsen the data (by averaging every 4 data points) prior to fitting (coarsen=0 means "no coarsening"), and also subtract the background function from only the first data set.

Note when changing a setting, specifying a list allows you to change settings for each plot, and specifying just a value will change the setting for all plots.
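
For example (a small sketch of the list-versus-scalar behavior):

f(subtract_bg=[True, False])   # a list applies per data set: subtract bg from the first only
f(plot_guess=False)            # a single value applies to all plots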

Most of the settings are self-explanatory (just try them!), but my favorites are

  • coarsen - how much to bin/average the data prior to fitting (0 means "no coarsening")
  • xmin, xmax - x-range over which we should actually fit the data (see the example below)
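
For example, to fit only the data between x=2 and x=4, without modifying the data themselves:

f(xmin=2, xmax=4)   # restrict the fit range
f.fit()             # refit using only the data in this window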

Enjoy!

Up next: 5. Settings, Dialogs, Printing
