Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a lesson on how to use RooFit. #128

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@
----

* [Self guided lessons](self-guided-lessons/README.md)
* [Getting started with RooFit](self-guided-lessons/intro_roofit.md)
* [Fitting with RooFit](self-guided-lessons/fitting_roofit.md)

----

Expand Down
Binary file added self-guided-lessons/C_unbinned.pdf
Binary file not shown.
Binary file added self-guided-lessons/C_unbinned.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
203 changes: 203 additions & 0 deletions self-guided-lessons/fitting_roofit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
# Fitting with RooFit

{% objectives "Learning Objectives" %}
* Learn how to generate data.
* Learn how to fit using RooFit.
* Learn a few complex tools.
{% endobjectives %}

In the next few paragraphs, we will present how to fit some data composed of a signal component and a background component.
Before fitting, we should ask ourselves the following questions:

- What should our signal look like?
- What does our signal really look like? Do we have to include a resolution/acceptance to our PDF?
- What does our background look like?

After having answered these points, we can start writing our fitting code.

Generally, data is imported from a ROOT file and the fits are carried on from variables of the corresponding tree. In the following example we will randomly generate data so that the code can be run without any external input.
In this tutorial, we generated two different data samples. One is binned and the other one unbinned, and the combined unbinned data sample is fitted.

[`RooDataHist`](https://root.cern.ch/doc/master/classRooDataHist.html) and [`RooDataSets`](https://root.cern.ch/root/html516/RooDataSet.html) are RooFit data storage containers. RooDataSet is used for unbinned datasets while RooDataHist is the binned equivalent.

{% callout "What's binned? What's unbinned?" %}
Binned means that the number of events is counted in each bin and the fits are carried on using this information. This implies that the result of the fit will depend on the number of bins that is chosen. In the case of an unbinned dataset, each event position is taken into account and the fit results will not depend on a given number of bins. Using RooDataHist will often result much faster. Imagine performing an unbinned fit using tens of millions of events, it can sometimes take hours!
{% endcallout %}

`RooDataHist` and `RooDataSet` are declared in very similar ways, except that RooDataHist is built from a TH (TH1, TH2, etc…, the binned ROOT histograms) while RooDataSet is built directly from variables of a TTree.
There are various ways to build these two objects (that you can find in various examples on the web) and we will show one possibility.

````
RooDataSet data("Imported_variable", "Imported_variable", read_tree, mass);
RooDataHist signal ("signal_curve", "signal ditribution", mass, histo_signal);
````

with `mass` being a `RooRealVar`.

RooFit also offers the possibility to generate data both in a binned and unbinned fashion. This is a very useful tool to test fit models without having to load millions of events from a Tree. This data should follow a statistical model (gaussian, exponential, etc…) in the form of a RooAbsPdf.

````
RooGaussian gaus ("RooFit_Gaussian", "RooFit_Gaussian", mass, mean, sigma);
RooExponential expo("RooFit_exponential", "RooFit_exponential", mass, slope);
````

where `mass`, `mean`, `sigma` and `slope` are `RooRealVar` objects.
Here the gaussian will correspond to a fit to the signal and the exponential to a fit to the background.

The binned and unbinned datasets can then be creating using the generate and generateBinned methods. We generate 10000 events:

````
RooDataSet* d = gaus.generate(mass, 10000);
RooDataHist* d_binned = gaus.generateBinned(mass, 10000);
````

We can then create a model that sums the signal and background components using the `RooAddPdf` method. It is possible to specify both signal and background abundances, or only the relative fraction (as in this case). In this case, we want the relative signal abundance to be at the level of 80%:

````
RooRealVar N_f("N_f", "Signal relative abundance", 0.8);
RooAddPdf model1("Total_model", "Total generation model", gaus, expo, N_f);
````

We can now have a customised `RooDataSet` that mixes signal and background and save the `RooDataSet` as a ROOT Object.

````
RooDataSet* d = model1.generate(mass, 10000);
const TTree* write_tree = d -> tree();
write_tree -> Write();
````

Now let's fit! `

RooFit allows us to perform complicated fits. It is built around solid statistical tools. Its default method uses a log likelihood minimisation thanks to [Minuit](https://seal.web.cern.ch/seal/snapshot/work-packages/mathlibs/minuit/) (a very old piece of code that is still used today).
The fit will be binned if you use RooDataHist and unbinned if you choose RooDataSet.
To fit, it is quite easy and it can be done in different ways:

- Fit directly to data, using the following method which return a [`RooFitResult`](https://root.cern.ch/doc/v612/classRooFitResult.html) object:

{% callout "Output of the fits" %}
It is very important to take time to read the output of the fit that appears on the terminal. A fit can sometimes be displayed eventhough it does not converge.
Messages like "err matrix not pos-def", "no error matrix", or "p.d.f value is less than zero (-0.000000), forcing value to zero" clearly show that something is wrong!
{% endcallout %}

````
RooFitResult* result_unbinned = model_unbinned.fitTo(data, Save(True)); in this case Minuit is called by the function fitTo
````

- Write a chi2 or a maximum likelihood function and then call Minuit directly

````
RooChi2Var * chi2 = new RooChi2Var("chi2","chi2",*model_unbinned,*data,NumCPU(8), Extended(kTRUE)) ;
//where NumCPU(8) is the number of CPUs used in the Fit
//Extended(kTRUE) means that the fit is extended
RooMinuit m1(*chi2) ;
m1.setVerbose(kTRUE); //write info on log
m1.setPrintLevel(3); //write all the information on the development on the shell/log
// m1.setEps(1e-15);//set the precision to a known value
m1.setStrategy(1); //0, 1 or 2 depending on the function to minimise
m1.migrad(); //call migrad for first derivative
m1.hesse(); //call hesse for second derivative (more precise)
m1.save()->Print("v"); //print values
````

For more information, click [here](https://root.cern.ch/doc/master/classRooMinuit.html#a73477af6d519f8b91ab0538bee1fc2f8)



In more complicated fit where there are more than three parameters, it is good habit to collect them in a .txt file(`params_to_fit.txt`), in order to set them to their initial value in a more comfortable way. The set of parameters of the fit can be defined after the definition of the model as a `RooArgSet`.

````
RooArgSet* params = model_unbinned->getParameters(RooArgSet(mass)) ;
TString param_file_name = "params_to_fit.txt";
params->readFromFile( param_file_name );
````

The parameters file params_to_fit.txt has the form:

````
mean_guess = 1250 L (1225 - 1275)
sigma_guess = 100 L (50 - 200)
N_f = 0.8 C L (0.7 - 0.9)
````

where L and C stands for free and constraint respectively. In this way it is possible to set the parameters to their initial value or to free them. Following the fit, the parameters can be saved in another .txt file (or the same one, which could be updated):

````
params->writeToFile("params_after_fit.txt")
````

The parameters then become:

````
mean_guess = 1247.92 +/- 1.51723 L (1225 - 1275)
sigma_guess = 120.121e+02 +/- 1.34229 L(50 - 200)
N_f = 0.8 C L (0.7 - 0.9)
````

In order to plot the fit results data and PDFs are implemented in a [`RooPlot`](https://root.cern.ch/doc/master/classRooPlot.html).
First we have to create a `RooPlot`:

````
RooPlot* plot2 = mass.frame(RooFit::Title("Model"));
````

Then it is important to plot on the frame, adding all of the options we want, as `LineColor`, `LineStyle`, `Name`, etc.. The signal and background components have to be added one by one:

````
data.plotOn(plot2);
model_unbinned.plotOn(plot2);
model_unbinned.plotOn(plot2, RooFit::Components(sig_gaus), RooFit::LineColor(kRed));
model_unbinned.plotOn(plot2, RooFit::Components(bkg), RooFit::LineColor(kGreen));
````

Here is the corresponding plot:

[!["Rookeys_example"](C_unbinned.png)](C_unbinned.png)

In order to display the output parameters of our plot, we make use of the `paramOn` method:

````
model_unbinned->paramOn(plot2, Layout(0.6,0.89,0.89));
````

Moreover, it is sometimes useful to add a pull plot in order to have a clearer idea of the quality of the fit:

````
RooHist* hpull = plot2->pullHist() ;
hpull->SetFillColor(kBlue);
RooPlot* pulls_plot = mass.frame(Title("Pull Distribution")) ;
pulls_plot->addPlotable(hpull,"BX") ;
````

If the data set is binned the pull plot needs to have the same binning, so we need to add the option

````
Int_t nBins = mass.getBin();
````

When we're dealing with a pull plot, drawing it just under the main plot is very useful. In order to do this it is possible to use TPads, which have the following syntax:

````
TCanvas* c1 = new TCanvas("c1", "Plot");
TPad* upperPad = new TPad("upperPad", "upperPad", .005, .2525, .995, .995);
TPad* lowerPad = new TPad("lowerPad", "lowerPad", .005, .005, .995, .2475);
lowerPad->Draw();
upperPad->Draw();
upperPad->cd();
gPad->SetLogy(1);
plot2->SetMinimum(0.2);
plot2->Draw();
lowerPad->cd();
gPad->SetGrid();
pulls_plot->Draw("B");
````

The following code is a nice RooFit example and covers the various examples that we have covered: /afs/cern.ch/user/s/samarian/public/learn_RooFit.C

Sometimes we want to fit at simulateously multiple datasets. The [`RooSimultaneous`](https://root.cern.ch/doc/master/classRooSimultaneous.html) object is here to help us:

````
RooSimultaneous simPdf("simPdf","simultaneous pdf",sample) ;
````

A full example using `RooSimultaneous` is available at this [address](https://root.cern.ch/root/html/tutorials/roofit/rf501_simultaneouspdf.C.html).

138 changes: 138 additions & 0 deletions self-guided-lessons/intro_roofit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
# Getting started with RooFit

{% objectives "Learning Objectives" %}
* Learn the basics of RooFit.
{% endobjectives %}

RooFit is built on top of ROOT and is used by particle physicists to fit their data. It can perform very complicated fits and is considered by many as being reliable. However, RooFit’s documentation is sparse and outdated and most of the important informations are often found by users on the ROOT forums. This introduction presents on a step-by-step basis various tools which can be used for our future analysis.

RooFit offers many features that are essential to CERN physicists:

- Plenty of pre-written Probability Density Functions (PDF, the functions used to fit our data): Gaussian, Exponential, Crystal Ball, Poissons etc… (Full list [here](https://github.com/root-project/root/tree/master/roofit/roofit/inc))
- Possibilty to build our own fit model.
- Creation of customised PDFs by adding, convoluting other PDFs.
- Fitting of multiple datasets simultaneously.
- Generation of Datasets, very useful to test our fit models.
- Efficient way to blind our data.

By having a recent version of ROOT, RooFit should already be installed. It can be used in Python and C++. The following pieces of codes are in C++.
At the beginning of our code .C, we should include RooFit libraries and then type in:

```
using namespace RooFit;
```

When doing data analysis, we deal with different object types:

- Data: which can be binned or unbinned.
- Variables, such as fit parameters, observables and derived variables.
- PDFs: necessary to define the shape to fit, which can be a simple one, such as a Gaussian, or a n-dimensional function.

One of the most common objects we deal with when using RooFit is called [`RooRealVar`](https://root.cern.ch/doc/master/classRooRealVar.html). It is a variable which corresponds to our data as well as our fit parameters.
A `RooRealVar` is defined using:

- A name
- A title
- An initial value
- A range (minimum and maximum values)
- A unit (MeV, mm, etc...)

and can be declared as follows:

```
RooRealVar y("y","a_constant_called_y" ,4.0); // fixing the value
RooRealVar x("x","x_has_range_only",-10,10); // giving only the range
RooRealVar z("z","range, initial_value_and_units",4.0,-10,10,"MeV/c^{2}"); // giving both its value and the range (and units)
```

It is sometimes useful to collect multiple `RooRealVar` in a list. To do this, we use [`RooArgList`](https://root.cern.ch/doc/master/classRooArgList.html) (an ordered set of variables) and [`RooArgsSet`](https://root.cern.ch/doc/master/classRooArgSet.html) (an unordered set of variables).

In order to add, substract (or whatever mathematical manipulation) multiple `RooRealVar` together, we can use the object [`RooFormulaVar`](https://root.cern.ch/doc/master/classRooFormulaVar.html). Starting from the variables a and b defined as:

```
RooRealVar a("a","a",1,0,5);
RooRealVar b("b","b",5,-10,20);
```

It becomes possible to define a new variable as follows:

```
RooFormulaVar name("name","title","formula",list_of_variables);
```

For instance, `delta` the difference between a and b:

````
RooFormulaVar delta("delta","delta","a-b",RooArgList(a,b));
````

It is also possible not to give explicitly the name of the variables in the `RooFormulaVar`, indexing them based on the order of the `RooArgList` @0, @1, ..., @n:

````
RooFormulaVar delta("delta","delta","@0-@1",RooArgList(a,b));
````

{% callout "Nota Bene" %}
It is a good habit to use the indices @0, @1, ..., @n based on the `RooArgList` instead of the `RooRealVar` name, because it is easier to debug and to reuse the same code for a different set of variables in the future`.
{% endcallout %}

It is possible to use `RooFormulaVar` as an input to the fit.

Since RooFit is designed to perform fits, there are plenty of ways to make a PDF. We can:

- Use pre-written PDFs (`RooGaussian`, `RooBreitWigner`, `RooExponential`, `RooCBShape`, etc...). These PDFs inherit from the class [ `RooAbsPdf`](https://root.cern.ch/doc/v608/classRooAbsPdf.html)

````
RooGaussian gaus("gaus","gaus",x,mean,width);
````

- Write our own PDF with [`RooGenericPdf`](https://root.cern.ch/doc/master/classRooGenericPdf.html), with the syntax

````
RooGenericPdf your_pdf_name("the_function", RooArgSet(parameters));
````

{% callout "Good Thing with RooFit" %}
There is no need to give a normalisation to our PDF since RooFit does it for us. If for some reason we can provide an analytical integral, the code will be faster to run.
{% endcallout %}

It is also possible to mix and match PDFs. RooFit lets us add, convolve, take products of PDFs to build models in a reasonably intuitive manner (`RooAddPdf`, `RooFFTConvPdf`, `RooProdPdf`, etc.), such as:

- Multiply a signal PDF `sig_tmp` with an efficiency `eff`:

````
RooEffProd sig("sig","sig",sig_tmp,eff);
````

- Add a signal PDF `sig` to a background PDF `bkg` through the [`RooAddPdf`](https://root.cern.ch/doc/master/classRooAddPdf.html) object:

````
RooAddPdf pdf_of_the_model("pdf_of_the_model", "pdf_of_the_model", RooArgSet(sig,bkg));
````

In the case of a multidimensional fit (with at least two uncorrelated `RooRealVar` being spaned by the fits), [`RooProfPdf`](https://root.cern.ch/doc/master/classRooProdPdf.html) can be used:

````
RooAbsPdf xy("xy","xy",model_1 ,model_2);
````

where `model_1` spans the x observable and `model_2` the y observable.

{% callout "Question about pointers" %}
Lots of RooFit coders declare RooFit objects with pointers for memory efficiency. For instance the last `RooAbsPdf` could be declared as

````
RooAbsPdf *xy = new RooAbsPdf("xy","xy",model_1 ,model_2);
````

{% endcallout %}

Another very useful tool in RooFit is the possibility to extract the PDF shape from data using [`RooHistPdf`](https://root.cern.ch/doc/master/classRooHistPdf.html) (for binned datasets) and [`RooKeysPdf`](https://root.cern.ch/doc/master/classRooKeysPdf.html) (for unbinned datasets). These PDFs are not fit methods and can only model the data distribution in an empirical way.
They are declared in the following way:

````
RooHistPdf histpdf("Binned Hist Pdf", "Binned Hist Pdf", x, data);
RooKeysPdf keysPdf("Unbinned Keys Pdf", "Unbinned Keys Pdf", x, data);
````

[!["Rookeys_example"](rookeys.png)](rookeys.png)
Binary file added self-guided-lessons/rookeys.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.