Python and ROOT Tricks: efficiency graphs

You are asked to produce a graph showing the efficiency of selecting a sample as a function of a variable. Let’s call that variable x. How do you do this with PyROOT, and, most importantly, how do you make sure that the errors associated with each point in the efficiency graph are correct?

The underlying trap is a simple one. You are immediately tempted to produce two histograms of the distribution of x: one filled before your selection, and one filled after it. You then use the histogram class’s native TH1::Divide() method to make the efficiency histogram. For instance:

h_before_selection = ROOT.TH1F("h_before_selection", "", 100, 0, 10)
h_before_selection.Sumw2()  # track sums of squared weights so per-bin errors are stored
h_after_selection = ROOT.TH1F("h_after_selection", "", 100, 0, 10)
h_after_selection.Sumw2()
# do something that fills the histograms with the right events
# ... more code here ...
# now divide them to make an efficiency histogram
# (Clone first, so the division does not overwrite h_after_selection in place)
h_efficiency = h_after_selection.Clone("h_efficiency")
h_efficiency.Divide(h_before_selection)
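How the histograms actually get filled depends entirely on your analysis. Purely as a stand-in for the elided filling code above, here is one hedged sketch, using an invented flat toy sample and a made-up sigmoid turn-on as the selection:

import math
import ROOT

# Purely illustrative filling code: a flat toy sample in x, selected with a
# made-up sigmoid "turn-on" probability so the efficiency varies across bins
rng = ROOT.TRandom3(42)
for _ in range(10000):
    x = rng.Uniform(0.0, 10.0)
    h_before_selection.Fill(x)
    pass_prob = 1.0 / (1.0 + math.exp(-(x - 5.0)))  # hypothetical selection
    if rng.Uniform() < pass_prob:
        h_after_selection.Fill(x)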

So, does this work? Well, it will certainly produce a histogram showing the efficiency as a function of the variable. But what about the errors? They will be very wrong.

Why are the errors incorrect? The default TH1::Divide method assumes the contents of the two histograms are uncorrelated. But the histogram after selection contains a subset of the events in the histogram before selection, so the two are highly correlated. This is exactly the situation described by binomial errors: each event falls into one of two categories, selected or unselected, and the binomial distribution gives an exact prescription for the uncertainty on the selected fraction.
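To see how big the difference can be, here is a quick back-of-the-envelope check in plain Python (the counts N = 100 and k = 80 are invented purely for illustration):

import math

# Hypothetical bin: N = 100 events before the selection, k = 80 after it
N = 100.0
k = 80.0
eff = k / N

# Naive error, treating k and N as uncorrelated Poisson counts:
err_naive = eff * math.sqrt(1.0 / k + 1.0 / N)

# Binomial error for the selected fraction:
err_binomial = math.sqrt(eff * (1.0 - eff) / N)

print(eff, err_naive, err_binomial)  # 0.8, ~0.12, 0.04

The naive treatment overestimates the error by a factor of three in this example.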

ROOT provides a version of TH1::Divide that does this treatment correctly. Here is how you do it:

h_efficiency = h_after_selection.Clone("h_efficiency")
h_efficiency.Divide(h_after_selection, h_before_selection, 1.0, 1.0, "B")

This divides the two histograms and computes the binomial error in each bin: the two 1.0 arguments are scale factors applied to each histogram before dividing, and the “B” option is what requests the binomial error treatment.
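As a quick sanity check, here is how you might read back the per-bin efficiency and its error with the standard TH1 accessors:

# Read back the efficiency and its binomial error in each bin
for i in range(1, h_efficiency.GetNbinsX() + 1):
    x = h_efficiency.GetBinCenter(i)
    eff = h_efficiency.GetBinContent(i)
    err = h_efficiency.GetBinError(i)
    print("x = %.2f  efficiency = %.3f +/- %.3f" % (x, eff, err))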

Now, there is one thing here that might annoy some people: when a bin retains all of its events after the selection, the efficiency is 100% and the binomial prescription yields an error of 0% on that bin. However, you may wish to take into account the fact that the bin contains finite statistics, so there is still some genuine uncertainty on the efficiency. How do you handle this?
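You can see this behaviour in a minimal single-bin sketch (the five events here are invented just to isolate the effect):

import ROOT

# One-bin histograms: 5 events before the selection, all 5 survive it
h_before = ROOT.TH1F("h_before", "", 1, 0.0, 1.0)
h_after = ROOT.TH1F("h_after", "", 1, 0.0, 1.0)
h_before.Sumw2()
h_after.Sumw2()
for _ in range(5):
    h_before.Fill(0.5)
    h_after.Fill(0.5)

h_eff = h_after.Clone("h_eff")
h_eff.Divide(h_after, h_before, 1.0, 1.0, "B")
# The efficiency is 1.0, and sqrt(eff*(1-eff)/N) collapses to 0
print(h_eff.GetBinContent(1), h_eff.GetBinError(1))  # 1.0 0.0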

The TGraphAsymmErrors class comes to the rescue. It defines a Divide() method (in recent versions of ROOT; in versions prior to 5.28 you have to call TGraphAsymmErrors::BayesDivide() instead) that takes into account the finite statistics of a bin with 100% efficiency, using a Bayesian prior to compute a non-zero lower error (the upper error is still 0%, of course). To use this:

g_efficiency = ROOT.TGraphAsymmErrors()
# "cl=0.683" sets the confidence level, "b(1,1)" chooses a flat Beta(1,1) prior,
# and "mode" reports the mode of the posterior as the central value
g_efficiency.Divide(h_after_selection, h_before_selection, "cl=0.683 b(1,1) mode")

For versions of ROOT prior to 5.28, use instead:

g_efficiency.BayesDivide(h_after_selection,h_before_selection)
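Either way, the graph stores asymmetric errors per point rather than per bin. Here is a sketch of how you might read them back and draw the result, using the standard TGraph accessors (note that bins with an empty denominator are skipped, so the graph can have fewer points than the histograms have bins):

# Read back each point and its asymmetric errors
n = g_efficiency.GetN()
xs = g_efficiency.GetX()
ys = g_efficiency.GetY()
for i in range(n):
    err_lo = g_efficiency.GetErrorYlow(i)
    err_hi = g_efficiency.GetErrorYhigh(i)
    print("x = %.2f  efficiency = %.3f  -%.3f / +%.3f" % (xs[i], ys[i], err_lo, err_hi))

# Draw the graph with its asymmetric error bars
canvas = ROOT.TCanvas("c_eff", "efficiency", 800, 600)
g_efficiency.Draw("AP")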

There are more options available for the TGraphAsymmErrors::Divide method. Check out the class documentation for more information, and enjoy your new-found prowess with histograms and errors!

References:

http://root.cern.ch/root/html/src/TGraphAsymmErrors.cxx.html#CBi_IC

http://root.cern.ch/root/html/src/TGraphAsymmErrors.cxx.html#G9MXkE

http://root.cern.ch/root/html/TH1.html#TH1:Divide%1
