[Rivet] Histogram normalisation

Frank Siegert frank.siegert at durham.ac.uk
Tue Oct 13 23:45:41 BST 2009


Following up from the discussion today, here is my understanding of our 
conclusions re normalisation of histograms in the longer term. Please add 
to it and correct me where I'm wrong.

(Note that all of the following only refers to distributions which are 
proportional to the cross section, i.e. not profile histograms like 
N_charged vs. pT(leading jet), where normalisation is not an issue.)

Rivet's written-out histogram files should never be normalised to a fixed 
number, be that 1.0 or the integral of the reference histograms etc.
Instead they should represent the actual cross section that went into the 
histogram, which would currently be achieved by finalising with

  scale(hist, crossSection()/sumOfWeights());

If we agree on that, this should be automated, so that not every 
finalize() method has to do it by hand.
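
Just to illustrate what I mean by automation, something along these 
lines inside the framework would do (a sketch only; _scaleAll and 
_bookedHistos are invented names, not existing Rivet API):

  // Hypothetical hook, called by the AnalysisHandler after the user's
  // finalize(). Assumes the Analysis base class keeps a list of the
  // histograms it booked.
  void Analysis::_scaleAll() {
    const double sf = crossSection()/sumOfWeights();
    for (size_t i = 0; i < _bookedHistos.size(); ++i) {
      scale(_bookedHistos[i], sf);  // the same call every finalize() makes by hand today
    }
  }
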
If the reference data is normalised in a different way, then this should 
be stored as extra information which is written out with the histo data. 
E.g. Norm=1.0 or Scale=1.0/780.0 where 780.0 could be a number determined 
during the event processing, like an inclusive XS.
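
For concreteness, that could happen at the end of finalize(), e.g. via 
the AIDA annotation mechanism, assuming annotations survive into the 
written-out file (the key name "Scale" is only a suggestion):

  #include <sstream>
  #include "AIDA/IHistogram1D.h"
  #include "AIDA/IAnnotation.h"

  // Sketch: attach the normalisation info to the histogram so it gets
  // written out with the data. "Scale" here would hold e.g. 1.0/780.0,
  // with 780.0 an inclusive XS determined during the run.
  void annotateScale(AIDA::IHistogram1D* h, double scaleFactor) {
    std::ostringstream ss;
    ss << scaleFactor;
    h->annotation().addItem("Scale", ss.str());
  }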

Now when tuning with or plotting the histograms, at least two options 
should be accommodated for all histograms that don't have a fixed 
norm/scale stored as above:
1. Plot everything according to truth, without any scaling/k-factors. 
That's simple.
2. Something like a leading-order mode, since many of the generators that 
Rivet is used with are only LO accurate, and one usually cares only about 
the shapes of distributions (because experiments normalise them to N(N)LO 
calculations anyway). This is tricky, because you don't want to normalise 
every histogram separately to data, but only introduce one scaling factor 
per event sample or analysis (?). My temporary solution to this has been 
several lines like these in a make-plots.conf file:

  # pure QCD
  .*aida/CDF_2006_S6450792/.*::Scale=1.7
  .*aida/CDF_2007_S7057202/.*::Scale=1.7
  .*aida/CDF_2008_S7828950/.*::Scale=1.7
  .*aida/CDF_2008_S8093652/.*::Scale=1.7
  .*aida/D0_2008_S7662670/.*::Scale=1.7

and this has worked quite well for me. We'd want this to be automated 
though, so maybe we could introduce an additional bit of information for 
each histogram called "KFactor", which would normally be set to 1.0; but 
if an analysis author thinks a k-factor would be meaningful for a 
particular histogram, it could be calculated and stored as properly as 
possible. Of course, this would not always scale each histogram up to 
data, because the k-factor relates the total *inclusive* NLO/LO cross 
sections, while most histograms contain cross sections after significant 
cuts. As an example, consider a Z+jets analysis which plots histograms of 
pT(Z) and pT(3rd jet). If you properly introduce a k-factor, the fairly 
inclusive pT(Z) will normally get scaled to data, but the pT(3rd jet) 
integral could still be very different from data if your Monte Carlo is 
not able to describe the correct ratio of Z+3jet to inclusive Z events. 
Such differences have to be preserved in any case.
So each analysis author would have the option to provide a reasonable way 
to normalise a histogram for use with LO Monte Carlos.
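
A rough sketch of what that could look like (all names here are 
hypothetical, and the inclusive N(N)LO cross section would have to come 
from somewhere external):

  #include <sstream>
  #include "AIDA/IHistogram1D.h"
  #include "AIDA/IAnnotation.h"

  // Sketch: store a k-factor annotation for histograms where the analysis
  // author considers it meaningful. It relates *inclusive* cross sections
  // only, so it will not fix up distributions after significant cuts.
  void storeKFactor(AIDA::IHistogram1D* h,
                    double inclusiveNLOXS, double inclusiveLOXS) {
    std::ostringstream ss;
    ss << inclusiveNLOXS/inclusiveLOXS;
    h->annotation().addItem("KFactor", ss.str());
  }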

Does that sound reasonable? Can we collect more use cases from actual 
analyses people have written, to discuss this further?

One more issue, which we haven't mentioned today: eventually we want to 
provide our plotting tools with the ability to merge output files from 
separate independent runs (to increase statistics by running many jobs in 
parallel, e.g. on the grid). For this we need some more information stored 
in the histogram files, namely the raw sum of weights in each bin (and its 
square), don't we? And the number of entries in a histogram?
If we agree on a Rivet-wide

  scale(hist, crossSection()/sumOfWeights());

then storing the sum of weights in each bin could be skipped, since it can 
be recovered from the scaled bin contents and the number above, but the 
sums of squared weights are still needed for error estimation. Just wanted 
to mention this while we discuss which information we store with 
histograms.
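
To make the bookkeeping concrete, the per-bin arithmetic for merging two 
runs of the same sample would be something like this (plain structs, not 
real Rivet types; assumes both runs carry the same cross-section estimate 
xs):

  #include <cmath>

  struct RunBin { double sumw, sumw2; };  // raw per-bin sums from one run

  // Raw per-bin sums simply add across runs:
  RunBin merge(const RunBin& a, const RunBin& b) {
    RunBin m = { a.sumw + b.sumw, a.sumw2 + b.sumw2 };
    return m;
  }

  // The final bin height and error then use the combined event-weight
  // sum W = W_a + W_b of the two runs:
  double binHeight(const RunBin& m, double xs, double W) {
    return xs * m.sumw / W;
  }
  double binError(const RunBin& m, double xs, double W) {
    return xs * std::sqrt(m.sumw2) / W;
  }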

Sorry for the long email, comments welcome.
Frank


