[Rivet] Histogram normalisation

Jonathan Butterworth jmb at hep.ucl.ac.uk
Tue Oct 20 11:14:36 BST 2009


Hi Frank et al,

This looks very reasonable to me in general. A couple of comments, though:

- I'd be very wary of the KFactor. I am particularly worried that 
people will start applying multiple scale factors and lose track of 
what has been applied. I suggest that the default output ALWAYS uses 
the "truth" (either proportional to the cross section, or normalised 
by your Norm factor if applicable) and that any other scaling is done 
later, transiently, with plotting tools. The KFactor could be stored 
and written out so it can be applied by the plotting tools if desired (?)

- We also discussed having a plotting tool which steps over various 
scale factors for a combined run and works out the optimal scale factor 
based on the Chi2 between data and MC (see the sketch after these 
comments). This could also (optionally) apply the KFactors. Is that 
still in the plan?

- How do plotting tools know whether a histogram has a cross-section 
(i.e. semi-floating) normalisation or a fixed one? Is Norm=0 or some 
other special value used for the xsec-type histograms? Or is Norm 
simply not written out?
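
For reference, the optimal scale factor in the Chi2 approach has a 
closed form: minimising Chi2(k) = sum_i (d_i - k*m_i)^2 / s_i^2 over 
all bins i (data d_i, MC m_i, data error s_i) gives 
k = sum_i(d_i*m_i/s_i^2) / sum_i(m_i^2/s_i^2). A minimal sketch of 
that calculation (function and variable names are illustrative, not 
part of any existing tool):

  #include <cstddef>
  #include <vector>

  // Closed-form scale factor minimising the chi2 between data and MC,
  // accumulated over all bins of all histograms in the combined run.
  double optimalScaleFactor(const std::vector<double>& data,
                            const std::vector<double>& mc,
                            const std::vector<double>& dataErr) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
      const double s2 = dataErr[i] * dataErr[i];
      if (s2 <= 0.0) continue;  // skip bins without a usable error
      num += data[i] * mc[i] / s2;
      den += mc[i] * mc[i] / s2;
    }
    return (den > 0.0) ? num / den : 1.0;  // fall back to unit scale
  }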

Cheers,
Jon





Frank Siegert wrote:
> Following up from the discussion today, here is my understanding of our 
> conclusions re normalisation of histograms in the longer term. Please add 
> to it and correct me where I'm wrong.
> 
> (Note that all of the following only refers to distributions which are 
> proportional to the cross section, i.e. not profile histograms like 
> N_charged vs. pT(leading jet), where normalisation is not an issue.)
> 
> Rivet's written-out histogram files should never be normalised to a fixed 
> number, be that 1.0 or the integral of the reference histograms etc.
> Instead they should represent the actual cross section that went into the 
> histogram, which would currently be achieved by finalising with
> 
>   scale(hist, crossSection()/sumOfWeights());
> 
> If we agree on that, this should be automated, such that not each 
> finalize() method has to do it.
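> 
> To make this concrete, a minimal sketch of a finalize() doing it by 
> hand (the histogram pointers _h_pT_Z etc. are placeholders for 
> whatever the analysis books):
> 
>   void finalize() {
>     // turn raw weighted counts into a differential cross section
>     scale(_h_pT_Z, crossSection()/sumOfWeights());
>     scale(_h_pT_jet3, crossSection()/sumOfWeights());
>   }
> 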
> If the reference data is normalised in a different way, then this should 
> be stored as extra information which is written out with the histo data. 
> E.g. Norm=1.0 or Scale=1.0/780.0, where 780.0 could be a number determined 
> during the event processing, like an inclusive XS.
> 
> Now when tuning with or plotting the histograms, at least two options 
> should be accommodated for all histograms that don't have a fixed 
> norm/scale stored as above:
> 1. Plot everything according to truth, without any scaling/kfactors. 
> That's simple.
> 2. Something like a leading-order mode, since many of the generators 
> that Rivet is used with are only LO accurate, and their users mostly 
> care about shapes of distributions (because experiments normalise 
> them to N(N)LO calculations 
> anyway). This is tricky, because you don't want to normalise every 
> histogram separately to data, but only introduce one scaling factor per 
> event sample or analysis (?). My temporary solution to this has been 
> several lines like these in a make-plots.conf file:
> 
>   # pure QCD
>   .*aida/CDF_2006_S6450792/.*::Scale=1.7
>   .*aida/CDF_2007_S7057202/.*::Scale=1.7
>   .*aida/CDF_2008_S7828950/.*::Scale=1.7
>   .*aida/CDF_2008_S8093652/.*::Scale=1.7
>   .*aida/D0_2008_S7662670/.*::Scale=1.7
> 
> and this has worked quite well for me. We'd want this to be automated 
> though, so maybe we could introduce an additional bit of information for 
> each histogram called "KFactor", which would normally be set to 1.0, but 
> if an analysis thinks that for a particular histogram a kfactor would be 
> meaningful, it could calculate and store it as properly as possible. Of 
> course, this would not always scale each histogram up to data, because the 
> kfactor relates the total *inclusive* NLO/LO cross sections while most 
> histograms contain cross sections after significant cuts. As an example 
> consider a Z+jets analysis which plots histograms of pT(Z) and pT(3rd 
> jet). If you properly introduce a kfactor, the fairly inclusive pT(Z) will 
> normally get scaled to data, but the pT(3rd jet) integral could be very 
> different from data if your Monte Carlo is not able to describe the 
> correct ratio of Z+3jet to inclusive-Z events. Such differences have to 
> be preserved in any case.
> So each analysis author has the option to provide a reasonable way to 
> normalise a histogram for use with LO Monte Carlos.
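> 
> As a hypothetical sketch (the helper and the storage hook below are 
> assumed for illustration, not existing Rivet API):
> 
>   // ratio of the total inclusive NLO and LO cross sections, e.g.
>   // taken from an external calculation by the analysis author
>   double inclusiveKFactor(double xsecNLO, double xsecLO) {
>     return xsecNLO / xsecLO;
>   }
>   // e.g. storeAnnotation(hist, "KFactor", kfactor);  // assumed
>   // metadata hook, written out so plotting tools can apply it later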
> 
> Does that sound reasonable? Can we collect more different use cases from 
> actual analyses people have written to discuss this?
> 
> One more issue, which we haven't mentioned today: Eventually we want to 
> provide our plotting tools with the ability to merge output files from 
> separate independent runs (to increase statistics by running many jobs in 
> parallel, e.g. on the grid). For this we need some more information stored 
> in the histogram files, namely the raw sum of weights (and of squared 
> weights) in each bin, don't we? And the number of entries in a histogram?
> If we agree on a Rivet-wide
> 
>   scale(hist, crossSection()/sumOfWeights());
> 
> the sum of weights in each bin could be skipped by just storing the 
> number above, but the squared ones are still needed for error estimation. 
> Just wanted to mention this while we discuss which information we store 
> with histograms.
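> 
> To sketch what such a merge could look like (this is illustrative 
> code, not existing Rivet machinery): each run writes out, per bin, 
> the sum of weights and the sum of squared weights, plus its total 
> sum of event weights; merging adds these up and applies the single 
> crossSection/sumOfWeights scale once at the end:
> 
>   #include <cmath>
>   #include <cstddef>
>   #include <vector>
> 
>   // per-bin raw accumulators written out by each run
>   struct RawBin { double sumW = 0.0, sumW2 = 0.0; };
> 
>   // Merge the bins of several parallel runs, then scale the combined
>   // result by xsec / (total sum of event weights).
>   void mergeAndScale(const std::vector<std::vector<RawBin> >& runs,
>                      const std::vector<double>& runSumW,
>                      double xsec,
>                      std::vector<double>& values,
>                      std::vector<double>& errors) {
>     const std::size_t nbins = runs.front().size();
>     std::vector<RawBin> merged(nbins);
>     double totalSumW = 0.0;
>     for (std::size_t r = 0; r < runs.size(); ++r) {
>       totalSumW += runSumW[r];
>       for (std::size_t b = 0; b < nbins; ++b) {
>         merged[b].sumW  += runs[r][b].sumW;
>         merged[b].sumW2 += runs[r][b].sumW2;
>       }
>     }
>     const double norm = xsec / totalSumW;
>     values.resize(nbins);
>     errors.resize(nbins);
>     for (std::size_t b = 0; b < nbins; ++b) {
>       values[b] = merged[b].sumW * norm;
>       errors[b] = std::sqrt(merged[b].sumW2) * norm;  // weighted-Poisson error
>     }
>   }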
> 
> Sorry for the long email, comments welcome.
> Frank
> 
> _______________________________________________
> Rivet mailing list
> Rivet at projects.hepforge.org
> http://www.hepforge.org/lists/listinfo/rivet

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prof. Jonathan Butterworth,              http://www.hep.ucl.ac.uk/~jmb/
Physics and Astronomy Department                  Tel: +44 20 7679 3444
ATLAS, CERN                                       Tel: +41 22 76  72340
University College London                 Gower St, London WC1E 6BT, UK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

