[Rivet] Histogram normalisation

Andy Buckley andy.buckley at ed.ac.uk
Tue Oct 20 16:51:20 BST 2009


Frank Siegert wrote:
> Hi Jon,
> 
> Thanks for the comments.
> 
> Jonathan Butterworth, Tuesday 20 October 2009:
>> - I'd be very wary of the KFactor. I am particularly worried that
>> people will start applying multiple scale factors and lose track of
>> what has been applied. I suggest that the default output ALWAYS just
>> uses the "truth" (either xsec proportional or normalised by your Norm
>> factor if applicable) and any other scaling is done later, transiently,
>> with plotting tools. The KFactor could be stored and written out so it
>> can be applied by the plotting tools if desired (?)
> 
> My proposal was meant to have truth output plus the KFactor=x.xx written 
> out. But thinking about the complications and dangers of this approach, I 
> agree: Let's drop the LO mode I proposed, and ...
> 
>> - We also discussed having a plotting tool which steps over various
>> scale factors for a combined run and works out the optimal scale factor
>> based on the Chi2 between data and MC. This could also (optionally)
>> apply the KFactors. Is that still in the plan?
> 
> ... replace it with this more general and easier to implement solution of 
> automatic KFactor finding.

Good: this was exactly what I was about to say. Sorry, the length of 
your mail made me delay reading it, Frank!
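
To make the automatic KFactor idea concrete, here is a minimal sketch of 
the fit, not an existing Rivet or YODA tool. It assumes that only the data 
errors enter the Chi2 and that a single factor k multiplies every MC bin; 
under those assumptions the scan over scale factors has a closed-form 
minimum. The names chi2 and best_kfactor are made up for illustration.

```python
def chi2(data, errs, mc, k):
    """Chi2 between data and MC scaled by k, using data errors only (assumption)."""
    return sum(((d - k * m) / e) ** 2 for d, e, m in zip(data, errs, mc))


def best_kfactor(data, errs, mc):
    """Scale factor k that minimises chi2(data, errs, mc, k).

    Setting d(chi2)/dk = 0 gives
        k = sum(d*m/e^2) / sum(m^2/e^2).
    """
    num = sum(d * m / e ** 2 for d, e, m in zip(data, errs, mc))
    den = sum(m * m / e ** 2 for e, m in zip(errs, mc))
    if den == 0.0:
        raise ValueError("MC histogram is empty, cannot fit a scale factor")
    return num / den


if __name__ == "__main__":
    # Toy numbers, purely illustrative
    data = [10.0, 20.0, 30.0]
    errs = [1.0, 2.0, 3.0]
    mc = [5.0, 10.0, 15.0]
    k = best_kfactor(data, errs, mc)
    print("fitted KFactor:", k, "chi2:", chi2(data, errs, mc, k))
```

The fitted value would only ever be applied transiently at plotting time, 
so the written histograms stay "truth".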

One further thing, which I'm not sure counts in your "dropped" KFactor 
proposal: I don't see how we can automate the finalize steps without 
always getting ~half of the observables very wrong, so I think the 
finalize methods always need to be written as part of the analysis, just 
without any hard-coded cross-sections. This also makes sense for users 
like Herwig++, who are accessing Rivet as a library and presumably want 
whatever histograms are written out to be meaningful *before* 
post-processing scripts are applied (with any post-processing of their 
own presumably done via YODA's C++ interface.)

In the last week, we added HepMC cross-section filling to Pythia 6 via 
AGILe, and to Pythia 8: as far as I'm concerned, the remaining generator 
to cover is HERWIG via AGILe --- anyone know which common block variable 
to dig in for HERWIG cross-sections? (and whether it's still safe when 
JIMMY or AlpGen are used)

> In any case we just have to make sure that histograms which already have 
> a Norm=x.xx or Scale=x.xx are ignored (or does anybody have an analysis 
> use case where a histogram is scaled with anything other than 
> crossSection()/sumOfWeights() and _still_ should have a kfactor?).

None that spring to mind, but never say never...

> And for all others the automatically determined kfactor should somehow be 
> plotted for each MC run in the histogram, maybe together with the legend, 
> or above the top edge?

Sure. Well, that's a plotting detail, but we can make sure in the YODA 
implementation that KFactor can be written as an annotation and used by 
anyone who wants to.

In terms of this Norm and Scale stuff, the motivation is presumably the 
run combination requirement? I.e. if we only ever wanted single runs to 
be used, then we'd continue to do the scaling (and hence conversion to a 
scatter-type data object) inside finalize, with the output containing no 
moments, just points and errors. So the Norm and Scale are really just 
details of how histogramming has to work if we want to be able to 
combine multiple runs... of course, someone will eventually try to 
combine two runs with different scaling targets, so we need to be 
careful about failure modes!
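
As a rough illustration of the run-combination point, the sketch below 
keeps the unscaled moments per run, merges two runs, and only then applies 
the scaling. It is not the YODA design: the RunHisto type, its fields and 
the combine/finalize helpers are hypothetical, bin widths are ignored, and 
the combined cross-section is assumed to be the sum-of-weights-weighted 
mean of two equivalent runs. The mismatched-target case fails loudly.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RunHisto:
    sumw: List[float]             # per-bin sum of weights, unscaled
    sumw2: List[float]            # per-bin sum of squared weights, unscaled
    total_sumw: float             # sum of event weights for the whole run
    xsec: float                   # generator cross-section for the run
    norm: Optional[float] = None  # fixed target area; None = xsec-proportional


def combine(a: RunHisto, b: RunHisto) -> RunHisto:
    """Merge two runs bin by bin, refusing different scaling targets."""
    if a.norm != b.norm:
        raise ValueError("runs have different scaling targets")
    if len(a.sumw) != len(b.sumw):
        raise ValueError("runs have different binnings")
    total = a.total_sumw + b.total_sumw
    # Assumption: both runs sample the same process, so average the xsec
    xsec = (a.xsec * a.total_sumw + b.xsec * b.total_sumw) / total
    return RunHisto(
        [x + y for x, y in zip(a.sumw, b.sumw)],
        [x + y for x, y in zip(a.sumw2, b.sumw2)],
        total, xsec, a.norm,
    )


def finalize(h: RunHisto) -> Tuple[List[float], List[float]]:
    """Convert moments to points and errors, as a finalize step would."""
    if h.norm is not None:
        factor = h.norm / sum(h.sumw)    # fixed-area normalisation
    else:
        factor = h.xsec / h.total_sumw   # cross-section normalisation
    values = [factor * w for w in h.sumw]
    errors = [factor * math.sqrt(w2) for w2 in h.sumw2]
    return values, errors
```

The key design point is that scaling happens after merging, so the merged 
result does not depend on how many runs were combined.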

>> - How do plotting tools know whether a histogram has a cross-section
>> (i.e. semi-floating) normalisation or is fixed? Is Norm=0 or some other
>> special value for the xsec type histograms? Or is Norm just not written
>> out?
> 
> I would suggest Norm to not be written out in such a case.

If I understand the question, yes. We could also think about marking 
such histograms explicitly (or vice-versa, mark those subject to 
K-factor rescaling) rather than using covert channels or magic values.

Eike started work last week on centralising some of our histogramming 
mess, and has implemented a script for cutting out bin ranges to avoid 
normalisation biases in e.g. Nch plots where the generator is never 
going to reproduce the diffractive contribution at low-Nch but should be 
able to be fitted to the data for e.g. Nch > 10. We'll also be working 
on YODA and providing the floating norm optimisation script that Jon 
mentioned. I'll keep you posted.
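
For illustration only, the bin-range idea amounts to something like the 
following sketch. This is not Eike's script: the function name and 
arguments are hypothetical, and it assumes an area-matching normalisation 
that uses only the bins whose lower edge is at or above a cut.

```python
def norm_factor_in_range(edges, data, mc, lo):
    """Factor that matches the MC area to the data area for bins starting at >= lo.

    edges has one more entry than data and mc; bins below the cut are ignored,
    so a region the generator cannot describe does not bias the factor.
    """
    num = 0.0
    den = 0.0
    for i, (d, m) in enumerate(zip(data, mc)):
        if edges[i] >= lo:
            width = edges[i + 1] - edges[i]
            num += d * width
            den += m * width
    if den == 0.0:
        raise ValueError("no MC content in the selected bin range")
    return num / den


if __name__ == "__main__":
    # Toy Nch-like distribution: the first bin is badly described by the MC
    edges = [0.0, 10.0, 20.0, 30.0]
    data = [9.0, 4.0, 2.0]
    mc = [1.0, 2.0, 1.0]
    print("all bins:   ", norm_factor_in_range(edges, data, mc, 0.0))
    print("Nch >= 10:  ", norm_factor_in_range(edges, data, mc, 10.0))
```

With these toy numbers the factor is 3.75 over all bins but 2.0 once the 
first bin is cut out, which is the kind of bias the cut is meant to avoid.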

Andy

-- 
Dr Andy Buckley
Particle Physics Experiment Group, University of Edinburgh

