Automated Parallel Calculation of Collaborative Statistical Models

Large-scale statistical data analysis in particle physics

Image: CMS Doomsday at the CERN LHC by solarnu – https://www.flickr.com/photos/solarnu/2078532845

The discovery of the Higgs boson in 2012 by the ATLAS and CMS experiments at the Large Hadron Collider (LHC) at CERN, Geneva, is a prime example of the success of large-scale statistical data analysis in particle physics. At the LHC, approximately 10 petabytes of data are recorded in every year of data taking. The scientific goal of the examination of proton-proton collisions is to explore whether previously unseen particles are produced in these collisions, whose presence may be indicative of previously unconfirmed or unknown fundamental physics.

As the sought-after particles may decay in a multitude of ways, and their decay products are buried among the hundreds of particles produced in each collision, constructing proof of the existence of these particles requires an exhaustive analysis of the collision data. The final statistical evidence combines the results of the analysis of dozens of partial data samples, each of which isolates a signature of interest or measures an important background or nuisance parameter.
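
Schematically, such a combination takes the form of a joint likelihood: a product of the per-channel probability models, multiplied by constraint terms for the nuisance parameters. The form below is the generic textbook expression, not tied to any specific analysis:

```latex
% Joint likelihood over independent channels c, with per-channel data x_{ci},
% parameters of interest \mu and nuisance parameters \theta; the p_j are
% constraint terms from auxiliary measurements \tilde{a}_j.
L(\mu,\theta) \;=\; \prod_{c \,\in\, \text{channels}} \;\prod_{i=1}^{n_c} f_c\!\left(x_{ci} \mid \mu, \theta\right)
\;\times\; \prod_{j} p_j\!\left(\tilde{a}_j \mid \theta_j\right)
```

Maximizing L (in practice, minimizing -log L) over all channels simultaneously is what makes the evaluation computationally demanding as the number of channels and parameters grows.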

Collaborative statistical modelling

In recent years, the concept of collaborative statistical modelling has emerged, in which detailed statistical models of measurements performed by independent teams of scientists are combined a posteriori without loss of detail. The preferred tool for this, RooFit, allows users to build probability models from expression trees of C++ objects that can be recursively composed into descriptive models of arbitrary complexity.
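
As a minimal sketch of this expression-tree style (the observable, ranges and parameter values are invented for illustration; only the RooFit classes are real), a signal-plus-background model can be composed from elementary pdf objects like this:

```cpp
// Minimal sketch of RooFit's expression-tree modelling style (C++/ROOT).
// All variable names, ranges and values are invented for the example.
#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooAddPdf.h"
#include "RooArgList.h"

void buildModel() {
   // Leaf nodes: an observable and the parameters of two component pdfs
   RooRealVar x("x", "invariant mass", 100, 150);
   RooRealVar mean("mean", "signal peak", 125, 120, 130);
   RooRealVar sigma("sigma", "signal width", 2, 0.1, 10);
   RooRealVar tau("tau", "background slope", -0.02, -1, 0);

   // Component pdfs reference the leaf nodes above
   RooGaussian sig("sig", "signal", x, mean, sigma);
   RooExponential bkg("bkg", "background", x, tau);

   // Composition: the sum pdf holds both components as children in the tree;
   // the same pattern nests recursively to arbitrary depth
   RooRealVar fsig("fsig", "signal fraction", 0.1, 0.0, 1.0);
   RooAddPdf model("model", "sig + bkg", RooArgList(sig, bkg), RooArgList(fsig));
}
```

Because every node in the tree is itself a reusable object, channel models built by independent teams can later be joined into one simultaneous model (for instance with RooFit's RooSimultaneous class) without modifying the originals.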

Computational performance is a limiting issue

With the emergence of ever more complex models, computational performance is becoming a limiting factor. This project aims to introduce eScience techniques to address it: vectorization and parallelization of the model calculations will bring significant speed-ups, while new structures to represent the combined data will simplify the process of building joint models for heterogeneous datasets.
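
As a generic illustration of the first ingredient (this is not the project's actual implementation), the sketch below evaluates a Gaussian negative log-likelihood over a batch of events with C++17 parallel algorithms, so the per-event terms can be both vectorized and spread over threads:

```cpp
// Generic illustration of batched NLL evaluation; not the project's code.
#include <cmath>
#include <numeric>    // std::transform_reduce
#include <execution>  // std::execution::par_unseq
#include <vector>

double gaussianNLL(const std::vector<double>& events, double mean, double sigma) {
   constexpr double kPi = 3.141592653589793;
   const double logNorm = std::log(sigma) + 0.5 * std::log(2.0 * kPi);
   // par_unseq permits both multi-threading and SIMD vectorization; the
   // reduction sums the per-event contributions into the total -log L.
   return std::transform_reduce(
       std::execution::par_unseq, events.begin(), events.end(), 0.0,
       std::plus<>{},
       [=](double xi) {
          const double z = (xi - mean) / sigma;
          return 0.5 * z * z + logNorm;
       });
}
```

Evaluating events in batches like this, rather than one event at a time through a chain of virtual calls, is the kind of restructuring that makes such speed-ups possible.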

Usable in other fields

With much improved scalability and computational efficiency, the developed software can also become usable in other fields, such as spectral CT image reconstruction.

Participating organisations

Netherlands eScience Center
NIKHEF

Team

Wouter Verkerke
Principal investigator
National Institute for Subatomic Physics
Patrick Bos
eScience Research Engineer
Netherlands eScience Center
Jisk Attema
Senior eScience Research Engineer
Netherlands eScience Center
Rena Bakhshi
Programme Manager
Netherlands eScience Center
Inti Pelupessy
Senior eScience Research Engineer
Netherlands eScience Center

Related projects

ROOFIT

Optimized parallel calculation of complex likelihood fits of LHC data

Finished

DarkGenerators

Interpretable large scale deep generative models for Dark Matter searches

Finished

Fast open source simulator of low-energy scattering of charged particles in matter

Transferring code to the larger community

Finished

iDark

The intelligent Dark Matter survey

Finished

Real-time detection of neutrinos from the distant Universe

Observing processes that are inaccessible to optical telescopes

Finished

Giving Pandas a ROOT to Chew on

Modern big data front and backends in the hunt for Dark Matter

Finished