Modern physics experiments are often complex, ambitious, and costly. The days when scientific progress could be made with a small tabletop experiment in your lab are mostly over. Especially in fields like astrophysics or particle physics, you need huge telescopes, expensive satellite missions, or giant colliders run by international collaborations with hundreds or thousands of participants. To drive this point home: the largest machine ever built by humankind is the Large Hadron Collider (LHC). You won’t be surprised to hear that even just managing the data it produces is a super-sized task.
Since its start in 2008, the LHC at CERN has received several upgrades to stay at the cutting edge of technology. Currently, the machine is in its second long shutdown, being prepared for a restart in May 2021. One of the improvements for Run 3 will be to deliver particle collisions at a higher rate, quantified by the so-called luminosity. This lets the experiments gather more statistics and study rare processes in greater detail. At the end of 2024, the LHC will be upgraded to the High-Luminosity LHC (HL-LHC), which will increase the luminosity by up to a factor of ten beyond the LHC’s original design value.
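In case the term is new to you: the luminosity is simply the proportionality factor between the cross-section of a process and the rate at which it occurs. Here is a quick back-of-envelope example using the LHC’s nominal design luminosity and the approximate inelastic proton-proton cross-section (both round numbers, so treat the result as an order-of-magnitude estimate):

```latex
% Event rate R of a process with cross-section sigma at luminosity L:
R = \mathcal{L}\,\sigma
% With the design luminosity L ~ 10^{34} cm^{-2} s^{-1} and an inelastic
% pp cross-section of roughly 80 mb = 8 x 10^{-26} cm^2:
R \approx 10^{34}\,\mathrm{cm^{-2}\,s^{-1}} \times 8\times 10^{-26}\,\mathrm{cm^{2}}
  \approx 8\times 10^{8}\ \text{collisions/s}
```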
Currently, the major experiments ALICE, ATLAS, CMS, and LHCb are preparing to cope with expected data rates in the range of terabytes per second. It is a perfect time to take a more detailed look at the data acquisition, storage, and analysis of modern high-energy physics experiments.
Major Upgrades for ALICE
The ALICE experiment is the oddball among the four because it studies lead-lead instead of proton-proton collisions. It also faces one of the greatest challenges of the upgrade: the recorded collision rate will increase fifty-fold, from 1 kHz to 50 kHz, in the upcoming LHC run. With about half a million detector channels read out at a 5 MHz sampling rate, this amounts to a continuous data stream of roughly 3 TB/s.
To cope with these numbers, the main detector of ALICE, a Time Projection Chamber (TPC), received 3,276 new front-end electronics cards developed by Oak Ridge National Laboratory. At the heart of the boards is a custom ASIC called SAMPA, designed at the University of São Paulo. The SAMPA chip combines a charge-sensitive amplifier, a 10-bit ADC, and a digital signal processing (DSP) unit. Doing the DSP on-chip already reduces the data to ~1 TB/s through zero suppression.
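These headline numbers are easy to sanity-check. A quick back-of-envelope calculation using the channel count, sampling rate, and ADC width quoted above:

```python
# Back-of-envelope estimate of the ALICE TPC raw data rate.
channels = 500_000          # ~half a million detector channels
sampling_rate = 5e6         # 5 MHz sampling rate per channel
bits_per_sample = 10        # SAMPA's 10-bit ADC

raw_bits_per_s = channels * sampling_rate * bits_per_sample
raw_tb_per_s = raw_bits_per_s / 8 / 1e12   # bits -> bytes -> terabytes

print(f"raw stream: ~{raw_tb_per_s:.1f} TB/s")   # ~3.1 TB/s

# Zero suppression on the SAMPA's DSP discards samples below threshold,
# which cuts this stream by roughly a factor of three, to ~1 TB/s.
```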
Because the front-end electronics sit directly at the detector, they have to withstand high doses of radiation. For this reason, CERN started early to develop the GigaBit Transceiver (GBT) platform, a custom ASIC and data-transfer protocol that provides a radiation-tolerant 4.8 Gbit/s optical link, now used by several LHC experiments.
As shown in the schematic, data from the front-end electronics at ALICE is transferred to the Common Read-out Units (CRU), which serve as interfaces to the First-Level Processors (FLP). The FLPs are a farm of servers that reduce the data to ~500 GB/s by grouping detector hits into clusters. The CRU boards are based on Altera Arria 10 GX FPGAs and use the commercial PCI Express interface since it is cost-effective and widely available in server machines. The final merging and data-volume reduction is performed by a second farm of computers, the Event Processing Nodes (EPN), which bring the data flow down to about 90 GB/s before it is written to disk.
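Putting the whole chain in one place, here is a small bookkeeping sketch of the throughput and reduction factor at each stage, using only the figures quoted above:

```python
# Throughput of the ALICE read-out chain, stage by stage (figures from the text).
stages = [
    ("front-end (raw ADC data)",      3000),  # GB/s
    ("SAMPA DSP (zero suppression)",  1000),
    ("FLP farm (hit clustering)",      500),
    ("EPN farm (merging/compression)",  90),
]

for (name, rate), (_, next_rate) in zip(stages, stages[1:]):
    print(f"{name}: {rate} GB/s -> {next_rate} GB/s "
          f"(x{rate / next_rate:.1f} reduction)")
```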
The most time-consuming step of event processing is the reconstruction of particle trajectories. While the other LHC experiments still use regular multi-core CPUs for this task, ALICE is designing its tracking implementation to run on GPUs, which offer significantly more parallel computing power. A study showed that track finding on an NVIDIA GTX 1080 is up to 40 times faster than on an Intel i7-6700K processor. Interestingly, the tracking algorithm for the ALICE TPC is based on a cellular automaton.
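To give a flavor of how a cellular automaton can find tracks, here is a deliberately simplified toy, not ALICE’s actual implementation: hits sit on consecutive pad rows, each hit links to nearby hits on the next row, and the automaton iteratively propagates a “chain length” state so the longest smooth chains emerge as track candidates. The hit layout and the `MAX_STEP` tolerance are made up for illustration.

```python
# Toy cellular-automaton track finder (illustrative only, not the ALICE code).
# Hits are (row, y) points; a track candidate is a chain of hits on
# consecutive rows whose y-positions change smoothly.
hits = [(0, 1.0), (1, 1.1), (2, 1.3), (3, 1.4),   # a "track"
        (0, 5.0), (1, 3.0), (2, 4.9), (3, 5.1)]   # mostly noise

MAX_STEP = 0.3  # hypothetical tolerance for linking hits on adjacent rows

# 1. Build links between hits on neighbouring rows.
links = {i: [] for i in range(len(hits))}
for i, (r1, y1) in enumerate(hits):
    for j, (r2, y2) in enumerate(hits):
        if r2 == r1 + 1 and abs(y2 - y1) <= MAX_STEP:
            links[i].append(j)

# 2. CA step: every hit's state becomes 1 + the max state of its neighbours.
#    Iterating until nothing changes labels each hit with the length of the
#    longest chain starting there.
state = [0] * len(hits)
changed = True
while changed:
    changed = False
    for i in range(len(hits)):
        best = max((state[j] + 1 for j in links[i]), default=0)
        if best > state[i]:
            state[i], changed = best, True

# 3. The hit with the highest state seeds the best track candidate.
seed = max(range(len(hits)), key=lambda i: state[i])
track = [seed]
while links[track[-1]]:
    track.append(max(links[track[-1]], key=lambda j: state[j]))
print("track candidate:", [hits[i] for i in track])
```

The appeal of this scheme is that step 2 is a purely local update rule, so every hit can be processed in parallel, which is exactly what maps well onto a GPU.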
Multi-Level Triggering, Machine Learning, and Quantum Computing
While the ALICE experiment will be able to continuously stream data from all Pb-Pb collisions happening at 50 kHz, the rate of proton-proton collisions at the other LHC detectors is as high as 40 MHz. These experiments therefore employ a multi-level triggering scheme that reads out only preselected events. The CMS experiment, for example, uses an FPGA-based Level-1 trigger that can filter data within microseconds. The High-Level Trigger (HLT) at the second stage processes the data in software on commercial CPUs, where longer latencies on the timescale of milliseconds are allowed.
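Conceptually, such a trigger chain is just a cascade of increasingly expensive filters. Here is a toy sketch of the idea; the event model, thresholds, and selection criteria are all made up for illustration and have nothing to do with CMS’s actual trigger menus:

```python
import random

# Toy two-stage trigger cascade (all numbers are illustrative).
random.seed(42)

def level1_trigger(event):
    # Hardware-style decision: one coarse quantity against a threshold,
    # the kind of check an FPGA can make within microseconds.
    return event["max_et"] > 30.0   # hypothetical energy threshold in GeV

def high_level_trigger(event):
    # Software decision: more detailed reconstruction, milliseconds on CPUs.
    return event["max_et"] > 50.0 and event["n_tracks"] >= 2

events = [{"max_et": random.expovariate(1 / 15.0),
           "n_tracks": random.randint(0, 6)} for _ in range(100_000)]

after_l1 = [e for e in events if level1_trigger(e)]
after_hlt = [e for e in after_l1 if high_level_trigger(e)]
print(f"input: {len(events)}, after L1: {len(after_l1)}, "
      f"after HLT: {len(after_hlt)}")
```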
The HLT is responsible for track identification, a task that is currently moving toward machine learning techniques. In 2018, CERN hosted the TrackML competition, challenging participants to build a machine learning algorithm that quickly reconstructs particle tracks. An even more ambitious approach is pursued by the HEP.QPR project, which is developing track-finding algorithms for quantum computers, since these could potentially overcome the problem of combinatorial explosion. The project has already tested its algorithms on the TrackML dataset using machines from D-Wave, a company that offers cloud access to its quantum computers.
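The core idea behind running track finding on a quantum annealer is to phrase it as a QUBO (quadratic unconstrained binary optimization) problem: each candidate track segment becomes a binary variable, compatible segments get a bonus for being selected together, and conflicting ones a penalty. A minimal brute-force sketch of that formulation follows; all coefficients are made up, and an annealer would sample the same objective rather than enumerate it:

```python
from itertools import product

# Toy QUBO for track finding: x[i] = 1 means "keep segment i".
# Diagonal terms reward good segments; off-diagonal terms encode
# compatibility (negative = bonus) or conflict (positive = penalty).
Q = {
    (0, 0): -1.0, (1, 1): -1.0, (2, 2): -1.0,  # three decent segments
    (0, 1): -0.5,   # segments 0 and 1 chain smoothly: bonus
    (1, 2): +2.0,   # segments 1 and 2 share a hit: conflict penalty
}

def energy(x):
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

# Brute force over all assignments; a quantum annealer instead samples
# low-energy states of this same objective.
best = min(product((0, 1), repeat=3), key=energy)
print("selected segments:", [i for i, xi in enumerate(best) if xi])
```

The brute-force loop scales as 2^n, which is exactly the combinatorial explosion mentioned above and the reason annealing hardware looks attractive for large events.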
While a track-finding algorithm can run comfortably on a CPU or GPU, the low latency required of Level-1 triggers and the limited hardware resources of FPGAs allow only very basic data-selection algorithms. More sophisticated algorithms could help preserve potentially interesting physics signatures that are currently lost. For this reason, researchers at Fermilab developed the hls4ml compiler package, which translates machine learning models from common open-source frameworks such as Keras and PyTorch into a Register-Transfer Level (RTL) implementation for FPGAs.
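As a sketch of what that workflow looks like for a Keras model (the calls follow the hls4ml documentation, but details may differ between versions, and the network here is just a placeholder):

```python
import hls4ml
from tensorflow import keras

# A small dense network of the kind that fits a Level-1 trigger budget.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    keras.layers.Dense(5, activation="softmax"),
])

# Generate an hls4ml configuration (fixed-point precision, parallelism, ...)
# and convert the Keras model into an HLS project targeting an FPGA.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="my_hls_project")

hls_model.compile()   # builds a C simulation for bit-accurate checks
# hls_model.build()   # runs full HLS synthesis down to RTL (needs vendor tools)
```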
Data Analysis on the Grid
Machine learning is already used extensively in the offline analysis of stored data, in particular for particle identification. One example is b-tagging, the identification of jets originating from bottom quarks, which are important for searches for new physics. Currently, the most efficient b-tagging algorithms are based on deep neural networks, like the DeepCSV algorithm developed by CMS.
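A full tagger like DeepCSV combines dozens of track and vertex features per jet, but the basic recipe is a standard supervised classifier. Here is a stripped-down sketch in Keras with synthetic stand-in features; real taggers train on simulated jets with far richer inputs:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for jet features (think impact parameters, vertex mass);
# the shapes and statistics here are invented for illustration.
rng = np.random.default_rng(0)
n = 10_000
labels = rng.integers(0, 2, n)                    # 1 = b jet, 0 = light jet
features = rng.normal(labels[:, None] * 0.8, 1.0, size=(n, 8))

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(8,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # P(jet is a b jet)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(features, labels, epochs=5, batch_size=256, validation_split=0.2)
```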
You might have heard of the Worldwide LHC Computing Grid (WLCG), where the data analysis is usually carried out. It consists of around 170 computing centers in more than 40 countries, totaling about 1 million CPU cores and 1 exabyte of storage. Perhaps less well known is the fact that more than 50% of the WLCG workload consists of Monte Carlo simulations. These “what-if” models of collisions are a crucial part of the data analysis and are needed to optimize the event selection criteria.
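The role of these simulations is easy to demonstrate: you simulate both signal and background, then pick the selection cut that maximizes your expected significance. A minimal toy version, with completely made-up distributions:

```python
import numpy as np

# Toy Monte Carlo: a signal peak on a falling background (invented shapes).
rng = np.random.default_rng(1)
signal = rng.normal(125.0, 2.0, 1_000)        # e.g. a resonance "mass" peak
background = rng.exponential(50.0, 100_000)

# Scan a lower cut and pick the one maximizing s / sqrt(b),
# a common figure of merit for the expected significance.
cuts = np.linspace(100, 130, 61)

def fom(cut):
    s = np.sum(signal > cut)
    b = np.sum(background > cut)
    return s / np.sqrt(b) if b > 0 else 0.0

best = max(cuts, key=fom)
print(f"best cut: {best:.1f}, figure of merit: {fom(best):.1f}")
```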
Although many people are probably disappointed that the LHC has not yet led to the discovery of new physics like supersymmetry, there are still some persistent anomalies in the data. Even if the upcoming upgrades reveal these to be mere statistical fluctuations, CERN will continue to be a driver of new technologies in electronics, computing, and data science.