This year's LHC program will conclude with one month of Heavy Ion (PbPb) collisions. These collisions enable us to study the properties of matter under extreme conditions (a trillion times denser than the core of the sun), and our MIT colleagues from the Relativistic Heavy Ion group are leading the charge. Because this data-taking period is unique and short, the physics program has been maximized, which pushes CMS to its limits.
The PPC is responsible for the Storage Manager (SM) in DAQ and for the Tier-0 in Offline Computing. In other words, it covers the final stage of data handling at P5, where the experiment is located, and the initial phase of data reconstruction and storage in the offline world at CERN. The SM is a large buffer that stores the compressed raw detector data and sends it to Tier-0, which in turn converts the raw data into the CMS data format, stores it safely, and performs prompt event reconstruction. Because these events are complex (the number of particles is typically 10-100 times higher than in proton-proton collisions), the reconstruction is computationally demanding. Combined with the high data rates from the detector, 10 GB/s on average with peaks above 20 GB/s (a new record), this makes for a challenging task at the Tier-0.
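To give a feel for what these rates mean for the SM buffer and Tier-0, the arithmetic can be sketched as below. This is only an illustration: the function name `volume_tb`, the one-hour window, and the use of decimal units (1 TB = 1000 GB) are assumptions made here, and a constant rate over a full hour is an idealization, not a measured value.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1000 GB


def volume_tb(rate_gb_per_s: float, seconds: float) -> float:
    """Data volume in TB written at a constant rate for a given duration."""
    return rate_gb_per_s * seconds / GB_PER_TB


AVERAGE_RATE_GB_S = 10.0  # average detector rate quoted in the text
PEAK_RATE_GB_S = 20.0     # peak detector rate quoted in the text
SECONDS_PER_HOUR = 3600   # hypothetical window: one hour of steady data taking

hour_at_average = volume_tb(AVERAGE_RATE_GB_S, SECONDS_PER_HOUR)
hour_at_peak = volume_tb(PEAK_RATE_GB_S, SECONDS_PER_HOUR)

print(f"One hour at the average rate: {hour_at_average:.0f} TB")
print(f"One hour at the peak rate:    {hour_at_peak:.0f} TB")
```

Under these assumptions, a single hour of steady running at the average rate already corresponds to tens of terabytes of raw data that must be buffered, transferred, and reconstructed.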
Before the start of data taking, we performed several preparatory tests to estimate the required capacity and resources. Based on the results, we estimated that CMS can perform the prompt reconstruction of all incoming events simultaneously, but only by also allocating resources outside of CERN, which has never been done before. We therefore use extra capacity from the European Tier-1 sites and from the old High-Level Trigger (HLT) computing farm at P5 (used during Run-2), each estimated to provide 20,000 CPUs.
Data taking started in early October. The figure above shows the CPU resources used by Tier-0 over time for the three main resource pools (Tier-2 CERN, old HLT CERN, and Tier-1 Europe). Resource usage mostly follows the uptime of the LHC: gaps with reduced resources correspond to periods without beam. During periods of intense collisions, Tier-0 ran on more than 100,000 CPUs, an absolute record. The load is shared across the three main pools, with the European Tier-1 sites reaching 40,000 CPUs, almost double what was anticipated. We hope to continue taking data under these conditions over the coming weeks to make this Heavy Ion run a success for CMS.