In 2024, CMS collected around 50 PB of raw data and generated more than 75 PB of simulated data. Together with prompt processing, this amounts to more than 150 PB of new data. Analyzing such an enormous volume directly is impractical, so CMS derives compact data representations containing only about 1% of the original information, and these are what physics measurements actually use. The bulk of the original data, however, must still be stored safely in case it is needed to diagnose problems with the data or to investigate unexpected physics results.
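As a rough back-of-the-envelope check on the figures above (the exact numbers are illustrative, taken from the text):

```python
# Back-of-the-envelope estimate of CMS 2024 data volumes, per the figures above.
raw_pb = 50          # raw detector data, in PB
simulated_pb = 75    # simulated data, in PB
total_pb = 150       # total new data including prompt processing (">150 PB" in the text)

# Derived analysis formats keep roughly 1% of the original information.
derived_fraction = 0.01
derived_pb = total_pb * derived_fraction

print(f"Analysis-level data: ~{derived_pb:.1f} PB out of {total_pb} PB")
```

So only on the order of 1.5 PB needs to stay readily accessible on disk for analysis, while the remaining ~99% is a natural candidate for tape archival.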
Keeping all this data on disk for many years is prohibitively expensive. Deleting it is not an option either, as it may still contain valuable and interesting information. High-energy physics experiments often generate new ideas and measurements using old data, sometimes even decades later. This is where the reliable, time-tested method of tape storage comes to the rescue.
Until recently, only CERN and the major Tier-1 sites had tape storage available for data archival. At the last CMS Offline and Computing meeting, the MIT group presented its solution for Tier-2 tape storage, developed by Maxim Goncharov and Christoph Paus. The system uses the tape facility of the New England Storage Exchange in Holyoke and provides cost-effective tape storage while integrating tightly with the MIT Tier-2 site, which serves multiple experiments.

Initial test results were promising, and the system has since been put into production. One limitation has been identified so far: the available bandwidth is currently capped at 10 Gbit/s, though this is expected to be resolved in the near future. The success of the tape pilot project marks an important milestone in enhancing the capabilities of our site, which is set to become a major computing hub for the HL-LHC, the Run 4 phase of the Large Hadron Collider at CERN, expected to begin data taking in the early 2030s.