Archi Hackathon at the LPC@FNAL

The recent Large Language Model Hackathon at the LHC Physics Center (LPC) at Fermilab brought together physicists and computer scientists from across the CMS experiment to hack away at building agents for use in the Collaboration. Ideas are abundant, as AI-enabled agents promise to affect all branches of high-energy physics: from accelerating experimental design, to automating experiments, to running agentic workflows for analyses. The challenge ahead is to master these emerging tools and integrate them into our workflows, while maintaining the high level of rigor we already demand of our work.

The Archi framework seeks to provide a basis for the development of agents at CMS by implementing an end-to-end modular pipeline for data collection and running of agents, tuned for high-energy physics use cases.

The Archi repository at https://github.com/archi-physics/archi

The hackathon was perhaps the final push for the A2rchi team to rebrand and get rid of the much-disputed ‘2’ in the name, moving to the leaner “Archi”. Together with the rebranding came a suite of new features, including a fully agentic pipeline, meaning the large language models can freely interact with tools through a ReAct loop, as well as a cleaner, more interactive UI and a complete rewrite of the backend storage to Postgres. The hackathon saw several new and old deployments of Archi come together to collaborate:

  • CompOps1: an assistant for engineers working in computing operations
  • WisDQM2: an assistant for DQM shifters
  • A heavy-ions-tuned agent for expert analysis support
  • A CRAB3-expert for analysis support
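The agentic pipeline behind these deployments follows the ReAct pattern: the model alternates between reasoning, tool calls, and observations until it can answer. A minimal sketch of such a loop is below; the `lookup_dataset` tool and the scripted stand-in model are purely illustrative, not the actual Archi implementation.

```python
# Minimal ReAct-style loop: the model alternates between tool calls
# ("actions") and a final answer. The tool and the scripted model below
# are toy stand-ins, not Archi's real components.

def lookup_dataset(name: str) -> str:
    """Toy stand-in for a live-data tool (e.g. a Rucio query)."""
    catalog = {"/SingleMuon/Run2024": "size: 1.2 PB, 3 replicas"}
    return catalog.get(name, "dataset not found")

TOOLS = {"lookup_dataset": lookup_dataset}

def scripted_model(history):
    """Stub LLM: first requests a tool call, then answers from the observation."""
    observations = [step[1] for step in history if step[0] == "observation"]
    if not observations:
        return ("action", "lookup_dataset", "/SingleMuon/Run2024")
    return ("answer", f"The dataset has {observations[-1]}.", None)

def react_loop(model, question, max_steps=5):
    """Run the model until it answers or the step budget is exhausted."""
    history = [("question", question)]
    for _ in range(max_steps):
        kind, payload, arg = model(history)
        if kind == "answer":
            return payload
        # kind == "action": run the named tool and feed the result back
        history.append(("observation", TOOLS[payload](arg)))
    return "step limit reached"

print(react_loop(scripted_model, "How big is /SingleMuon/Run2024?"))
```

In a real deployment the scripted stub is replaced by an LLM call that emits structured tool requests, and the step budget guards against runaway loops.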

The Computing Operations deployment is already running in prototype mode for operators at CMS, and a more general release is planned for later in the month. The experience gained from this roll-out has been crucial to understanding the opportunities and limitations of agents and our framework, and was often referenced in discussions throughout the hackathon.

Archi demo shown by the computing operations team at CMS during the hackathon

Many ideas came together as hackers discussed current issues and wish lists, and the Archi experts learned from physicists and engineers how agents could practically be useful in their day-to-day work. Many, many bugs were fixed between espressos, and many new features were developed over the week, including: a new document uploader, a database UI, live-data tools for Rucio, user roles, sandboxed code execution, and a more robust CI, which includes benchmarking against previous releases to study how changes in the agent affect the quality of the answers.
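A release-to-release answer benchmark of this kind can be as simple as scoring each candidate answer against a reference and comparing aggregates. The sketch below uses token-overlap F1 as the metric; the metric, the questions, and the answers are illustrative assumptions, not Archi's actual CI.

```python
# Toy regression benchmark: score each release's answers against
# reference answers with token-overlap F1, then compare the means.
# Metric and data are illustrative only.

def token_f1(answer: str, reference: str) -> float:
    """Token-overlap F1 between a candidate answer and a reference."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    common = len(a & r)
    if common == 0:
        return 0.0
    precision, recall = common / len(a), common / len(r)
    return 2 * precision * recall / (precision + recall)

def benchmark(answers, references):
    """Mean score of one release's answers over the reference set."""
    scores = [token_f1(a, ref) for a, ref in zip(answers, references)]
    return sum(scores) / len(scores)

references = ["submit jobs with crab submit", "check status with crab status"]
old_release = ["use crab submit to submit jobs", "run crab status"]
new_release = ["crab submit submits jobs", "crab status checks job status"]

old_score = benchmark(old_release, references)
new_score = benchmark(new_release, references)
print(f"old: {old_score:.2f}  new: {new_score:.2f}")
```

In a CI pipeline, a drop of the mean score below the previous release's value (or below a fixed threshold) would fail the build, flagging agent changes that degrade answer quality.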

The hacking was set against the backdrop of the beautiful Wilson Hall at Fermilab, where the LPC were perfect hosts and kindly showed us around the facilities.

Some of the hackers discussing (top left), at SQMS (top right), and with the Mu2e detector (bottom).

The hackathon can be counted a success: the team met each other in person for the first time, much work was done during the week to improve the framework, and we laid out the work ahead for the next couple of months. Much is left to do, as the current framework seeks to integrate better with CMS-specific data sources, APIs, and knowledge bases to better serve CMS physicists and engineers.

We look forward to what is coming in the near future from this project, and to meeting again with the team at the next hackathon! Thanks to everyone that came!


  1. The CMS Experiment Computing Operations project is responsible for managing the massive, worldwide, distributed computing infrastructure necessary to process, store, and analyze data from the Compact Muon Solenoid (CMS) detector at the Large Hadron Collider (LHC). It ensures that petabytes of collision data are accessible to thousands of scientists worldwide. Between detector data and Monte Carlo simulations, on the order of 100 PB of data are created annually, and on the order of half a million CPU cores run around the clock to process them. Key aspects of the project include: a global distributed grid, the Worldwide LHC Computing Grid (WLCG), which consists of a Tier-0 center at CERN for initial processing, Tier-1 centers for storage and re-reconstruction, and Tier-2 centers for simulation and analysis; data management and processing, handling the daily operations of data acquisition, transfer, and reconstruction of billions of events, in collaboration with the Computing Infrastructure and Software Support teams who provide the physical hardware and the software stack (e.g., CVMFS, CMSSW) required for event reconstruction and simulation; operational monitoring, using advanced tools to track grid performance, data popularity, and site storage to optimize resource usage; and modernization efforts to integrate machine learning for data analysis, simulation improvements, and automated data quality monitoring. ↩︎
  2. WisDQM is an agentic chat system which supports the Data Quality Monitoring shifters. The CMS Data Quality Monitoring (DQM) system ensures high-quality data for physics analysis by monitoring, certifying, and debugging detector performance in real time and offline. Using a web-based GUI, it processes histograms and scalars from the Online (control room) and Offline (re-reconstruction) systems, with automatic quality tests to classify data as good or bad, identifying issues quickly for repair. Key elements of CMS DQM include: online monitoring, which provides immediate, real-time feedback on detector, trigger, and Data Acquisition (DAQ) hardware status to shifters at Point 5; offline certification, which evaluates reconstructed data (express stream and full dataset) within hours or days to certify data quality for physics analyses; coverage of all sub-detectors (Tracker, ECAL, HCAL, Muon) and the trigger system, employing automated, customizable quality tests for histograms; and an ongoing transition from traditional, rule-based alerts to machine learning (autoencoders) for anomaly detection. DQM is critical for validating reconstruction software (CMSSW) releases, alignment, and calibration, ensuring the reliability of physics results. ↩︎
  3. The CMS Remote Analysis Builder (CRAB) is a lightweight tool that allows CMS physicists to interact with the large, complex computing system to process the data they need for their analysis and manage the output. Because of the complex underbelly of the system, it is often hard to find the causes of failures, and the support team is an ideal candidate for an agentic support bot that can, based on existing experience and standard tools, parse log files and perform the relevant diagnosis and further tests to address user problems. ↩︎
