Portable Acceleration of CMS Computing Workflows with Coprocessors as a Service

Ligabue F.
2024

Abstract

Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the coming decades. To complement future performance gains in software running on central processing units (CPUs), the use of coprocessors in data processing holds great potential and interest. Coprocessors are a class of computer processors that supplement CPUs, often improving the execution of certain functions through architectural design choices. We explore the Services for Optimized Network Inference on Coprocessors (SONIC) approach and study the deployment of this as-a-service approach in large-scale data processing. In these studies, we take a data processing workflow of the CMS experiment and run the main workflow on CPUs, while offloading several machine learning (ML) inference tasks onto either remote or local coprocessors, specifically graphics processing units (GPUs). With experiments performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two, we demonstrate the acceleration of these ML algorithms individually on coprocessors and the corresponding throughput improvement for the entire workflow. This approach can be easily generalized to different types of coprocessors and deployed on local CPUs without decreasing throughput. We emphasize that the SONIC approach enables high coprocessor usage and makes workflows portable across different types of coprocessors.
Sector PHYS-01/A - Experimental physics of fundamental interactions and applications
CMS; Machine learning; Offline and computing
   Advanced Multi-Variate Analysis for New Physics Searches at the LHC
   AMVA4NewPhysics
   European Commission
   Horizon 2020 Framework Programme
   675440

   Search for Higgs bosons decaying to charm quarks
   HIGCC
   European Commission
   Horizon 2020 Framework Programme
   724704

   Direct and indirect searches for new physics in events with top quarks using LHC proton-proton collisions at the CMS detector
   LHCTOPVLQ
   European Commission
   Horizon 2020 Framework Programme
   752730

   Majorana neutrino discovery strategy with CMS
   MajorNet
   European Commission
   Horizon 2020 Framework Programme
   758316

   International Training Network for Statistics in High Energy Physics and Society
   INSIGHTS
   European Commission
   Horizon 2020 Framework Programme
   765710

   The strong interaction at the frontier of knowledge: fundamental research and applications
   STRONG-2020
   European Commission
   Horizon 2020 Framework Programme
   824093
Files in this record:
File: s41781-024-00124-1.pdf
Access: open access
Type: Published version
License: Creative Commons
Size: 2.33 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11384/149544