A compelling example is the CERN Large Hadron Collider, where real-time inference systems must process millions of events per second, rapidly filtering out irrelevant data while preserving rare signals of interest. As LHC data rates continue to grow, reaching the equivalent of 5% of global internet traffic, traditional computing approaches are no longer sufficient, and new methods for ultra-fast, energy-efficient machine learning are required.
In this presentation, we will discuss emerging techniques for low-power, low-latency inference, including hardware-aware model design, quantization, sparsity, and hardware–software co-design. Using examples from particle physics and other domains, we will show how real-time machine learning is both a practical necessity and a powerful tool for scientific discovery.
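To make one of the listed techniques concrete: quantization trades numerical precision for smaller, faster arithmetic, which is central to low-power inference on FPGAs and ASICs. The sketch below is a minimal illustration of symmetric per-tensor int8 post-training quantization in NumPy; the function names and the specific scheme are illustrative assumptions, not the method used in any particular LHC trigger system.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights onto int8 in [-127, 127].

    Illustrative sketch only; real deployments often use per-channel scales
    and calibrated activation ranges.
    """
    scale = np.max(np.abs(w)) / 127.0  # one scale factor for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 representation."""
    return q.astype(np.float32) * scale

# Quantize a small weight matrix and check the reconstruction error,
# which is bounded by half a quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = np.max(np.abs(w - dequantize(q, scale)))
```

The int8 tensor occupies a quarter of the memory of float32 and maps onto cheap integer multiply-accumulate units, which is where the latency and energy savings come from.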