Inferencing at the Edge – Will AI Revive the Old Edge Compute Idea?

Emir Halilovic, Principal Analyst

Summary Bullets

• Inferencing at the edge is seen as a potentially strong driver of CSP revenues, mostly because of the projected requirements of physical AI

• But the whole concept resembles the ill-fated edge computing idea (also known as mobile or multi-access edge computing), which tempers the optimistic outlook

As enterprises and public sector organizations continue developing future AI use cases that depend on physical AI, it is becoming increasingly likely that the predicted distribution of AI inferencing beyond centralized hyperscaler data centers will indeed happen. One of the key locations for this distributed inferencing is projected to be the edge of the network, where physical AI workloads can be served with the lowest possible network latency while being offloaded from the robots or connected devices themselves, which may be limited in battery capacity, processing power, or both. In other words, inferencing at the edge is supposed to fill the gap between two extremes. On-device inferencing adds no network latency but is limited in processing power and (likely) battery capacity. Centralized data center inferencing offers practically unlimited processing power, but its traffic has to traverse many networking domains, which can introduce unacceptably high latency and undermine the determinism that will be key in physical AI use cases. The main deployment model for inferencing at the edge discussed at MWC counts on deploying GPU- or CPU-based processing power at distributed RAN (D-RAN) and centralized RAN (C-RAN) radio sites, using the same processing capacity for both RAN workloads and AI inferencing. However, edge inferencing can also be deployed at the provider edge in fixed access networks or in IP transport networks, for example.
