
Summary Bullets
• Inferencing at the edge could open a new revenue source for CSPs by serving latency-sensitive physical AI workloads close to connected devices
• However, the concept is reminiscent of the ill-fated edge computing (or mobile/multi-access edge computing) push, which tempers the optimistic outlook
As enterprises and public sector organizations continue to develop AI use cases that depend on physical AI, it is becoming increasingly likely that the predicted distribution of AI inferencing beyond centralized hyperscaler data centers will indeed happen. One of the key locations for this distributed inferencing is projected to be the edge of the network, serving physical AI workloads with the lowest possible latency while offloading them from the actual robots or connected devices, which may be limited in battery capacity, processing power, or both. In other words, inferencing at the edge is meant to fill the gap between on-device inferencing, which has effectively zero network latency but is limited in processing power and (likely) battery capacity, and centralized data center inferencing, which is practically unlimited in processing power but must traverse many networking domains that can introduce unacceptably high latency and undermine the determinism that will be key in physical AI use cases.

The main deployment model for inferencing at the edge discussed at MWC relies on deploying GPU- or CPU-based processing power at distributed RAN (D-RAN) and centralized RAN (C-RAN) radio sites, using the same processing capacity for RAN workloads and AI inferencing. However, edge inferencing can also be deployed at the provider edge in fixed access networks or in IP transport, for example.
Inferencing traffic for workloads executed at the edge would thus traverse only a single-domain, relatively low-latency radio access or fixed access link, while benefiting from pooled resources and avoiding the power supply limitations of end devices.
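
As a purely illustrative sketch of the placement trade-off described above (written in Python, with made-up latency and compute figures rather than measured values), the snippet below shows how a workload scheduler might choose between on-device, edge, and centralized inferencing based on a latency budget, a compute requirement, and the device's battery constraints.

```python
from dataclasses import dataclass


@dataclass
class InferenceSite:
    name: str
    network_latency_ms: float    # one-way transport latency to the site (illustrative)
    compute_tops: float          # compute capacity available to the workload (illustrative)
    drains_device_battery: bool  # True only for on-device execution


# Purely illustrative figures; real deployments would measure these per network.
SITES = [
    InferenceSite("on-device", 0.0, 2.0, True),
    InferenceSite("edge (D-RAN/C-RAN site)", 5.0, 200.0, False),
    InferenceSite("centralized data center", 40.0, 5000.0, False),
]


def place_workload(latency_budget_ms: float, required_tops: float,
                   battery_constrained: bool) -> str:
    """Pick the nearest site that satisfies the latency, compute, and battery limits."""
    for site in SITES:  # ordered from nearest to farthest
        if site.network_latency_ms > latency_budget_ms:
            continue
        if site.compute_tops < required_tops:
            continue
        if battery_constrained and site.drains_device_battery:
            continue
        return site.name
    return "no feasible placement"


# Example: a physical AI control loop with a 10 ms budget on a power-limited robot
print(place_workload(latency_budget_ms=10.0, required_tops=50.0,
                     battery_constrained=True))  # -> edge (D-RAN/C-RAN site)
```

In this simple framing, the edge site wins whenever the workload exceeds on-device capacity but cannot tolerate the round trip to a centralized data center, which is exactly the gap edge inferencing is meant to fill.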

Edge inferencing diagram, GlobalData 2026
Although inferencing at the edge sounds like a logical opportunity to open a new revenue source for CSPs, there is still a lot of skepticism about the size of the opportunity and its timing. For example, although potential physical AI implementations look compelling, there are very few (if any) mature physical AI use cases deployed in the field. Further questions remain over the preferred deployment model: CSP-owned and operated infrastructure, or some form of cooperation with hyperscalers.
Finally, opinions differ over the preferred hardware platform for edge inferencing. Much of the current experimentation focuses on GPUs (primarily NVIDIA), while major CPU vendors such as Intel and AMD, along with their technology partners (Dell, HPE), are pushing general-purpose, CPU-based compute platforms touting lower power envelopes and higher efficiency. The choice between CPUs and GPUs for edge inferencing is also likely to shape how the AI RAN, vRAN, and cloud RAN landscape develops. In essence, if operators decide that GPUs are necessary for running AI and RAN workloads at the edge, the arguments about efficiency and power envelopes will fall by the wayside. If, on the other hand, CPUs prove sufficient, operators will indeed continue to use power and efficiency as some of the main buying criteria (a simple way to frame that comparison is sketched below). But before all that, operators and their vendors need to credibly prove that inferencing at the edge will actually bring palpable and sizeable new revenue to CSPs. Without that, inferencing at the edge will remain a nifty concept without much practical implementation potential.
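
To illustrate how the power and efficiency argument might be framed in practice, the Python sketch below compares two hypothetical platforms on an inferences-per-second-per-watt basis; the throughput and power figures are placeholders chosen for illustration, not vendor benchmarks.

```python
# Hypothetical platform figures for illustration only; real numbers depend on the
# model, batch size, and the share of capacity reserved for RAN workloads.
PLATFORMS = {
    "GPU-based edge server": {"throughput_inf_per_s": 4000.0, "power_draw_w": 700.0},
    "CPU-based edge server": {"throughput_inf_per_s": 900.0, "power_draw_w": 300.0},
}


def inferences_per_watt(throughput_inf_per_s: float, power_draw_w: float) -> float:
    """Throughput per watt: one efficiency metric an operator could compare across platforms."""
    return throughput_inf_per_s / power_draw_w


for name, figures in PLATFORMS.items():
    efficiency = inferences_per_watt(figures["throughput_inf_per_s"],
                                     figures["power_draw_w"])
    print(f"{name}: {efficiency:.2f} inferences per second per watt")
```

Whichever platform an operator favors, this kind of per-watt comparison only matters if CPUs prove sufficient for the combined AI and RAN workload; if GPUs turn out to be mandatory, raw capability rather than efficiency will drive the buying decision.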
