
Inference Makes Up 90% of Our AI Usage: Here’s Why That Matters

By Brandon Ross, Senior Interconnection Consultant at DE-CIX
17 February 2026

One of the most under-discussed aspects of AI is the demand it’s placing on our connectivity infrastructure. While headlines focus on the need for more compute (bigger data centers, faster models, access to GPUs for training, and more energy), we risk overlooking one immediate and fundamental point of pressure: the network. Our current use of AI falls into two distinct camps: training and inference. Training tends to happen in isolated bursts and requires a great deal of raw compute and energy in the form of GPU clusters, while inference workloads are continuous, highly distributed, and deeply sensitive to latency. Inference is what happens every time we ask ChatGPT something, every time an autonomous car makes a split-second decision, or every time an enterprise analyzes its data. It is the everyday use of AI, often running on single GPUs, CPUs, or even directly on edge devices, and typically involving API calls, queries, and tokens.

According to MIT, inference makes up roughly 80-90% of our total AI usage. Because it’s the dominant mode of AI consumption, it demands networks that are fast, robust, reliable, and low in latency. Latency is kryptonite to AI applications, which often need to exchange small packets of data at very high speed to maintain a smooth user experience. Delays, packet loss, or brief outages can degrade that experience, disrupt operations, or undermine trust in systems that businesses increasingly rely on to make decisions in real time.

This is already evident in traffic patterns at major interconnection points, where sustained growth reflects the growing need to exchange data efficiently across dozens of networks at once. As AI adoption accelerates, the resilience of the network – not just our compute capability – will determine whether AI delivers consistent value for users and businesses.

Inference at scale: Why AI traffic behaves differently

What inference-driven AI introduces is not just more traffic, but a fundamentally different traffic profile. Instead of large, predictable flows moving between fixed points, AI inference generates countless smaller exchanges happening simultaneously across many locations, from the edge to the cloud. Each interaction may be lightweight on its own, but at scale they place enormous demands on routing efficiency, latency, and consistency. AI workloads frequently span multiple clouds, enterprise environments, and access networks within a single transaction, meaning performance is determined by the weakest link along the data path. In other words, the network is no longer a simple delivery mechanism. It is a performance driver, with the power to dictate whether an AI application functions as intended.

This is where network architecture makes a real difference. Inference workloads benefit from short, direct data paths that minimize hops, reduce congestion, and keep latency tightly controlled. Relying solely on traditional IP transit (the public Internet) means outsourcing control over those paths to third parties, often with little visibility into where traffic is exchanged or how it is routed. And as inference traffic continues to grow, that lack of control becomes a liability. Aggregating networks on Internet Exchange (IX) platforms allows data to move more directly between relevant networks, avoiding unnecessary detours and bottlenecks. So as AI applications spread across regions and industries, the ability to interconnect efficiently and predictably at distributed, neutral IXs becomes the real driver of performance.
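To make the point about short, direct data paths concrete, here is a minimal Python sketch that adds up per-segment latency along two hypothetical paths and reports the slowest segment on each. The segment names and millisecond values are invented purely for illustration and are not measurements; `Segment` and `path_summary` are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One leg of a data path (hypothetical model for illustration)."""
    name: str
    latency_ms: float


def path_summary(segments):
    """Return (total one-way latency, slowest segment) for a path."""
    total = sum(s.latency_ms for s in segments)
    worst = max(segments, key=lambda s: s.latency_ms)
    return total, worst


# Illustrative numbers only: assumed values, not real measurements.
transit_path = [
    Segment("access network", 4.0),
    Segment("transit provider A", 11.0),
    Segment("transit provider B", 18.0),
    Segment("cloud on-ramp", 5.0),
]
peered_path = [
    Segment("access network", 4.0),
    Segment("IX peering fabric", 1.0),
    Segment("cloud on-ramp", 5.0),
]

for label, path in (("transit", transit_path), ("peered", peered_path)):
    total, worst = path_summary(path)
    print(f"{label}: {len(path)} segments, {total:.1f} ms one-way, "
          f"slowest segment: {worst.name}")
```

Under these assumed values, the extra segments on the transit path account for most of its total latency, and the slowest single segment sits inside a third-party network that the enterprise does not control.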

What’s at stake

When network performance degrades, the impact on business is immediate. Even minor connectivity issues can have outsized effects. A glitchy video call can erode confidence in a pitch or a presenter. Sluggish application performance frustrates customers and employees alike. At a more tangible level, IT outages carry real financial consequences too – estimates consistently show that network and IT disruptions cost small and mid-sized enterprises tens of thousands of dollars per hour, while large organizations can lose hundreds of thousands or even millions of dollars for every hour of downtime. And those risks only scale with greater AI adoption.

When AI applications are business critical, as many increasingly are, the stakes rise even further. Inference now underpins applications such as digital twins and predictive maintenance, financial services platforms, intelligent product design, autonomous and remotely operated vehicles, and even healthcare systems. In these environments, network reliability isn’t just a question of user experience or productivity – it’s directly tied to safety, regulatory compliance, reputation, and trust.

Despite this, many enterprises and even smaller network operators still depend primarily on IP transit to connect to the Internet. This effectively outsources one of the most critical components of digital operations to third-party networks, often without transparency or control over how data is routed. IP transit offers limited visibility into the physical and logical paths data takes, which networks it traverses, where it is exchanged, or even which jurisdictions it passes through. For latency-sensitive AI inference use cases, that lack of control becomes a significant risk. Network delays, outages, and security lapses directly threaten the viability of AI-driven business models, so gaining greater control over connectivity and data paths via interconnection is a prerequisite for success in any AI-driven economy.

Designing for resilience in an AI world

Enterprises looking to protect AI-enabled services against disruption need to start with resilience by design, and that begins with redundancy. Relying on a single provider or connection leaves organizations exposed, particularly as AI-driven traffic spikes and AI-amplified DDoS attacks become more common. A multi-provider approach, combined with direct peering, helps ensure there is an alternative path if one network is impaired. Peering at data center- and carrier-neutral IXs that support blackholing and advanced traffic filtering allows organizations to isolate and discard malicious traffic from a single connection while keeping all other services online. Resilience also depends on geography. Connecting through multiple, physically separate data centers with non-overlapping network paths reduces the risk of localized failures, whether caused by congestion, outages, or something as mundane as accidental cable damage.

Finally, resilience only works if it is continuously tested. Backup connections, failover mechanisms, and recovery plans must be exercised regularly. If failover has never been tested under real conditions, it cannot be relied upon when disruption inevitably occurs.
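As a first step before any live drill, the redundancy rules described above (more than one provider, more than one physical site) can be checked on paper. The following Python sketch simulates the loss of each provider and each site in turn and reports any failure that would leave no uplink standing. The `Uplink` model, the function names, and the example inventory are all hypothetical, and a desk check like this complements, rather than replaces, exercising failover under real conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    """A single external connection (hypothetical inventory record)."""
    name: str
    provider: str
    site: str


def surviving(uplinks, *, failed_provider=None, failed_site=None):
    """Uplinks still usable after one provider or one site is lost."""
    return [
        u for u in uplinks
        if u.provider != failed_provider and u.site != failed_site
    ]


def single_points_of_failure(uplinks):
    """List every provider or site whose loss leaves no uplink at all."""
    findings = []
    for provider in sorted({u.provider for u in uplinks}):
        if not surviving(uplinks, failed_provider=provider):
            findings.append(f"provider:{provider}")
    for site in sorted({u.site for u in uplinks}):
        if not surviving(uplinks, failed_site=site):
            findings.append(f"site:{site}")
    return findings


# Two links, two sites, but only one provider behind both (assumed example).
fragile = [
    Uplink("transit-1", "carrier-a", "dc-east"),
    Uplink("transit-2", "carrier-a", "dc-west"),
]
# Transit plus IX peering, on separate providers and separate sites.
resilient = [
    Uplink("transit-1", "carrier-a", "dc-east"),
    Uplink("ix-peering", "ix-fabric", "dc-west"),
]

print(single_points_of_failure(fragile))
print(single_points_of_failure(resilient))
```

The first inventory looks redundant because it has two links in two data centers, yet it still depends on a single provider; the second has no single provider or site whose loss removes every path.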

AI is forcing a rethink of how we design and operate networks, not only because traffic volumes are rising, but because performance, predictability, and control now matter more than ever. Organizations that invest in network architectures built for scale, redundancy, and low latency will be the ones able to deploy AI with confidence, while those that don’t risk turning innovation into instability.
