LLM Token Economics: Who Takes the AI Money

Vercel just published the numbers most companies keep private: how much money and how many tokens actually flow through production apps. Its AI Gateway routes tens of trillions of tokens across hundreds of models, and in April 2026 the split was lopsided. Anthropic captured 61% of all token dollars but only 26% of the tokens themselves. Google was the mirror image: 38% of the volume, 21% of the revenue.

That split is LLM token economics in one picture. Vercel CEO Guillermo Rauch posted an animated "race" of model spend, and it shows not which model is "best" but what businesses actually pay for. The market has split into two layers: premium quality and cheap volume.

At Gless we decide every week which model to build a client's system on. So we read this report less as news and more as a map: where it makes sense to pay the premium, and where it doesn't. Let's walk through the numbers and what they mean for your inference bill.

LLM token economics: what the Vercel AI Gateway index revealed

Short version: Anthropic leads on money, Google leads on token volume, and those are two different leagues of the same market.

AI Gateway is a proxy that lets an app reach many models through one API. Vercel released aggregate production data for the first time (its Production Index) — what developers actually run in live systems, not in benchmarks. The April 2026 breakdown:

Provider	Share of spend	Share of tokens
Anthropic	61%	26%
Google	21%	38%
OpenAI	12%	13%
xAI	—	10%
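One way to read the table, as a rough sketch: dividing a provider's spend share by its token share gives its average price per token relative to the Gateway-wide average (1.0 means average). The `price_index` helper and the dictionary layout are illustrative names, not part of Vercel's report, and xAI is left out because its spend share is not listed.

```python
# Shares from the April 2026 table above (percent of Gateway totals).
shares = {
    "Anthropic": {"spend": 61, "tokens": 26},
    "Google": {"spend": 21, "tokens": 38},
    "OpenAI": {"spend": 12, "tokens": 13},
}

def price_index(spend_share, token_share):
    """Spend share / token share: 1.0 is the Gateway-wide average price per token."""
    return spend_share / token_share

indices = {name: price_index(s["spend"], s["tokens"]) for name, s in shares.items()}

for name, value in indices.items():
    print(f"{name}: {value:.2f}x the average price per token")
```

On these figures an average Anthropic token costs about 2.3 times the Gateway average and an average Google token about 0.55 times, so the gap between the two is roughly fourfold.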

Rauch summed it up: "Google is king of production scale, Anthropic dominates in coding & spend, OpenAI is growing fast since 5.4, and OSS continues to gain ground." The fluidity shows in the trend: OpenAI's spend share tripled from March to April after the GPT-5.4/5.5 releases, and Google's climbed from 8% to 21% in the same month. One more signal about the nature of the workload — tool-call requests grew from 11.4% to 22.2%. Apps increasingly don't "ask the model," they hand it an action.

Why Anthropic takes 61% of the money but only 26% of the tokens

The answer is price per call. Expensive models catch the high-stakes tasks; cheap models catch the mass volume. Money follows the first group, tokens follow the second.

Compare the price per million tokens: Claude Opus 4.8 runs $5 input and $25 output, while Gemini 3.5 Flash is $1.50 and $9 (Totalum and DataCamp data). Flash is roughly a third the price. Do the math: a billion output tokens on Opus is $25,000; on Flash it's $9,000. If that volume is made of simple calls, the $16,000 gap is money burned. So volume drifts to the cheap models, and Opus keeps the work where a mistake costs more than the savings.
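The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the list prices quoted in the paragraph; `PRICES` and `cost_usd` are illustrative names, and real bills also depend on caching, batching, and discounts that are not modeled here.

```python
# USD per million tokens, as quoted in the paragraph above.
PRICES = {
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
}

def cost_usd(model, input_tokens=0, output_tokens=0):
    """List-price cost of a workload on one model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

BILLION = 1_000_000_000
opus = cost_usd("Claude Opus 4.8", output_tokens=BILLION)
flash = cost_usd("Gemini 3.5 Flash", output_tokens=BILLION)

# prints: Opus: $25,000  Flash: $9,000  gap: $16,000
print(f"Opus: ${opus:,.0f}  Flash: ${flash:,.0f}  gap: ${opus - flash:,.0f}")
```

Passing your own monthly input and output token counts to `cost_usd` gives a first estimate of what a given workload would cost on each model.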

Vercel's report says it outright: premium reasoning lands on Claude Opus, cheap fast calls land on Gemini Flash. The same app pulls both. So the 61/26 "scissors" isn't an anomaly — it's the market pricing risk. Where a model error is expensive (hard code, legal analysis, an agent that drives a task on its own), businesses pay for Claude and don't haggle. Where they need volume and speed (classification, short replies, autocomplete), they take whatever is cheaper.

Anthropic's revenue tells the same story outside Vercel's data. The company's annualized revenue reached about $14B by February 2026, up from $1B at the end of 2024 (Sacra data). Claude Code alone crossed a $2.5B run rate. And Salesforce publicly budgeted $300M for Anthropic tokens in 2026, most of it tied to code.

The market split: premium quality and cheap volume

The split shows up by task type, not on average. Anthropic's token share falls from 71% in back-office workloads (highest stakes) to 7% in consumer ones (lowest stakes). Same company, opposite model choice across the layers of an app.

The premium-quality layer is agentic and engineering work, where the model drives a task across several steps. Those workloads already make up 58.9% of all tokens in the Gateway, up from 31.6% six months ago. The more autonomous the work, the higher the cost of an error, and the more willingly a business pays the premium.

The cheap-volume layer is everything you can hand to a cheaper or open model without losing quality. That's where Gemini Flash and open weights live. We recently broke down GLM-5.2, an open model that beats GPT-5.5 on several coding benchmarks at roughly one-sixth of the price. Releases like that fill the lower layer: they take volume, not premium revenue.

The takeaway is simple: "expensive" and "cheap" stopped being competitors. They're two different products for two different jobs.

Single-vendor in production is dead: from 3 models to 35

The report's main practical signal is that multi-model has become the norm. Teams at 1K–10K requests run about 3 models on average. At 10M+ requests, the average is 35 models in regular use.

The logic is plain. As traffic grows, the price-per-call gap turns into real money, and running one expensive Opus for everything becomes wasteful. Developers spread tasks across models: the expensive one on the 3 hard steps, the cheap one on the 97 easy ones. Plus a safety net: 3.5% of Gateway requests complete after a fallback to another model (5.1% by tokens) when the primary is down or fails.
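To put a number on the 3-hard, 97-easy split, here is a rough estimate of the blended price. It assumes, purely for illustration, that every step emits the same number of output tokens, and it reuses the output list prices quoted earlier ($25 and $9 per million). `blended_price` is a hypothetical helper.

```python
PREMIUM_OUT = 25.00  # USD per million output tokens (Claude Opus 4.8, as quoted earlier)
CHEAP_OUT = 9.00     # USD per million output tokens (Gemini 3.5 Flash, as quoted earlier)

def blended_price(premium_fraction, premium=PREMIUM_OUT, cheap=CHEAP_OUT):
    """Average price per million tokens when a fraction of tokens goes to the premium model."""
    return premium_fraction * premium + (1 - premium_fraction) * cheap

routed = blended_price(0.03)      # 3 hard steps out of 100 on the premium model
all_premium = blended_price(1.0)  # everything on the premium model
saving = 1 - routed / all_premium

print(f"routed: ${routed:.2f}/M  all-premium: ${all_premium:.2f}/M  saving: {saving:.0%}")
```

Under these assumptions the routed pipeline pays about $9.48 per million output tokens instead of $25, a saving of roughly 62%. Hard steps usually emit more tokens than easy ones, so the real saving will be smaller; changing `premium_fraction` shows how sensitive the bill is to that share.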

An example from our own work. A typical support agent does three things: it classifies the request, pulls facts from a knowledge base, and writes the reply. A cheap model handles the first two fine — those need volume, not brilliance. The final wording on a tricky case we hand to a stronger model, because a bad answer to a customer costs more than a couple of cents saved. One pipeline, two or three models, the bill under control. That's LLM token economics at the level of a real system, not a report.
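A pipeline like this can be sketched as a small routing table plus the fallback behavior described earlier. Everything here is hypothetical: the step names, the tier labels such as `cheap-a`, and the `call_model` callable stand in for whatever client and models a real system uses. This is an illustration of the idea, not our production code and not the AI Gateway API.

```python
from typing import Callable

# Hypothetical routing table: for each pipeline step, the models to try in order.
ROUTES = {
    "classify": ["cheap-a", "cheap-b"],
    "retrieve": ["cheap-a", "cheap-b"],
    "reply_tricky": ["premium-a", "premium-b"],
}

def run_step(step: str, prompt: str, call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model listed for the step in order; return (model_used, output)."""
    last_error = None
    for model in ROUTES[step]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # real code should catch the client's specific errors
            last_error = exc
    raise RuntimeError(f"all models failed for step {step!r}") from last_error

# Stub client for demonstration: the primary cheap model is "down".
def fake_call(model: str, prompt: str) -> str:
    if model == "cheap-a":
        raise TimeoutError("cheap-a is unavailable")
    return f"[{model}] {prompt}"

used, text = run_step("classify", "Where is my refund?", fake_call)
print(used, text)
```

Keeping the routes in one table makes the cost decision visible: moving a step between layers is a one-line change, and each layer falls back to a model of the same class so that a tricky reply never silently drops to the cheap tier.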

At Gless we build systems the same way. Running one model across the whole pipeline means either overpaying or accepting a quality drop where it matters. In our AI implementation projects, model routing is part of the architecture from day one, not an afterthought. The Vercel index just put a number on what is already standard in production.

What to do with this as a business

Read the 61/26 "scissors" not as a ranking but as a routing instruction. Three practical takeaways.

First: don't put an expensive model on cheap calls. If Claude Opus handles your ticket classification or short auto-replies, you're paying a premium for a risk that isn't there. It's the first thing we cut when we audit other people's systems.

Second: don't cut corners on expensive calls. The opposite mistake is running a complex agent or critical code on the cheapest model to save money. The cost of an error here is higher than the price gap. The market hands Anthropic 61% of the money on this exact layer for a reason.

Third: treat pay-per-token as your main cost line. The industry is moving from flat subscriptions to paying for volume — we covered that when Claude Fable 5 left subscriptions. Your bill now depends directly on how carefully tasks are spread across models.

In short: the winner isn't whoever picked the "right" vendor, but whoever distributed the load correctly between premium quality and cheap volume. If you'd like to map your pipeline this way and estimate the savings, let's discuss your project.
