Everyone Measured Tokens. Nobody Measured Value.

In May 2026, Amazon built a leaderboard. The goal was to track how many AI tokens its developers consumed, a proxy, leadership decided, for how actively the workforce was using AI tools. Employees noticed the leaderboard. They started generating tokens. Lots of them. They ran queries that served no business purpose, inflated outputs, found every creative way possible to make the number go up.

Fortune named the practice “tokenmaxxing” on May 12. By May 28, Fortune declared it dead. Amazon pulled the leaderboard on May 29.

Seventeen days. That’s how long it took a measurement system to collapse under its own incentive structure.

The fast version of this story is that Amazon created a bad metric and employees gamed it. The real version is that this is what enterprise AI governance looks like when organizations deploy first and define success later. The tokenmaxxing story is unusually visible, but the failure mode it represents is running through nearly every major enterprise AI program simultaneously. Most of them just haven’t published the autopsy yet.

The measurement problem runs deeper than the leaderboard

Goodhart’s Law holds that when a measure becomes a target, it ceases to be a good measure. Token leaderboards are the most obvious example. But the same failure shows up in the surveys enterprises use to justify their AI investments.

METR’s May 2026 survey of 349 technical workers found a median self-reported productivity gain of 1.4x to 2x from AI tools. That number gets cited in board decks and investor updates. The problem is that METR’s own controlled research found people overestimate AI’s effect on their work by 40 percentage points on average. The gap isn’t dishonesty. It’s that measuring your own output accurately is genuinely hard, especially when nobody has defined what “productivity” means in the context of how you actually work.
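To see how much a 40-point overestimate can matter, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the average overestimate applies uniformly to every self-reported figure, which neither survey claims. The function name `adjusted_multiplier` and its parameters are hypothetical.

```python
def adjusted_multiplier(self_reported: float, overestimate_points: float = 40.0) -> float:
    """Discount a self-reported productivity multiplier (e.g. 1.4 for "1.4x").

    Assumption (not from either study): the average 40-percentage-point
    overestimate applies equally to every respondent.
    """
    # Convert the multiplier to a percentage gain: 1.4x -> +40%
    reported_gain_pct = (self_reported - 1.0) * 100.0
    # Subtract the assumed overestimate, in percentage points
    adjusted_gain_pct = reported_gain_pct - overestimate_points
    # Convert back to a multiplier
    return 1.0 + adjusted_gain_pct / 100.0


low = adjusted_multiplier(1.4)   # low end of the self-reported range
high = adjusted_multiplier(2.0)  # high end of the self-reported range
print(f"adjusted range: {low:.1f}x to {high:.1f}x")
```

Under that assumption, the low end of the self-reported range shrinks to roughly no gain at all and the high end to about 1.6x. The exact figures matter less than the point: a board-deck number built on self-reports can move a long way once the known bias is applied.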

The deeper issue is that enterprise AI deployments are too varied for survey-level generalization to hold. A logistics company using AI to optimize routing and a legal team using it to draft memos aren’t measuring the same thing when they both report “productivity gains.” Averaging those responses into a benchmark number and presenting it as evidence of AI ROI is a category error. But that’s what most organizations are doing, because it’s the data they have and the boards want a number.

Grant Thornton’s 2026 AI Impact Survey found that two-thirds of Fortune 500 companies are measuring AI ROI through estimates rather than financial results. Seventy-eight percent of executives couldn’t pass a basic AI governance audit. The measurement failure isn’t confined to the operational level. It goes all the way up.

The capacity question nobody is asking

Here’s the reframe that gets lost in most enterprise AI measurement discussions: speed is not the variable that matters.

Most organizations measure AI’s impact in terms of time saved — tasks completed faster, hours recovered, workflows accelerated. These are proxies for capacity, not capacity itself. The question that actually maps to business value is whether the organization can do things it couldn’t do before, or sustain the same output with fewer resources. That’s a capacity question. Speed is only one input to it, and only under specific conditions.

Speed gains convert to business value when the freed capacity gets absorbed into higher-value work. That only happens if the organization has redesigned its workflows and incentive structures to make that possible. If a developer completes a task in half the time but has no additional work queued and no incentive to take more on, the efficiency is real and the business impact is zero. This is the incentive gap that tokenmaxxing exposed. Employees used AI to move faster, then had no structural reason to do more. The gains evaporated at the organizational level because the workflow design hadn’t caught up with the tooling.
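The relationship between speed and value can be sketched as simple arithmetic. This is a minimal illustration, not a measurement framework: `TaskProfile`, `absorption_rate`, and `value_per_hour` are hypothetical names, and the model assumes that freed hours create value only in proportion to how much of that time is redeployed.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    hours_before: float     # weekly hours spent on the task before AI
    speedup: float          # 2.0 means the task now takes half the time
    absorption_rate: float  # share of freed hours redeployed to other work (0 to 1)
    value_per_hour: float   # assumed value of one redeployed hour


def freed_hours(p: TaskProfile) -> float:
    """Hours per week no longer needed for the task."""
    return p.hours_before - p.hours_before / p.speedup


def realized_value(p: TaskProfile) -> float:
    """Value created only by the freed hours that are actually redeployed."""
    return freed_hours(p) * p.absorption_rate * p.value_per_hour


# Same tool, same speedup, different workflow design.
no_queue = TaskProfile(hours_before=20, speedup=2.0, absorption_rate=0.0, value_per_hour=100)
redesigned = TaskProfile(hours_before=20, speedup=2.0, absorption_rate=0.5, value_per_hour=100)

print(freed_hours(no_queue))       # 10.0 hours freed
print(realized_value(no_queue))    # 0.0, because nothing absorbs the freed time
print(realized_value(redesigned))  # 500.0
```

Both profiles show the same efficiency gain. Only the one with a nonzero absorption rate shows any business impact, which is why time saved on its own is a proxy rather than a result.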

Deloitte’s 2026 State of AI in the Enterprise report quantifies the cost of that gap. Seventy-four percent of organizations expect AI to grow revenue. Only twenty percent are actually growing revenue with it. The companies closing that gap share a pattern: they define what they’re trying to change before they deploy, treat governance as infrastructure rather than compliance, and invest in reskilling so employees can absorb the capacity AI creates. That last part matters more than most AI deployment conversations acknowledge. Reskilling only works if the organization’s knowledge infrastructure supports it, meaning people can access the institutional knowledge they need to take on more complex work once AI has cleared the lower-value tasks from their plate.

Trust is the adoption variable that doesn’t show up on any dashboard

Writer’s 2026 enterprise AI adoption report found that 79% of organizations face significant adoption challenges despite heavy investment. The most consistent underlying factor is trust. Employees who don’t believe in the tools won’t use them well. Low-quality usage produces low-quality outputs regardless of the model running underneath. And this is the part of the AI deployment conversation that consistently gets underweighted: the enterprise program is oriented almost entirely toward customer outcomes, while the workforce that’s supposed to generate those outcomes is treated as an implementation detail.

It isn’t. If people don’t understand why they’re using these tools, don’t see how the tools make their work better, and feel like usage is being mandated and monitored rather than supported, you get tokenmaxxing. You get inflated self-reports. You get the 79% adoption challenge rate. Compliance and genuine adoption are not the same thing, and measurement systems that can’t distinguish between them are flying blind.

For founders building AI-enabled organizations now, the structural advantage is that none of these commitments have been made yet. You don’t have a leaderboard to unwind, a workforce to rebuild trust with, or a board presentation built on self-reported productivity data you can’t actually verify. Define what you’re trying to change before you pick a tool. Define how you’ll know it worked before you deploy. Build the knowledge infrastructure that makes your AI outputs trustworthy rather than just confident-sounding.

The measurement problem is a strategy problem wearing a metrics costume. The companies getting it right aren’t the ones with the most tokens or the most pilots. They’re the ones that asked the right questions before they started counting.

By Charles | Published on: June 2nd, 2026 | Categories: Knowledge Management

About the author: Charles

Charles Costa, MLIS is a researcher, strategist, and founder of Lexora Labs, where he works on AI adoption, knowledge management, and the future of expert
