AI capability limits: riding my bike with no handlebars

I was about nine years old, coasting up and down our quiet street in Denmark on a hand-me-down bike with the paint already flaking off the top tube. I had spent the whole afternoon practising, and I had finally cracked it: both hands off the handlebars, arms folded across my chest, gliding along like I was able to fly. It felt, at the time, like an achievement on the order of the moon landing. I went to find my friends to show them.
The problem was that when I rolled up to the park with what I hoped was a casual-but-definitely-noticed-me flourish, three of them glanced over, shrugged, and went back to kicking a ball around. One of them was riding past me at that exact moment - no hands, no bike at all really, because they had already progressed to wheelies. By the time I had finished my lap, two more had joined in. It turned out nearly every kid on our street could ride with no handlebars. It was not a superpower. It was just the bare minimum of being nine.
Right now, most of us are that nine-year-old with AI. A few years ago, getting GPT to draft a business case over a coffee felt like breaking the laws of physics. Today it is table stakes. The interesting question has moved on. It is no longer "what can AI do?" - the theoretical answer is "nearly anything screen-shaped" - but "what are people actually doing with it, and whose job does that quietly change?" This is exactly what the Anthropic Economic Index is trying to measure: not the theoretical ceiling of AI capability, but the actual tyre marks it is leaving on the labour market.
Put simply: capability is no longer the moat. Observed usage is.
The theoretical ceiling: what AI could do
Before you can tell who is losing their handlebars, you have to agree on what the bike can do in the first place. The foundational piece of work here is Eloundou et al.'s 2023 paper "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models". The authors took every task description in the O*NET database (~800 US occupations, ~19,000 tasks) and scored each one for how much an LLM could speed it up. The result is a per-task exposure score.
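The scoring scheme is deliberately simple. Each task gets one of three labels:
- No exposure: an LLM gives little or no reduction in the time the task takes.
- Direct exposure: access to an LLM alone could cut the time by at least half at the same quality.
- Exposure with tooling: the time saving only arrives once additional software is built on top of the LLM.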
Their headline finding is the one most people already know: roughly 80% of US workers have at least 10% of their tasks exposed to LLMs, and around 19% have at least half their tasks exposed. That is the ceiling. It is the moon-landing version of "I can ride with no hands". And for two years the whole industry has essentially been citing this number back and forth to justify strategy decks.
The problem is that a ceiling tells you nothing about the floor. Plenty of tasks are theoretically automatable and yet, three years in, still stubbornly done by a person. Eloundou et al. can tell you that the kid could ride no-handed. They cannot tell you whether he actually does.
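To make the ceiling concrete, here is a minimal sketch of how task-level labels roll up into an occupation-level score and then into headline shares like the ones above. The E0/E1/E2 labels and the half-weight on tool-dependent exposure follow my reading of the Eloundou et al. rubric and should be treated as assumptions; the occupations, labels, and worker counts are invented for illustration, and the paper's own aggregation is more careful than this.

```python
from dataclasses import dataclass

# Assumed rubric weights: E0 = not exposed, E1 = an LLM alone could at least
# halve the task time, E2 = only with extra software built on top of the LLM.
# The half-weight on E2 is an assumption, not a figure taken from this article.
WEIGHTS = {"E0": 0.0, "E1": 1.0, "E2": 0.5}


@dataclass
class Occupation:
    name: str
    workers: int
    task_labels: list[str]  # one rubric label per task

    def exposure(self) -> float:
        """Mean rubric weight across the occupation's tasks."""
        return sum(WEIGHTS[label] for label in self.task_labels) / len(self.task_labels)


def share_of_workers_above(occupations, threshold):
    """Fraction of all workers whose occupation's exposure is >= threshold."""
    total = sum(o.workers for o in occupations)
    exposed = sum(o.workers for o in occupations if o.exposure() >= threshold)
    return exposed / total


# Hypothetical toy data, not real O*NET figures.
toy = [
    Occupation("report writer", 100, ["E1", "E1", "E2", "E0"]),
    Occupation("shop assistant", 200, ["E2", "E0", "E0", "E0"]),
    Occupation("roofer", 100, ["E0", "E0", "E0", "E0"]),
]

for threshold in (0.10, 0.50):
    share = share_of_workers_above(toy, threshold)
    print(f"workers at or above {threshold:.0%} exposure: {share:.0%}")
```

The point of the toy numbers is only the shape: a low threshold sweeps in most of the workforce, a high one leaves a much smaller group, which is the same pattern as the 80% and 19% headline figures.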
The bridge: what people actually do with it
Anthropic's contribution is the other half of the picture. The Anthropic Economic Index takes anonymised conversations from Claude, clusters them with its internal privacy-preserving tool Clio, and maps them back to the same O*NET task taxonomy. Instead of asking "could a model do this?", they ask "is a model already doing this, in the wild, right now?"
The headline cross-check is remarkable. Roughly 97% of actual Claude usage falls within tasks that Eloundou's rubric rated as theoretically feasible. In other words: people are not using AI for things it can't do. They are using it for a narrow, practical slice of what it can do, and mostly the obvious slice.
That gap - between the 80% theoretical exposure and the much smaller set of tasks actually being handed over - is the part that matters. The March 2026 "Learning curves" report is explicit that adoption lags capability because of legal exposure, technical integration, trust, and plain old human-in-the-loop habits. Everyone could be riding no-handed; only some kids are.
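The two halves of that gap are different ratios, and it helps to see them side by side. The sketch below is a hedged illustration, not Anthropic's pipeline: the task names, conversation counts, and the `feasible_tasks` set are all hypothetical. It computes the share of observed usage that lands inside the theoretical ceiling, and separately the share of the ceiling that sees any usage at all.

```python
def usage_inside_ceiling(usage_by_task, feasible_tasks):
    """Share of observed conversations that fall on theoretically feasible tasks."""
    total = sum(usage_by_task.values())
    if total == 0:
        raise ValueError("no usage recorded")
    covered = sum(n for task, n in usage_by_task.items() if task in feasible_tasks)
    return covered / total


def ceiling_actually_used(usage_by_task, feasible_tasks):
    """Share of theoretically feasible tasks with at least one observed conversation."""
    used = {task for task, n in usage_by_task.items() if n > 0}
    return len(used & feasible_tasks) / len(feasible_tasks)


# Hypothetical conversation counts per task.
usage_by_task = {
    "debug code": 600,
    "draft email": 300,
    "summarise report": 70,
    "repair roof": 30,
}

# Hypothetical set of tasks a rubric rated as feasible.
feasible_tasks = {
    "debug code",
    "draft email",
    "summarise report",
    "translate memo",
    "write lesson plan",
}

print(f"usage inside the ceiling: {usage_inside_ceiling(usage_by_task, feasible_tasks):.0%}")
print(f"ceiling actually used:    {ceiling_actually_used(usage_by_task, feasible_tasks):.0%}")
```

A high first number and a much lower second number is exactly the pattern described above: almost all usage is feasible, but a lot of what is feasible is not yet being used.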
Observed Exposure: weighting reality over theory
The genuinely novel thing in the Economic Index is the Observed Exposure metric. Instead of treating every theoretically-exposed task as equal, Anthropic weights each task by how it is actually being used. The shape of it is something like this:
Observed Exposure (schematic)
In plain language, four knobs decide how "exposed" a given job actually is today:
- Prevalence: How often the task shows up in real Claude traffic. A task no one ever actually delegates to the model gets down-weighted, even if it scores on paper.
- Work context: Is the conversation plausibly work-related, or is it someone asking for dinner recipes? Casual use counts for less when you are trying to measure occupational exposure.
- Implementation type: Automated usage (via API, a workflow, or an agent harness) is weighted more heavily than augmentative usage (a human chatting for help). Automation is the form that actually removes human minutes from a task; augmentation just makes those minutes more pleasant.
- Time share: How much of a typical worker's day is actually spent on this specific automatable task. Automating a task that consumes 2% of someone's week is not the same as automating the thing they do all day.
Why this matters: Eloundou's score tells you the engineering possibility of substitution. Observed Exposure tells you the economic reality of it, right now, this quarter. The first is a physics problem; the second is a labour-market signal.
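Here is one way the four knobs could be wired together, purely as a sketch. The multiplicative form, the `AUGMENT_WEIGHT` of 0.5, and every field name are my own assumptions for illustration; the formal definition lives in the Anthropic paper listed under Further reading and may combine the factors differently.

```python
from dataclasses import dataclass

# Assumed discount for augmentative (human-in-the-loop) usage relative to
# automated usage. The value 0.5 is illustrative, not taken from the paper.
AUGMENT_WEIGHT = 0.5


@dataclass
class TaskSignal:
    name: str
    feasible: bool          # passes the theoretical rubric
    prevalence: float       # 0..1, how often the task shows up in observed traffic
    work_share: float       # 0..1, fraction of that traffic that looks work-related
    automated_share: float  # 0..1, fraction of usage that is automated
    time_share: float       # fraction of the worker's time spent on the task


def task_weight(task: TaskSignal) -> float:
    """Combine prevalence, work context, and implementation type for one task."""
    if not task.feasible:
        return 0.0
    implementation = task.automated_share + AUGMENT_WEIGHT * (1.0 - task.automated_share)
    return task.prevalence * task.work_share * implementation


def observed_exposure(tasks: list[TaskSignal]) -> float:
    """Time-share-weighted average of the per-task weights for one occupation."""
    total_time = sum(t.time_share for t in tasks)
    if total_time == 0:
        return 0.0
    return sum(t.time_share * task_weight(t) for t in tasks) / total_time


# Hypothetical occupation with two tasks that each take half the week.
tasks = [
    TaskSignal("write boilerplate code", True, 0.8, 1.0, 0.5, 0.5),
    TaskSignal("attend planning meetings", True, 0.0, 1.0, 0.0, 0.5),
]

print(f"observed exposure: {observed_exposure(tasks):.2f}")
```

Both toy tasks pass the theoretical rubric, but only one of them is ever delegated, so the occupation's score ends up far below its ceiling. That is the whole point of the metric.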
Who's already riding no-handed?
When you re-rank US occupations by Observed Exposure rather than pure theoretical exposure, a much sharper picture falls out. Three professions sit at the top, and a surprisingly large chunk of the workforce sits at a hard zero.
The table is a useful sketch, but the underlying data is a 22-dimensional shape, and it rewards poking at. The radar below starts with all 22 SOC major groups side by side: blue is Eloundou's theoretical exposure, lime is Anthropic's Observed Exposure. The spokes are colour-coded into six occupation families (Management & Business, Tech & Science, Education/Arts/Law, Healthcare, Service/Sales/Office, and Trades & Production), so you can see at a glance which corners of the labour market each lobe is reaching into. A few things worth knowing about how to drive it:
- Search for anything. Type an occupation, a minor group, or a whole SOC major category into the box. O*NET aliases ("RN", "software engineer") and Soundex matching ("programer", "nurze") still surface the right spoke.
- Stack several picks. Each chip gets its own slice of the 22-spoke budget, so adding "Software Developers" and "Farmers" shows both neighbourhoods side by side instead of one drowning the other.
- See the related-job halo. Around each pick the chart fans outward through O*NET's "related occupations" graph, so you also get the nearest-neighbour jobs - and the family colours tell you which corners of the labour market they're being pulled in from.
- Inspect a spoke. Hover any label or row beneath the chart (tap on mobile) for the exact theoretical exposure, observed exposure, and US employment headcount for that occupation.
- Share or reset. The URL updates as you go, so any view you build is a permanent link; Reset drops you back to the full 22-group overview.
The demographic shadow is also interesting, and uncomfortable. Highly-exposed workers are, on average, higher-paid, more educated, and more likely to be female or Asian than the unexposed group. That is almost the exact inverse of previous waves of automation, which came for manual and manufacturing jobs first. The no-handlebars bike, this time, is parked in the knowledge-worker cul-de-sac.
So… is everyone losing their job?
Short answer: no, not yet - and the Index is careful not to pretend otherwise. The labour market signals in early 2026 look more like a slow re-shaping than a shock.
- No unemployment spike in highly-exposed roles. Programmers, customer service reps, and data-entry clerks are not queuing up outside the job centre. What is happening inside those teams, though, is that headcount is being held flat while output rises - a classic productivity-booster signature rather than a "job killer" signature.
- A hiring chill at the bottom of the ladder. The clearest early signal is a measurable slowdown in hiring for 22-25 year olds in highly-exposed fields. Junior roles are exactly the ones whose tasks look most like what a well-harnessed agent can do: bounded, well-specified, low-stakes per task. Firms are quietly re-routing entry-level work to models and waiting to see what the new shape of an "entry-level" job actually is.
- Efficiency now, displacement later (maybe). The authors are explicit that Observed Exposure is designed to be the leading indicator of displacement if and when it comes. Watch the ratio of automated-to-augmentative use: the day it tips, the productivity story turns into a headcount story.
In other words, the kids on the street are not retiring. They are just quietly doing more laps per afternoon, because everyone is now expected to ride without hands as a baseline.
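If you want to watch the automated-to-augmentative ratio for your own tooling, the bookkeeping is small. The sketch below assumes you can count automated and augmentative interactions per period; the quarterly figures and the tipping threshold of 1.0 (automated use overtaking augmentative use) are hypothetical choices, not something the Index prescribes.

```python
def automation_ratio(automated: int, augmentative: int) -> float:
    """Ratio of automated to augmentative interactions in one period."""
    if augmentative == 0:
        return float("inf") if automated > 0 else 0.0
    return automated / augmentative


def first_tipping_period(series, threshold=1.0):
    """Return the label of the first period whose ratio reaches the threshold, else None."""
    for label, automated, augmentative in series:
        if automation_ratio(automated, augmentative) >= threshold:
            return label
    return None


# Hypothetical counts: (period, automated interactions, augmentative interactions).
quarters = [
    ("Q1", 30, 70),
    ("Q2", 40, 60),
    ("Q3", 55, 45),
]

for label, automated, augmentative in quarters:
    print(f"{label}: ratio {automation_ratio(automated, augmentative):.2f}")

print("first tipping period:", first_tipping_period(quarters))
```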
Everybody was riding no-handed all along
Nine-year-old me rolled into that park absolutely convinced I had joined a very small club of extraordinary cyclists. The actual club was "every kid on the street" and the actual benchmark had quietly moved to wheelies. The first lesson of that afternoon was humbling; the second lesson, much more useful, was that the interesting question was never "can you ride no-handed?" - it was "what do you do once everyone can?".
That is the position almost every knowledge worker is now in. "I used AI to do X" is no longer a flex; it is the baseline. What the Anthropic Economic Index gives us is a way to see, with actual data, where the baseline has already moved - and crucially, where the next move is going to land first.
If you run a team, three things are worth doing on Monday:
- Map your team's work to O*NET-style tasks and ask where your and are actually highest - that is where Observed Exposure will bite first.
- Separate augmentative tooling (chat, copilots) from automated tooling (API workflows, harnesses). The second is where the labour-market signal lives.
- Pay specific attention to your 22-25 year old pipeline. If you are quietly not hiring them, you are not being strategic; you are being the statistic.
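A first pass at that Monday audit can be a spreadsheet, or a few lines like the sketch below. It assumes you have already listed your team's tasks with rough weekly hours and tagged each one as "automated", "augmentative", or "none"; the task names, hours, and tags here are made up.

```python
from collections import defaultdict

# Hypothetical inventory: (task, weekly hours, tooling type).
team_tasks = [
    ("triage support tickets", 12.0, "automated"),
    ("write release notes", 4.0, "augmentative"),
    ("customer calls", 16.0, "none"),
    ("code review", 8.0, "augmentative"),
]


def hours_by_tooling(tasks):
    """Share of total weekly hours that falls under each tooling type."""
    totals = defaultdict(float)
    for _name, hours, tooling in tasks:
        totals[tooling] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tooling: hours / grand_total for tooling, hours in totals.items()}


shares = hours_by_tooling(team_tasks)
for tooling, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{tooling:>12}: {share:.0%} of weekly hours")
```

The "automated" row is the one to track over time, since that is the share of the week where human minutes are actually being removed rather than made more pleasant.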
So, when everyone on the street can ride with no handlebars, what is your wheelie going to be?
Further reading
- Eloundou, Manning, Mishkin, Rock (2023). GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. arXiv:2303.10130. The original rubric applied to O*NET.
- Anthropic (2025). Introducing the Anthropic Economic Index. The original Clio-driven methodology for measuring real-world Claude usage against O*NET tasks.
- Anthropic (2026). Anthropic Economic Index report: Learning curves. The March 2026 update introducing Observed Exposure, the automate-vs-augment weighting, and the 22-25 hiring-chill signal.
- Anthropic (2026). Labor market impacts of AI: A new measure and early evidence. The underlying March 2026 paper formally defining Observed Exposure and presenting the early empirical evidence - including the BLS growth correlation, the demographic profile of exposed workers, and the slowdown in hiring of younger workers in exposed occupations.
- Anthropic (2026). Anthropic Economic Index: Tracking AI's role in the US and global economy. Companion geographic breakdown, useful if you want to see how exposure lands by state and country.
- Anthropic (2024). Clio: privacy-preserving insights into real-world AI use. The clustering/aggregation tool that makes the Economic Index possible without exposing any individual conversation.
- US Department of Labor. O*NET OnLine. The task taxonomy underneath both studies - worth a browse if you have never actually clicked through to your own job code.