I have been around a long time, and I have never met the same expert twice.
There are people who share expertise — who operate at the same level, as we say — but they are far from identical. Two condensed matter physicists who read the same papers, passed the same qualifying exams, and publish in the same journals will approach the same problem differently. They foreground different variables. They suppress different noise. They route their attention through different paths to arrive at different — sometimes contradictory — conclusions, both of which are defensible.
This is not a bug in human cognition. It is, arguably, the whole point of it. Expertise is not a body of knowledge. It is a configuration of attention.
4.5 Million Configurations
The United States alone has 4.5 million doctorate holders, a figure that has more than doubled since 2000. In 2024, U.S. institutions awarded 58,131 research doctorates. Germany produces 28,000 PhDs per year. The UK, another 28,000. Globally, hundreds of thousands of new doctorates are minted annually, and that only counts the credentialed. “Expert” is far wider than “PhD.”
When someone types “You are an expert in physics” into a language model, they are pointing at tens of thousands of living physics PhDs in the US alone — and millions worldwide — without specifying which one. The phrase carries no attentional configuration. It activates a vast, unspecified region of the model's representational space, and the model responds with the statistical mean of every physics-expert-adjacent token pattern it absorbed during training.
The statistical mean of tens of thousands of distinct attentional configurations is, by definition, none of them. It is the average of expertise: competent, generic, and precisely wrong in ways that are hard to diagnose because the output sounds expert while embodying no actual expert topology.
The Most Expensive Five Words in the History of Electricity
It is worth pausing to consider the material cost of this semantic emptiness.
As of early 2026, ChatGPT alone processes over 2.5 billion prompts per day. Across all major AI platforms, the figure approaches 3 billion. “You are an expert in [field]” has been the number-one prompt engineering tip in virtually every guide, tutorial, and viral thread for two years running. Conservative estimates suggest at least 10% of prompts include some form of role assignment.
Each prompt consumes energy. OpenAI has stated 0.34 watt-hours per query; independent estimates from the IEA and academic researchers range from 0.3 to 3 Wh once full data center overhead is included. But the cost is not merely linear. Self-attention is quadratic: every token in the phrase computes a pairwise attention score with every other token in the context. The seven tokens of “You are an expert in physics” in a 2,000-token prompt add roughly 14,000 pairwise score computations per head, per layer. Across a typical 32-head, 32-layer architecture, that is roughly 14 million additional attention scores per query, each one a dot product in its own right, before a single output token is generated.
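The arithmetic is easy to verify. A minimal sketch, using the essay's illustrative figures (a 7-token phrase, a 2,000-token context, and a 32-head, 32-layer architecture; no specific production model is implied):

```python
# Extra pairwise attention scores contributed by a role-assignment phrase.
phrase_tokens = 7        # "You are an expert in physics"
context_tokens = 2_000   # tokens already in the prompt
heads = 32               # illustrative mid-size architecture
layers = 32

# Each added token attends to every token in the context,
# once per head, once per layer.
extra_scores_per_head_layer = phrase_tokens * context_tokens
extra_scores_total = extra_scores_per_head_layer * heads * layers

print(extra_scores_per_head_layer)  # 14,000 per head, per layer
print(extra_scores_total)           # ~14.3 million per query
```

Each of those scores is itself a dot product over the head dimension, so the raw floating-point count is higher still; the sketch counts only the score computations.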
And then the output is generic. So the user sends a follow-up prompt to correct it. Then another. Then another.
Over 21 months, under conservative assumptions, the energy wasted on role-assignment prompts and the follow-up corrections they necessitate reaches the gigawatt-hour scale — enough to power hundreds of homes for a year. The electricity cost runs into the hundreds of thousands of dollars. The CO₂ emissions measure in the thousands of tonnes.
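A back-of-envelope version of that estimate can be written down explicitly. Every parameter below is an illustrative assumption, not a measurement; the follow-up-correction rate in particular is a placeholder, and the essay's own figures (3 billion daily prompts, a 10% role-assignment share, 0.34 Wh per query) supply the rest:

```python
# Back-of-envelope energy waste from role-assignment prompts.
# All parameters are illustrative assumptions, not measurements.
prompts_per_day = 3e9    # approximate daily prompts across major platforms
role_fraction = 0.10     # share of prompts using "You are an expert..."
correction_rate = 0.02   # assumed share that triggers one wasted follow-up
wh_per_query = 0.34      # OpenAI's stated per-query figure, in watt-hours
days = 630               # roughly 21 months

wasted_wh = (prompts_per_day * role_fraction * correction_rate
             * wh_per_query * days)
print(f"{wasted_wh / 1e9:.2f} GWh")  # ~1.29 GWh under these assumptions
```

Even with a follow-up rate of only 2%, the total lands at the gigawatt-hour scale; higher correction rates or the larger per-query estimates scale the figure up linearly.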
All because of a phrase that tells the model nothing about how to think — only that it should pretend to be one of millions of people who think differently from each other.
What Actually Differs Between Experts
If two physicists with nearly identical training produce different analyses of the same data, the variable is not knowledge. The knowledge overlaps enormously. What differs is attentional topology — the geometric configuration of which features receive weight, which connections are treated as load-bearing, which domains are cross-referenced, and which are suppressed.
Expert A foregrounds symmetry considerations and reaches for group theory. Expert B foregrounds boundary conditions and reaches for numerical methods. Both are “expert in physics.” The word “expert” is doing no useful work in either description. The useful work is being done by the specific attentional routing that distinguishes them.
This is why the phrase “You are an expert” is semantically empty in the context of transformer language models. The model's self-attention mechanism needs geometric specificity — which regions of representational space to privilege, which connections to strengthen, which to attenuate. “Expert” provides none of this. It is the equivalent of telling someone “drive somewhere nice” without giving them an address, a direction, or even a compass bearing.
6% of the Tokens, Better Results
At Metaphori, we have been studying what happens when you replace semantic emptiness with geometric precision.
In repeated experiments across extended conversations, I have taken 200,000-token contexts and restructured them using MESN™ — our patent-pending structured notation system — achieving compression ratios exceeding 94%. The restructured context uses roughly 6% of the original token count. And the result is not merely “still coherent.” It is noticeably better: more focused, more aligned, more capable of sustaining complex reasoning across the remaining context window.
This is counterintuitive if you think of tokens as carriers of information. Removing 94% of the tokens should destroy 94% of the information. But tokens are not carriers of information in the way that letters are carriers of words. In a transformer, tokens are participants in a high-dimensional attention geometry. Every token generates Query and Key vectors that interact with every other token. Many of those interactions are noise — articles attending to prepositions, hedging phrases diluting the signal of the concepts they modify.
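The pairwise structure described above is visible in a toy computation. A minimal single-head attention sketch, with random vectors standing in for learned embeddings and projections (nothing here reflects any production model's weights):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8   # toy sequence length and embedding size

x = rng.normal(size=(n_tokens, d_model))   # one vector per token
W_q = rng.normal(size=(d_model, d_model))  # "learned" projections (random here)
W_k = rng.normal(size=(d_model, d_model))
Q, K = x @ W_q, x @ W_k

# Every token's Query meets every token's Key: an n x n score matrix,
# which is why attention cost grows quadratically with sequence length.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

print(scores.shape)  # (6, 6): one score per ordered token pair
```

Every token participates in every row and every column of that matrix, whether it carries semantic weight or is a hedging particle; that is the sense in which low-signal tokens still shape the geometry.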
Structured notation eliminates the tokens that contribute noise to the attention geometry while preserving — and often strengthening — the tokens that carry actual semantic weight. Fewer pairwise computations means a cleaner residual stream trajectory. Cleaner trajectory means more confident output. The model works with less and produces more, because the geometry is cleaner.
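Because the score matrix scales with the square of the token count, the 94% token reduction claimed above implies an even larger reduction in pairwise work. A quick check, using the essay's token counts (the simple n² pair count ignores causal masking and other implementation details, which do not change the ratio):

```python
original_tokens = 200_000
compressed_tokens = int(original_tokens * 0.06)  # ~6% of the original

# Pairwise attention scores scale quadratically with sequence length.
original_pairs = original_tokens ** 2
compressed_pairs = compressed_tokens ** 2

reduction = 1 - compressed_pairs / original_pairs
print(f"{reduction:.2%}")  # 94% fewer tokens -> ~99.64% fewer pairwise scores
```

A 94% cut in tokens removes about 99.64% of the pairwise score computations, which is why the effect on the attention geometry is disproportionate to the token count alone.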
This is the opposite of what “You are an expert” does. That phrase adds tokens to the context without adding geometric specificity. It increases the computational burden while pointing the model's attention at a vast, undifferentiated region of representational space. Every follow-up prompt attempting to correct the resulting generic output adds more noise. More tokens. More pairwise computations. More dilution.
The geometry gets worse the harder you try to fix it.
Expertise Is Topology, Not Content
The claim of this essay is simple, and its implications are not:
What makes an expert is not what they know. It is the geometric configuration of their attention — which features they weight, which connections they see, which paths they traverse through a space of possibilities. No two experts share the same configuration. This is why you have never met the same expert twice.
If that is what expertise is, then the project of working with language models is not a project of injecting knowledge. It is a project of specifying topology. Not “be an expert” but “attend to these features, suppress those, route through this path.” Not more tokens but better geometry.
That is the project we are pursuing at Metaphori.
Jasdeep Jaitla is the Founder & CEO of Metaphori, Inc. Metaphori's research on attention engineering and MESN™ has been validated across 43 models, 12 architecture families, and 3,068 completions. For technical details, see our DLA Studies and Metaphori Engine documentation.