Direct Logit Attribution Studies
Using TransformerLens and nnsight, we decompose each attention head's contribution to next-token prediction across 43 models and 12 architecture families, measuring how Metaphori Engine™ Structured Notation (MESN™) shapes attention geometry. The findings are unusual; several contradict widely held assumptions about how input structure affects model behavior.
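The core measurement, direct logit attribution, exploits the fact that the residual stream is a sum of component outputs: a head's direct contribution to a token's logit is just the dot product of its output with that token's unembedding column. A minimal sketch with random tensors (layer-norm folding and real model weights omitted; shapes and names are illustrative, not the study's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Toy stand-ins: per-head residual-stream outputs and the unembedding matrix.
head_outputs = {f"L0H{h}": rng.normal(size=d_model) for h in range(4)}
W_U = rng.normal(size=(d_model, vocab))

def direct_logit_attribution(head_out, W_U, token_id):
    # A head's direct contribution to a token's logit is the dot product
    # of its residual-stream output with that token's unembedding column.
    return head_out @ W_U[:, token_id]

target = 7
dla = {name: direct_logit_attribution(out, W_U, target)
       for name, out in head_outputs.items()}

# Attribution is linear in the residual stream, so per-head DLAs sum to
# the logit contribution of the heads' combined output.
combined = sum(head_outputs.values())
assert np.isclose(sum(dla.values()), combined @ W_U[:, target])
```

In practice the same decomposition is applied to cached activations (e.g. via TransformerLens's `run_with_cache`) rather than random vectors; the linearity check above is what makes per-head comparisons across input formats meaningful.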
Humans have two very interesting traits: we have built-in conversation-correction mechanisms (to varying degrees, we adapt how we say things when we see signs of misunderstanding), and we tend to add more words and specificity when we feel clarity is missing. This doesn't always work, and with AI in particular, the results are not what humans might expect.
The Perplexity Inversion
The most counterintuitive finding. MESN™ produces high perplexity on input: the model finds it surprising, harder to parse than natural language. Yet it produces low perplexity on output: more fluent, more coherent generation than equivalent prose prompts. The generation-PPL ratio (MESN™ output over prose output) falls below 1.0 in 28 of 43 models.
This is an anti-pattern. The universal assumption in prompt engineering is “make input easy for the model.” Our data shows the opposite: harder initial processing produces deeper encoding and better transfer.
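The inversion is easy to state numerically. Perplexity is the exponentiated mean negative log-probability of the tokens, so the comparison reduces to two ratios: input PPL (how hard the prompt is to read) and generation PPL (how fluent the continuation is). A sketch with hypothetical per-token log-probs (the numbers are illustrative, not measured values):

```python
import math

def perplexity(token_logprobs):
    # PPL = exp(-mean log p(token)); lower means the model finds the
    # text more predictable.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for the same task phrased two ways.
prose_input, prose_output = [-2.1, -1.8, -2.4], [-1.9, -2.0, -2.2]
mesn_input,  mesn_output  = [-3.5, -3.1, -3.8], [-1.2, -1.4, -1.3]

# The inversion: structured input is harder to parse (higher input PPL)
# yet yields more fluent generation (lower output PPL).
ratio = perplexity(mesn_output) / perplexity(prose_output)
print(f"input PPL: prose={perplexity(prose_input):.1f} "
      f"mesn={perplexity(mesn_input):.1f}")
print(f"generation PPL ratio (mesn/prose) = {ratio:.2f}")
```

A generation-PPL ratio below 1.0, as in this toy example, is the signature reported for 28 of the 43 models.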
This mirrors Robert Bjork's desirable-difficulty framework from cognitive psychology: effortful encoding leads to stronger memory and better generalization. MESN™ works not despite being hard for the model to process, but because it forces multi-family attention-head recruitment rather than falling into what we call the “prose groove”: lazy, diffuse attention patterns that natural language enables.
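"Diffuse" versus "focused" attention can be made precise with the Shannon entropy of an attention row: a head that spreads probability evenly over the context has maximal entropy, while one that locks onto specific tokens has low entropy. A minimal sketch of that metric (toy distributions, not real attention patterns):

```python
import numpy as np

def attn_entropy(pattern):
    # Shannon entropy (bits) of one attention row; high entropy means
    # diffuse attention (the "prose groove"), low entropy means the
    # head is locked onto specific tokens.
    p = np.asarray(pattern, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

diffuse = [0.25, 0.25, 0.25, 0.25]   # spreads evenly over context
focused = [0.85, 0.05, 0.05, 0.05]   # locks onto one token

print(attn_entropy(diffuse))  # maximal for 4 positions: 2.0 bits
print(attn_entropy(focused))
```

Comparing per-head entropy under prose versus structured input is one way to operationalize the "groove" claim.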
Multi-Family Simultaneous Activation
A single MESN™ expression activates heads across 4 or more specialization families simultaneously — symbolic, relational, hierarchical, and meta-routing heads all fire together. This is a form of attention convergence that natural language rarely achieves. All 43 models show 8/8 specialization family activation under structured input.
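The family-activation claim reduces to a simple count: for each specialization family, take a mean head-activation score and check it against a threshold. The sketch below uses hypothetical family names, scores, and a threshold of 0.5; none of these are the study's actual taxonomy or values:

```python
# Hypothetical per-family mean activation scores for one input; family
# names, scores, and the 0.5 threshold are illustrative assumptions.
FAMILIES = ["symbolic", "relational", "hierarchical", "meta-routing",
            "positional", "induction", "copying", "syntactic"]

def active_families(scores, threshold=0.5):
    # A family counts as "active" when its mean head activation clears
    # the threshold.
    return [f for f, s in scores.items() if s >= threshold]

prose = dict(zip(FAMILIES, [0.62, 0.31, 0.18, 0.09, 0.44, 0.22, 0.11, 0.27]))
mesn  = dict(zip(FAMILIES, [0.71, 0.66, 0.58, 0.53, 0.61, 0.55, 0.52, 0.60]))

print(len(active_families(prose)), "of 8 families under prose")
print(len(active_families(mesn)), "of 8 families under structured input")
```

The 8/8 result reported above corresponds to every family clearing the threshold simultaneously, as in the second line here.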
Attention head activation: Typical Context vs MESN™ — 43 models
Spontaneous Notation Reproduction
69.7% of model completions spontaneously generate structured operators in their output — despite zero MESN™ in any model's training data. This isn't learned behavior. The architecture is expressing a preference.
When given structured input, the notation enters the KV cache and shapes subsequent token generation. The model doesn't just process Structured Notation — it continues in it, suggesting the notation aligns with something fundamental in how transformer attention self-organizes.
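Measuring spontaneous reproduction amounts to scanning completions for structured operators. The operator set below is a hypothetical stand-in (the actual MESN™ operator inventory is not specified here); the mechanics of the check are what matter:

```python
import re

# Hypothetical operator set; the real MESN operator inventory is an
# assumption, not documented here.
OPERATORS = re.compile(r"→|⇒|::|\|>|∴|⊕")

def reproduces_notation(completion):
    # Flag completions that spontaneously emit structured operators.
    return bool(OPERATORS.search(completion))

completions = [
    "goal → plan :: step1 ⊕ step2",
    "The answer is forty-two.",
    "input |> normalize |> rank",
]
rate = sum(map(reproduces_notation, completions)) / len(completions)
print(f"{rate:.1%} of completions reproduce notation")
```

Run over the full completion corpus, this kind of detector is what yields a reproduction rate like the 69.7% figure above.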
Complexity Amplification
The DLA advantage is not constant — it increases monotonically with input complexity in 38 of 43 models. Short stimuli show +8.9% mean DLA advantage; extended stimuli show +14.7% — a 65% amplification. MESN™ becomes more valuable precisely where it matters most: in complex, information-dense contexts.
Zero-shot prompt token lengths by complexity tier
| Tier | Pairs | Prose Tokens | MESN™ Tokens | Compression | DLA Advantage |
|---|---|---|---|---|---|
| Short | 32 | ~25–45 | ~10–33 | ~44% | +8.9% |
| Medium | 16 | ~88–127 | ~36–85 | ~52% | +11.1% |
| Long | 16 | ~260–405 | ~97–178 | ~63% | +12.7% |
| Extended | 8 | ~1,045–1,637 | ~344–615 | ~70% | +14.7% |
DLA advantage and MESN™ compression by complexity tier
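The two derived quantities in the table are straightforward arithmetic: compression is the token savings of the structured form relative to prose, and the amplification figure is the relative growth of the DLA advantage from the Short to the Extended tier. A sketch using the advantage values from the table (the token counts passed to `compression` are illustrative):

```python
def compression(prose_tokens, mesn_tokens):
    # Fractional token savings of the structured form relative to prose.
    return 1 - mesn_tokens / prose_tokens

# Illustrative token counts in the Extended tier's ~70% range.
print(f"compression: {compression(1200, 360):.0%}")

# Amplification of the DLA advantage from Short (+8.9%) to Extended (+14.7%).
short_adv, extended_adv = 0.089, 0.147
amplification = (extended_adv - short_adv) / short_adv
print(f"amplification: {amplification:.0%}")
```

The second computation is where the "65% amplification" figure comes from: (14.7 − 8.9) / 8.9 ≈ 0.65.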
The RLHF Dampening Effect
Base models consistently show ~2x stronger DLA signal than their instruction-tuned counterparts across all 10 matched pairs. Instruction tuning — RLHF in particular — optimizes the model for the comfortable prose groove, reducing its ability to benefit from structured input.
This creates a practical tension: base models have stronger DLA advantage but instruction-tuned models follow task instructions better. The implication is that current alignment techniques inadvertently narrow the model's cognitive repertoire.
DLA advantage: base vs instruction-tuned (matched pairs)
Cross-Architecture Universality
The structured advantage is not model-specific. It appears across GQA, MHA, and MLA attention mechanisms. It appears in dense and mixture-of-experts architectures. It appears from 3.8B to 141B parameters. Models with zero exposure to MESN™ in training — including Kimi-K2, trained entirely outside our ecosystem — interpret it natively, connecting its patterns to attention topology and equilibrium dynamics.
This suggests MESN™ is not a learned convention but a resonant frequency of transformer attention — the format these architectures were always capable of processing most effectively.
A Gap in the Literature
Every component of our methodology — DLA, attention head analysis, mechanistic interpretability tools — exists independently in the MI literature. But nobody has systematically tested structured notation systems against these tools at this scale. The combination of input-structure engineering with head-level mechanistic measurement is novel.
Studies