Direct Logit Attribution Studies
Using TransformerLens and nnsight, we decompose each attention head's contribution to next-token prediction across 43 models and 12 architecture families, measuring how Metaphori Engine™ Structured Notation (MESN™) shapes attention geometry. The findings are unusual; several contradict widely held assumptions about how input structure affects model behavior.
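The core measurement, direct logit attribution, exploits the fact that the residual stream is a sum of component outputs: a head's direct contribution to a token's logit is just the dot product of its output with that token's unembedding column. A minimal sketch with random tensors (layer-norm folding and real model weights omitted; shapes and names are illustrative, not the study's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Toy stand-ins: per-head residual-stream outputs and the unembedding matrix.
head_outputs = {f"L0H{h}": rng.normal(size=d_model) for h in range(4)}
W_U = rng.normal(size=(d_model, vocab))

def direct_logit_attribution(head_out, W_U, token_id):
    # A head's direct contribution to a token's logit is the dot product
    # of its residual-stream output with that token's unembedding column.
    return head_out @ W_U[:, token_id]

target = 7
dla = {name: direct_logit_attribution(out, W_U, target)
       for name, out in head_outputs.items()}

# Attribution is linear in the residual stream, so per-head DLAs sum to
# the logit contribution of the heads' combined output.
combined = sum(head_outputs.values())
assert np.isclose(sum(dla.values()), combined @ W_U[:, target])
```

In practice the same decomposition is applied to cached activations (e.g. via TransformerLens's `run_with_cache`) rather than random vectors; the linearity check above is what makes per-head comparisons across input formats meaningful.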
Humans have two very interesting traits: we have built-in conversation-correction mechanisms (to varying degrees, we adapt how we say things when we see signs of misunderstanding), and we tend to add more words and specificity when we feel clarity is missing. This doesn't always work, and with AI in particular, the results are not what humans might expect.
The Perplexity Inversion
The most counterintuitive finding. MESN™ produces high perplexity on input: the model finds it surprising, harder to parse than natural language. Yet it produces low perplexity on output: more fluent, more coherent generation than equivalent prose prompts. The generation-PPL ratio (MESN™ output over prose output) falls below 1.0 in 28 of 43 models.
This is an anti-pattern. The universal assumption in prompt engineering is “make input easy for the model.” Our data shows the opposite: harder initial processing produces deeper encoding and better transfer.
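The inversion is easy to state numerically. Perplexity is the exponentiated mean negative log-probability of the tokens, so the comparison reduces to two ratios: input PPL (how hard the prompt is to read) and generation PPL (how fluent the continuation is). A sketch with hypothetical per-token log-probs (the numbers are illustrative, not measured values):

```python
import math

def perplexity(token_logprobs):
    # PPL = exp(-mean log p(token)); lower means the model finds the
    # text more predictable.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for the same task phrased two ways.
prose_input, prose_output = [-2.1, -1.8, -2.4], [-1.9, -2.0, -2.2]
mesn_input,  mesn_output  = [-3.5, -3.1, -3.8], [-1.2, -1.4, -1.3]

# The inversion: structured input is harder to parse (higher input PPL)
# yet yields more fluent generation (lower output PPL).
ratio = perplexity(mesn_output) / perplexity(prose_output)
print(f"input PPL: prose={perplexity(prose_input):.1f} "
      f"mesn={perplexity(mesn_input):.1f}")
print(f"generation PPL ratio (mesn/prose) = {ratio:.2f}")
```

A generation-PPL ratio below 1.0, as in this toy example, is the signature reported for 28 of the 43 models.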
This mirrors Robert Bjork's desirable-difficulty framework from cognitive psychology: effortful encoding leads to stronger memory and better generalization. MESN™ works not despite being hard for the model to process, but because it forces multi-family attention-head recruitment rather than falling into what we call the “prose groove”: lazy, diffuse attention patterns that natural language enables.
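"Diffuse" versus "focused" attention can be made precise with the Shannon entropy of an attention row: a head that spreads probability evenly over the context has maximal entropy, while one that locks onto specific tokens has low entropy. A minimal sketch of that metric (toy distributions, not real attention patterns):

```python
import numpy as np

def attn_entropy(pattern):
    # Shannon entropy (bits) of one attention row; high entropy means
    # diffuse attention (the "prose groove"), low entropy means the
    # head is locked onto specific tokens.
    p = np.asarray(pattern, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

diffuse = [0.25, 0.25, 0.25, 0.25]   # spreads evenly over context
focused = [0.85, 0.05, 0.05, 0.05]   # locks onto one token

print(attn_entropy(diffuse))  # maximal for 4 positions: 2.0 bits
print(attn_entropy(focused))
```

Comparing per-head entropy under prose versus structured input is one way to operationalize the "groove" claim.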
Multi-Family Simultaneous Activation
A single MESN™ expression activates heads across 4 or more specialization families simultaneously — symbolic, relational, hierarchical, and meta-routing heads all fire together. This is a form of attention convergence that natural language rarely achieves. All 43 models show 8/8 specialization family activation under structured input.
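The family-activation claim reduces to a simple count: for each specialization family, take a mean head-activation score and check it against a threshold. The sketch below uses hypothetical family names, scores, and a threshold of 0.5; none of these are the study's actual taxonomy or values:

```python
# Hypothetical per-family mean activation scores for one input; family
# names, scores, and the 0.5 threshold are illustrative assumptions.
FAMILIES = ["symbolic", "relational", "hierarchical", "meta-routing",
            "positional", "induction", "copying", "syntactic"]

def active_families(scores, threshold=0.5):
    # A family counts as "active" when its mean head activation clears
    # the threshold.
    return [f for f, s in scores.items() if s >= threshold]

prose = dict(zip(FAMILIES, [0.62, 0.31, 0.18, 0.09, 0.44, 0.22, 0.11, 0.27]))
mesn  = dict(zip(FAMILIES, [0.71, 0.66, 0.58, 0.53, 0.61, 0.55, 0.52, 0.60]))

print(len(active_families(prose)), "of 8 families under prose")
print(len(active_families(mesn)), "of 8 families under structured input")
```

The 8/8 result reported above corresponds to every family clearing the threshold simultaneously, as in the second line here.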
Attention head activation: Typical Context vs MESN™ — 43 models
Spontaneous Notation Reproduction
69.7% of model completions spontaneously generate structured operators in their output — despite zero MESN™ in any model's training data. This isn't learned behavior. The architecture is expressing a preference.
When given structured input, the notation enters the KV cache and shapes subsequent token generation. The model doesn't just process Structured Notation — it continues in it, suggesting the notation aligns with something fundamental in how transformer attention self-organizes.
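Measuring spontaneous reproduction amounts to scanning completions for structured operators. The operator set below is a hypothetical stand-in (the actual MESN™ operator inventory is not specified here); the mechanics of the check are what matter:

```python
import re

# Hypothetical operator set; the real MESN operator inventory is an
# assumption, not documented here.
OPERATORS = re.compile(r"→|⇒|::|\|>|∴|⊕")

def reproduces_notation(completion):
    # Flag completions that spontaneously emit structured operators.
    return bool(OPERATORS.search(completion))

completions = [
    "goal → plan :: step1 ⊕ step2",
    "The answer is forty-two.",
    "input |> normalize |> rank",
]
rate = sum(map(reproduces_notation, completions)) / len(completions)
print(f"{rate:.1%} of completions reproduce notation")
```

Run over the full completion corpus, this kind of detector is what yields a reproduction rate like the 69.7% figure above.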
Complexity Amplification
The DLA advantage is not constant — it increases monotonically with input complexity in 38 of 43 models. Short stimuli show +8.9% mean DLA advantage; extended stimuli show +14.7% — a 65% amplification. MESN™ becomes more valuable precisely where it matters most: in complex, information-dense contexts.
Zero-shot prompt token lengths by complexity tier
| Tier | Pairs | Prose Tokens | MESN™ Tokens | Compression | DLA Advantage |
|---|---|---|---|---|---|
| Short | 32 | ~25–45 | ~10–33 | ~44% | +8.9% |
| Medium | 16 | ~88–127 | ~36–85 | ~52% | +11.1% |
| Long | 16 | ~260–405 | ~97–178 | ~63% | +12.7% |
| Extended | 8 | ~1,045–1,637 | ~344–615 | ~70% | +14.7% |
DLA advantage and MESN™ compression by complexity tier
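The two derived quantities in the table are straightforward arithmetic: compression is the token savings of the structured form relative to prose, and the amplification figure is the relative growth of the DLA advantage from the Short to the Extended tier. A sketch using the advantage values from the table (the token counts passed to `compression` are illustrative):

```python
def compression(prose_tokens, mesn_tokens):
    # Fractional token savings of the structured form relative to prose.
    return 1 - mesn_tokens / prose_tokens

# Illustrative token counts in the Extended tier's ~70% range.
print(f"compression: {compression(1200, 360):.0%}")

# Amplification of the DLA advantage from Short (+8.9%) to Extended (+14.7%).
short_adv, extended_adv = 0.089, 0.147
amplification = (extended_adv - short_adv) / short_adv
print(f"amplification: {amplification:.0%}")
```

The second computation is where the "65% amplification" figure comes from: (14.7 − 8.9) / 8.9 ≈ 0.65.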
The RLHF Dampening Effect
Base models consistently show ~2x stronger DLA signal than their instruction-tuned counterparts across all 10 matched pairs. Instruction tuning — RLHF in particular — optimizes the model for the comfortable prose groove, reducing its ability to benefit from structured input.
This creates a practical tension: base models have stronger DLA advantage but instruction-tuned models follow task instructions better. The implication is that current alignment techniques inadvertently narrow the model's cognitive repertoire.
DLA advantage: base vs instruction-tuned (matched pairs)
Cross-Architecture Universality
The structured advantage is not model-specific. It appears across GQA, MHA, and MLA attention mechanisms. It appears in dense and mixture-of-experts architectures. It appears from 3.8B to 141B parameters. Models with zero exposure to MESN™ in training — including Kimi-K2, trained entirely outside our ecosystem — interpret it natively, connecting its patterns to attention topology and equilibrium dynamics.
This suggests MESN™ is not a learned convention but a resonant frequency of transformer attention — the format these architectures were always capable of processing most effectively.
A Gap in the Literature
Every component of our methodology — DLA, attention head analysis, mechanistic interpretability tools — exists independently in the MI literature. But nobody has systematically tested structured notation systems against these tools at this scale. The combination of input-structure engineering with head-level mechanistic measurement is novel.
Studies