DLA Study — Phase 2

nnsight Cross-Architecture Study

Jasdeep Jaitla · 2025 · 43 models · 3,068 generation completions

Abstract

A comprehensive study of Metaphori Engine™ Structured Notation (MESN™) across 43 models. Using nnsight for remote model introspection, we measure Direct Logit Attribution (DLA) across transformer language models spanning 3.8B to 141B parameters, 12 architecture families, and 3 attention-mechanism types (GQA, MHA, MLA). The input set comprises 72 matched stimulus pairs spanning 4 complexity tiers and 8 semantic categories.

The headline result: 344 out of 344 family-direction checks are positive. Across every model and every head-specialization family, MESN™ produces a stronger DLA signal than equivalent prose. There are zero exceptions.

Methodology

For each attention head at each layer, DLA decomposes the head's direct contribution to the model's output logits. We hook into the forward pass via nnsight, capturing per-head activation magnitude (L2 norm of DLA vector), attention entropy, per-token log probabilities, top-5 attended tokens, and MoE routing logits where applicable.
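The per-head decomposition can be sketched in a few lines. This is an illustrative reconstruction, not the study's actual code: shapes, names (`W_O`, `W_U`), and the random inputs are assumptions; it shows how a head's output is projected through the output and unembedding matrices to yield the DLA vector, its L2 norm, and the attention entropy the study records.

```python
import numpy as np

# Illustrative DLA sketch (assumed shapes, not the study's exact code):
# project one head's output through its slice of W_O and the unembedding
# to get that head's direct contribution to every output logit.
rng = np.random.default_rng(0)
d_head, d_model, vocab, seq_len = 16, 64, 100, 32

z = rng.normal(size=d_head)               # head output at the final position
W_O = rng.normal(size=(d_head, d_model))  # per-head slice of the output projection
W_U = rng.normal(size=(d_model, vocab))   # unembedding matrix

dla_vec = z @ W_O @ W_U                   # direct contribution to each logit
dla_magnitude = np.linalg.norm(dla_vec)   # L2 norm, the recorded magnitude

# Attention entropy over this head's attention distribution
attn = rng.dirichlet(np.ones(seq_len))    # attention weights over seq_len positions
entropy = -np.sum(attn * np.log(attn + 1e-12))

print(dla_magnitude, entropy)
```

In the study these quantities are captured per head and per layer through nnsight hooks on the forward pass rather than from synthetic tensors.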

Each head is classified into one of 8 specialization families based on pattern-matching its top-5 attended tokens: symbolic/mathematical, code/syntactic, semantic/conceptual, relational/logical, hierarchical/spatial, meta-routing, repetition/emphasis, and constraint/negation.
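The classification mechanism can be sketched as a first-match lookup over keyword/symbol patterns. The pattern sets below are placeholders (the study's actual patterns are not reproduced here); only the mechanism — match the head's top-5 attended tokens against per-family patterns, with a fallback family — reflects the text above.

```python
import re

# Hypothetical patterns for a few of the 8 specialization families;
# the study's real pattern sets are not published here.
FAMILY_PATTERNS = {
    "symbolic/mathematical": r"[=+\-*/^<>]|\b(sum|sqrt|log)\b",
    "code/syntactic":        r"[{}();]|\b(def|return|import)\b",
    "constraint/negation":   r"\b(not|never|no|cannot|must)\b",
    "repetition/emphasis":   r"\b(very|again|also|really)\b",
}

def classify_head(top_tokens, default="semantic/conceptual"):
    """Assign a head to the first family whose pattern matches any of
    its top-5 attended tokens; otherwise fall back to a default family."""
    joined = " ".join(top_tokens)
    for family, pattern in FAMILY_PATTERNS.items():
        if re.search(pattern, joined):
            return family
    return default

print(classify_head(["=", "sum", "x", "+", "2"]))  # → symbolic/mathematical
```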

Key findings

344/344 — family-direction checks positive
41/43 — models with positive DLA
+10.61% — mean DLA advantage
+24.2% — peak advantage (Qwen 2.5 32B Base)

Attention head activation: Typical Context vs MESN™

Input/output perplexity inversion

MESN™ produces high perplexity on input — the model finds it surprising. Yet generation perplexity drops: the PPL ratio falls below 1.0 in 28 of 43 models, reaching 0.900 at the extended complexity tier. The model generates more fluently after processing harder input.

This breaks the core assumption of prompt engineering (“make input easy”) and aligns with Bjork's desirable difficulty: the structured format forces multi-family head recruitment, producing richer internal representations that transfer to generation.
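The PPL-ratio metric follows directly from per-token log probabilities. A minimal sketch, assuming the per-token log probs for each condition are already collected (the numbers below are hypothetical, chosen only to illustrate a ratio below 1.0):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log probs for the generated continuation
# under each input condition (MESN input vs. matched prose input).
gen_logprobs_prose = [-1.9, -2.1, -1.8, -2.0]
gen_logprobs_mesn  = [-1.7, -1.9, -1.6, -1.8]

ratio = perplexity(gen_logprobs_mesn) / perplexity(gen_logprobs_prose)
print(round(ratio, 3))  # → 0.819 (below 1.0: more fluent generation after MESN input)
```

In the study, a ratio below 1.0 in 28 of 43 models is what marks the inversion: harder input, easier generation.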

Complexity amplification

The DLA advantage increases monotonically with input complexity in 38 of 43 models. Short stimuli (+8.9%) → medium (+11.1%) → long (+12.7%) → extended (+14.7%). This 65% amplification from short to extended means MESN™ matters most where context is most complex.
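The 65% figure is the relative growth of the mean DLA advantage from the short tier to the extended tier, computed from the tier means quoted above:

```python
# Tier means from the text; amplification is relative growth short → extended.
tiers = {"short": 8.9, "medium": 11.1, "long": 12.7, "extended": 14.7}

amplification = (tiers["extended"] - tiers["short"]) / tiers["short"]
print(f"{amplification:.0%}")  # → 65%
```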

DLA advantage and compression by complexity tier

Base vs instruction-tuned models

Across all 10 matched base/instruct pairs, base models show stronger DLA signal. The effect is dramatic in some families — Llama 3.1 70B drops from 23.2% (base) to 8.0% (instruct), a 65% reduction. Qwen 2.5 models retain more signal post-tuning (97% retention at 32B). RLHF appears to optimize for the “prose groove,” narrowing the model's cognitive repertoire.
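The reduction and retention figures are simple ratios over the matched pair. Using the Llama 3.1 70B numbers quoted above:

```python
# Reduction = 1 - instruct/base; retention = instruct/base.
llama_base, llama_instruct = 23.2, 8.0

reduction = 1 - llama_instruct / llama_base
print(f"{reduction:.1%}")  # → 65.5% signal reduction for Llama 3.1 70B
```

The Qwen 2.5 32B retention figure (97%) is the same ratio computed the other way around: instruct-model advantage divided by base-model advantage.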

DLA advantage: base vs instruction-tuned (matched pairs)

Spontaneous notation reproduction

2,140 of 3,068 generation completions (69.7%) spontaneously reproduce structured operators in their output. No model in this study was trained on MESN™. The operators enter the KV cache and shape subsequent generation — the model doesn't just process the notation, it continues in it.
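The reproduction check reduces to scanning each completion for MESN™ operators. The operator glyphs below are placeholders (the actual MESN™ operator set is not reproduced here); the counting logic mirrors how the 2,140/3,068 figure would be tallied.

```python
# Hypothetical stand-ins for MESN operator glyphs.
OPERATORS = {"⇒", "≡", "∴"}

def reproduces_notation(completion):
    """True if the completion contains any structured operator."""
    return any(op in completion for op in OPERATORS)

completions = ["A ⇒ B, so the claim holds", "plain prose answer"]
count = sum(reproduces_notation(c) for c in completions)
print(count, "/", len(completions))  # → 1 / 2
```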

This was independently confirmed with Kimi-K2, a model trained entirely outside our ecosystem. It interpreted MESN™ natively, connecting its patterns to attention topology and equilibrium dynamics — further evidence of architectural preference over learned behavior.

Architecture families tested

| Family | Models | Attention | Mean DLA % | Best Model |
|---|---|---|---|---|
| Qwen 2.5 | 8 | GQA | +15.4% | Qwen 2.5 32B Base (+24.2%) |
| Qwen 3 | 3 | GQA | +18.2% | Qwen3 14B (+19.6%) |
| DS-R1 Distill | 4 | GQA | +14.8% | DS-R1 Qwen 32B (+20.8%) |
| Llama 3.1 | 4 | GQA | +9.2% | Llama 3.1 70B Base (+23.2%) |
| Mistral | 2 | GQA | +12.9% | Mistral 7B Base (+15.2%) |
| Mixtral | 3 | GQA/MoE | +6.6% | Mixtral 8×7B (+11.2%) |
| Phi | 2 | MHA | +11.2% | Phi-4 14B (+11.3%) |
| GLM-4 | 2 | MHA | +9.1% | GLM-4.5 Air 9B (+15.7%) |
| Moonlight | 2 | MLA/MoE | +11.2% | Moonlight 16B Base (+12.6%) |
| Cohere | 2 | GQA | +4.2% | Aya Expanse 32B (+8.1%) |
| Gemma | 3 | GQA | −2.9% | Gemma 2 9B (+0.1%) |