DLA Study — Phase 3
nnsight Cross-Architecture Study (Expanded)
Jasdeep Jaitla · 2026 · 61 models · 4,436 generation completions
Abstract
This expansion from 43 to 61 models introduces three new architecture families and two novel attention mechanism types to the Metaphori Engine™ Structured Notation (MESN™) cross-architecture study. Using nnsight for remote model introspection, we measure Direct Logit Attribution (DLA) across transformer language models spanning 2.0B to 141B parameters, 20 architecture families, and 5 attention mechanism types (GQA, MHA, MLA, Heterogeneous, Linear+GQA). The input is 72 matched stimulus pairs spanning 4 complexity tiers and 8 semantic categories.
The core finding strengthens: all 488 family-direction checks are positive (one check per head specialization family, per model). New architectures include Gemma 4 (heterogeneous attention, with different head types within the same layer), Qwen 3.5 (hybrid linear+softmax attention), and Ministral 3 (sliding window attention). Across every model and every head specialization family, MESN™ produces a stronger DLA signal than equivalent prose. There are zero exceptions.
What changed from Phase 2
Phase 3 nearly doubles the architecture family coverage and introduces attention mechanisms that were not represented in the original study.
| Aspect | Phase 2 | Phase 3 |
|---|---|---|
| Models | 43 | 61 (+18) |
| Family-direction checks | 344/344 | 488/488 |
| Architecture families | 12 | 20 |
| Attention types | 3 (GQA, MHA, MLA) | 5 (+Heterogeneous, +Linear) |
| Parameter range | 3.8B–141B | 2.0B–141B |
| Base/instruct pairs | 10 | 15 |
| Generation completions | 3,068 | 4,436 |
Novel attention architectures
The most significant addition in Phase 3 is the inclusion of two attention mechanism types absent from the original study. Heterogeneous attention (Gemma 4) uses different head configurations within the same layer: some heads may use grouped-query attention while others use multi-head attention within a single transformer block. Linear attention (Qwen 3.5) replaces softmax normalization with a linear approximation in some heads, a fundamentally different computation path.
Both respond to MESN™. The 488/488 result spans all five attention types. The structured notation effect is not tied to any specific attention mechanism — it operates at the level of how information is encoded in the residual stream, upstream of the attention computation itself.
Attention head activation: Typical Context vs MESN™
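For readers who want the shape of the measurement, the sketch below shows per-layer attention DLA with nnsight. It is a minimal illustration rather than the study's pipeline: the Llama-style module paths (`model.model.layers[i].self_attn`, `model.lm_head`), the skipped final RMSNorm, and the placeholder prompts and target token are all assumptions made for the example.

```python
# Minimal sketch (not the study's code) of per-layer attention DLA with nnsight.
# Assumed: a Llama-style module layout; the final RMSNorm is skipped for brevity;
# the model id, prompts, and target token are placeholders.
import torch
from nnsight import LanguageModel

model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")

def attention_dla(prompt: str, target_token_id: int) -> torch.Tensor:
    """One score per layer: the attention block's residual-stream contribution
    at the final position, projected onto the target token's logit."""
    with model.trace(prompt):  # remote=True would run the trace remotely on NDIF
        scores = []
        for layer in model.model.layers:
            attn_out = layer.self_attn.output[0][0, -1]          # [d_model]
            # Logit-lens style projection of this contribution through lm_head.
            scores.append(model.lm_head(attn_out)[target_token_id])
        dla = torch.stack(scores).save()                         # [n_layers]
    return dla.value

# Matched-pair comparison; MESN_PROMPT, PROSE_PROMPT, TARGET_ID are placeholders.
# advantage = attention_dla(MESN_PROMPT, TARGET_ID) - attention_dla(PROSE_PROMPT, TARGET_ID)
```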
Architecture families tested
| Family | Models | Attention | Mean DLA % | Best Model |
|---|---|---|---|---|
| Qwen 2.5 | 8 | GQA | +15.4% | Qwen 2.5 32B Base (+24.2%) |
| Qwen 3 | 3 | GQA | +18.2% | Qwen3 14B (+19.6%) |
| Qwen 3.5 | 7 | Linear+GQA | +9.9% | Qwen 3.5 9B Base (+15.0%) |
| DS-R1 Distill | 4 | GQA | +14.8% | DS-R1 Qwen 32B (+20.8%) |
| Llama 3.1 | 4 | GQA | +13.0% | Llama 3.1 70B Base (+23.2%) |
| Mistral/Ministral | 6 | GQA | +11.5% | Mistral 7B Base (+15.2%) |
| Mixtral | 3 | GQA/MoE | +6.6% | Mixtral 8×7B (+11.2%) |
| Phi | 2 | MHA | +11.2% | Phi-4 14B (+11.3%) |
| GLM-4 | 2 | MHA | +9.1% | GLM-4.5 Air 9B (+15.7%) |
| Moonlight | 2 | MLA/MoE | +11.2% | Moonlight 16B Base (+12.6%) |
| Gemma 4 | 8 | Heterogeneous | +0.3% | Gemma 4 E2B (+3.3%) |
| Gemma 2 | 3 | GQA | –2.9% | Gemma 2 9B (+0.1%) |
| Cohere/Aya | 2 | GQA | +4.2% | Aya Expanse 32B (+8.1%) |
Complexity amplification
The monotonic increase from Phase 2 holds and sharpens across the expanded model set. Short stimuli (~+6–9%) → medium (~+8–12%) → long (~+10–14%) → extended (~+13–17%). MESN™ matters most where context is most complex — the advantage grows as task difficulty increases.
DLA advantage and compression by complexity tier
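For concreteness, a toy aggregation of the tier-level figures is sketched below. The record fields (`tier`, `mesn_dla`, `prose_dla`) are illustrative only, and the simple relative difference stands in for whatever normalization the study actually uses for its percentage figures.

```python
# Toy per-tier aggregation; field names and the normalization are assumptions,
# not taken from the study.
from collections import defaultdict
from statistics import mean

def advantage_by_tier(records):
    """Mean MESN-vs-prose DLA advantage (percent) for each complexity tier."""
    by_tier = defaultdict(list)
    for r in records:
        adv = 100.0 * (r["mesn_dla"] - r["prose_dla"]) / abs(r["prose_dla"])
        by_tier[r["tier"]].append(adv)
    return {tier: mean(vals) for tier, vals in by_tier.items()}
```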
Base vs instruction-tuned models
Across all 15 matched base/instruct pairs — up from 10 in Phase 2 — base models show stronger DLA signal. The pattern is consistent: RLHF and instruction tuning narrow the model's cognitive repertoire, reducing its responsiveness to structured notation. The new pairs (including Qwen 3.5 and Gemma 4 variants) confirm the effect is not family-specific.
DLA advantage: base vs instruction-tuned (matched pairs)
The Gemma question
Gemma models present an apparent paradox. Gemma 2 shows negative mean DLA (−2.9%) and Gemma 4 shows near-zero (+0.3%), yet both families contribute to the 488/488 family-direction result. Every head specialization family still shows a positive direction — the magnitude is simply small.
This paradox motivated the DLA+MLP investigation: if attention heads show weak DLA advantage but the family-direction signal persists, the effect may be partially routed through MLP layers rather than attention alone. The two-pathway hypothesis — that structured notation engages both attention and feedforward circuits — is the subject of ongoing work.
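A concrete form of that investigation is easy to sketch. The snippet below is a hypothetical extension of the earlier per-layer DLA example (same assumed Llama-style layout, and it reuses the `model`, prompt, and target-token placeholders from that sketch, none of which come from the study): it records attention and MLP contributions separately so the two pathways can be compared.

```python
# Hypothetical two-pathway decomposition (not the study's implementation):
# per-layer attention vs. MLP contributions to the target logit, under the same
# Llama-style layout assumed in the earlier sketch.
def two_pathway_dla(prompt: str, target_token_id: int):
    with model.trace(prompt):
        attn_scores, mlp_scores = [], []
        for layer in model.model.layers:
            attn_out = layer.self_attn.output[0][0, -1]   # attention pathway
            mlp_out = layer.mlp.output[0, -1]             # feed-forward pathway
            attn_scores.append(model.lm_head(attn_out)[target_token_id])
            mlp_scores.append(model.lm_head(mlp_out)[target_token_id])
        attn_dla = torch.stack(attn_scores).save()
        mlp_dla = torch.stack(mlp_scores).save()
    return attn_dla.value, mlp_dla.value

# If the Gemma families route the MESN effect through MLPs, the mlp_dla gap
# between matched MESN and prose prompts should exceed the attn_dla gap.
```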