DLA Study — Phase 3

nnsight Cross-Architecture Study (Expanded)

Jasdeep Jaitla · 2026 · 61 models · 4,436 generation completions

Abstract

This expansion from 43 to 61 models introduces three new architecture families and two novel attention mechanism types to the Metaphori Engine™ Structured Notation (MESN™) cross-architecture study. Using nnsight for remote model introspection, we measure Direct Logit Attribution (DLA) across transformer language models spanning 2.0B to 141B parameters, 20 architecture families, and 5 attention mechanism types (GQA, MHA, MLA, Heterogeneous, Linear+GQA). The stimulus set comprises 72 matched pairs spanning 4 complexity tiers and 8 semantic categories.

The core finding strengthens: 488 out of 488 family-direction checks are positive (a sketch of how these checks tally appears after the summary statistics below). Across every model and every head specialization family, MESN™ produces a stronger DLA signal than equivalent prose, with zero exceptions. New architectures include Gemma 4 (heterogeneous attention: different head types within the same layer), Qwen 3.5 (hybrid linear+softmax attention), and Ministral 3 (sliding-window attention).

- 488/488 family-direction checks positive
- 55/61 models with positive DLA
- +8.63% mean DLA advantage
- 5 attention mechanism types
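A family-direction check asks whether one head specialization family's mean DLA advantage is positive within one model; the counts are consistent with eight such families per model (488 = 61 × 8, and Phase 2's 344 = 43 × 8). A minimal sketch of the tally follows; the `results` structure and the family labels in the toy example are hypothetical, not study data:

```python
# Minimal sketch of the family-direction tally. The `results` structure is
# hypothetical: results[model][family] holds per-stimulus-pair DLA advantages
# (MESN minus prose) for one head specialization family in one model.
from statistics import mean

def family_direction_checks(results):
    """Count (positive, total) family-direction checks: a check passes when a
    family's mean DLA advantage within one model is positive."""
    positive = total = 0
    for families in results.values():
        for advantages in families.values():
            total += 1
            if mean(advantages) > 0:
                positive += 1
    return positive, total

# Toy numbers, not study data:
toy = {"model-a": {"induction": [0.12, 0.08], "name-mover": [0.05, 0.02]}}
print(family_direction_checks(toy))  # -> (2, 2)
```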

What changed from Phase 2

Phase 3 expands architecture family coverage from 12 to 20 and introduces attention mechanisms that were not represented in the original study.

Aspect                   Phase 2              Phase 3
Models                   43                   61 (+18)
Family-direction checks  344/344              488/488
Architecture families    12                   20
Attention types          3 (GQA, MHA, MLA)    5 (+Heterogeneous, +Linear)
Parameter range          3.8B–141B            2.0B–141B
Base/instruct pairs      10                   15
Generation completions   3,068                4,436

Novel attention architectures

The most significant addition in Phase 3 is the inclusion of two attention mechanism types absent from the original study. Heterogeneous attention (Gemma 4) mixes head configurations within a single transformer block: some heads use grouped-query attention while others use full multi-head attention. Linear attention (Qwen 3.5) replaces softmax normalization with a linear feature-map approximation in some heads, a fundamentally different computation path.
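To make the contrast concrete, here is a toy single-head sketch of the two computation paths in NumPy. This is illustrative only, not Qwen 3.5's or Gemma 4's actual kernel, and the feature map `phi` is a generic common choice rather than one taken from either model:

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard softmax attention: scores normalized per query, O(n^2) in length."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Linear attention: a positive feature map phi replaces softmax, so the
    (phi(K).T @ V) term is computed once and reused across queries -- O(n)
    in sequence length instead of O(n^2)."""
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v), shared by every query position
    z = q @ k.sum(axis=0)         # per-query normalizer
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```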

Both respond to MESN™. The 488/488 result spans all five attention types. The structured notation effect is not tied to any specific attention mechanism — it operates at the level of how information is encoded in the residual stream, upstream of the attention computation itself.

[Chart: Attention head activation, typical context vs MESN™]
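For readers who want the shape of the measurement itself, a minimal direct-logit-attribution probe in nnsight might look like the sketch below. It uses GPT-2 purely because its module names (transformer.h, ln_f, lm_head) are widely known; the study's models name these differently, the prompt and target token are placeholders, and saved-proxy semantics vary across nnsight versions:

```python
# Minimal DLA probe sketch with nnsight -- illustrative, not the study's code.
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2")    # stand-in checkpoint
prompt = "<one side of a matched stimulus pair>"  # placeholder
target_id = model.tokenizer(" answer")["input_ids"][0]  # placeholder target token

with model.trace(prompt):  # remote=True would run on NDIF hardware instead
    # The attention block's write into the residual stream at layer 8.
    attn_out = model.transformer.h[8].attn.output[0]
    # Project that write straight to logits through the final LayerNorm and
    # unembedding (the usual DLA approximation, since LayerNorm is nonlinear).
    dla = model.lm_head(model.transformer.ln_f(attn_out))[0, -1, target_id].save()

# Saved proxies hold concrete tensors after the trace (older nnsight versions
# expose them via `.value`).
print(float(dla))  # repeat for the paired prose stimulus; the gap is the advantage
```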

Architecture families tested

Family             Models  Attention      Mean DLA  Best Model
Qwen 2.5           8       GQA            +15.4%    Qwen 2.5 32B Base (+24.2%)
Qwen 3             3       GQA            +18.2%    Qwen3 14B (+19.6%)
Qwen 3.5           7       Linear+GQA     +9.9%     Qwen 3.5 9B Base (+15.0%)
DS-R1 Distill      4       GQA            +14.8%    DS-R1 Qwen 32B (+20.8%)
Llama 3.1          4       GQA            +13.0%    Llama 3.1 70B Base (+23.2%)
Mistral/Ministral  6       GQA            +11.5%    Mistral 7B Base (+15.2%)
Mixtral            3       GQA/MoE        +6.6%     Mixtral 8×7B (+11.2%)
Phi                2       MHA            +11.2%    Phi-4 14B (+11.3%)
GLM-4              2       MHA            +9.1%     GLM-4.5 Air 9B (+15.7%)
Moonlight          2       MLA/MoE        +11.2%    Moonlight 16B Base (+12.6%)
Gemma 4            8       Heterogeneous  +0.3%     Gemma 4 E2B (+3.3%)
Gemma 2            3       GQA            −2.9%     Gemma 2 9B (+0.1%)
Cohere/Aya         2       GQA            +4.2%     Aya Expanse 32B (+8.1%)

Complexity amplification

The monotonic increase from Phase 2 holds and sharpens across the expanded model set. Short stimuli (~+6–9%) → medium (~+8–12%) → long (~+10–14%) → extended (~+13–17%). MESN™ matters most where context is most complex — the advantage grows as task difficulty increases.

[Chart: DLA advantage and compression by complexity tier]
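The monotonicity claim is simple to state as a check. The numbers below are the midpoints of the ranges quoted above, not raw study data:

```python
# Tier-monotonicity check on the quoted range midpoints (not raw study data).
tier_means = {"short": 0.075, "medium": 0.10, "long": 0.12, "extended": 0.15}
ordered = ["short", "medium", "long", "extended"]
values = [tier_means[t] for t in ordered]
assert values == sorted(values), "DLA advantage should rise with complexity"
print(" -> ".join(f"{t}: {v:+.1%}" for t, v in zip(ordered, values)))
```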

Base vs instruction-tuned models

Across all 15 matched base/instruct pairs — up from 10 in Phase 2 — base models show stronger DLA signal. The pattern is consistent: RLHF and instruction tuning narrow the model's cognitive repertoire, reducing its responsiveness to structured notation. The new pairs (including Qwen 3.5 and Gemma 4 variants) confirm the effect is not family-specific.

[Chart: DLA advantage, base vs instruction-tuned (matched pairs)]

The Gemma question

Gemma models present an apparent paradox. Gemma 2 shows negative mean DLA (−2.9%) and Gemma 4 shows near-zero (+0.3%), yet both families contribute to the 488/488 family-direction result. Every head specialization family still shows a positive direction — the magnitude is simply small.

This paradox motivated the DLA+MLP investigation: if attention heads show weak DLA advantage but the family-direction signal persists, the effect may be partially routed through MLP layers rather than attention alone. The two-pathway hypothesis — that structured notation engages both attention and feedforward circuits — is the subject of ongoing work.
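The decomposition behind that hypothesis is straightforward to sketch: each block writes an attention term and an MLP term into the residual stream, so the final logit splits linearly across those writes (modulo the LayerNorm approximation noted earlier). The sketch below extends the earlier probe under the same caveats: GPT-2 stand-in module names, placeholder prompt and target id, and an illustration of the idea rather than the study's implementation:

```python
# Sketch: split direct logit attribution between the attention and MLP
# pathways. GPT-2 stand-in module names; prompt and target id are placeholders.
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2")
prompt, target_id = "<stimulus text>", 3280  # placeholders

attn_dla, mlp_dla = [], []
with model.trace(prompt):
    for i in range(12):  # GPT-2 small has 12 blocks
        block = model.transformer.h[i]
        ln, unembed = model.transformer.ln_f, model.lm_head
        a = unembed(ln(block.attn.output[0]))[0, -1, target_id]  # attention write
        m = unembed(ln(block.mlp.output))[0, -1, target_id]      # MLP write
        attn_dla.append(a.save())
        mlp_dla.append(m.save())

# Compare each pathway's total direct contribution to the target logit
# (older nnsight versions expose saved values via `.value`).
print("attention pathway:", sum(float(x) for x in attn_dla))
print("mlp pathway:      ", sum(float(x) for x in mlp_dla))
```

If the two-pathway hypothesis is right, models like Gemma 2 should show a larger share of the MESN™ effect in the MLP sum than the GQA families do.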