DLA Study — Phase 3

nnsight Cross-Architecture Study (Expanded)

Jasdeep Jaitla · 2026 · 61 models · 4,436 generation completions

Abstract

This expansion from 43 to 61 models introduces three new architecture families and two novel attention mechanism types to the Metaphori Engine™ Structured Notation (MESN™) cross-architecture study. Using nnsight for remote model introspection, we measure Direct Logit Attribution (DLA) across transformer language models spanning 2.0B to 141B parameters, 20 architecture families, and 5 attention mechanism types (GQA, MHA, MLA, Heterogeneous, Linear+GQA). The stimulus set comprises 72 matched pairs spanning 4 complexity tiers and 8 semantic categories.

The core finding strengthens: 488 out of 488 family-direction checks are positive (a sketch of how these checks tally appears after the summary statistics below). Across every model and every head specialization family, MESN™ produces a stronger DLA signal than equivalent prose, with zero exceptions. New architectures include Gemma 4 (heterogeneous attention: different head types within the same layer), Qwen 3.5 (hybrid linear+softmax attention), and Ministral 3 (sliding-window attention).

- 488/488 family-direction checks positive
- 55/61 models with positive DLA
- +8.63% mean DLA advantage
- 5 attention mechanism types
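A family-direction check asks whether one head specialization family's mean DLA advantage is positive within one model; the counts are consistent with eight such families per model (488 = 61 × 8, and Phase 2's 344 = 43 × 8). A minimal sketch of the tally follows; the `results` structure and the family labels in the toy example are hypothetical, not study data:

```python
# Minimal sketch of the family-direction tally. The `results` structure is
# hypothetical: results[model][family] holds per-stimulus-pair DLA advantages
# (MESN minus prose) for one head specialization family in one model.
from statistics import mean

def family_direction_checks(results):
    """Count (positive, total) family-direction checks: a check passes when a
    family's mean DLA advantage within one model is positive."""
    positive = total = 0
    for families in results.values():
        for advantages in families.values():
            total += 1
            if mean(advantages) > 0:
                positive += 1
    return positive, total

# Toy numbers, not study data:
toy = {"model-a": {"induction": [0.12, 0.08], "name-mover": [0.05, 0.02]}}
print(family_direction_checks(toy))  # -> (2, 2)
```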

What changed from Phase 2

Phase 3 expands architecture family coverage from 12 to 20 and introduces attention mechanisms that were not represented in the original study.

Aspect                   Phase 2              Phase 3
Models                   43                   61 (+18)
Family-direction checks  344/344              488/488
Architecture families    12                   20
Attention types          3 (GQA, MHA, MLA)    5 (+Heterogeneous, +Linear)
Parameter range          3.8B–141B            2.0B–141B
Base/instruct pairs      10                   15
Generation completions   3,068                4,436

Novel attention architectures

The most significant addition in Phase 3 is the inclusion of two attention mechanism types absent from the original study. Heterogeneous attention (Gemma 4) mixes head configurations within a single transformer block: some heads use grouped-query attention while others use full multi-head attention. Linear attention (Qwen 3.5) replaces softmax normalization with a linear feature-map approximation in some heads, a fundamentally different computation path.
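To make the contrast concrete, here is a toy single-head sketch of the two computation paths in NumPy. This is illustrative only, not Qwen 3.5's or Gemma 4's actual kernel, and the feature map `phi` is a generic common choice rather than one taken from either model:

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard softmax attention: scores normalized per query, O(n^2) in length."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Linear attention: a positive feature map phi replaces softmax, so the
    (phi(K).T @ V) term is computed once and reused across queries -- O(n)
    in sequence length instead of O(n^2)."""
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v), shared by every query position
    z = q @ k.sum(axis=0)         # per-query normalizer
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```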

Both respond to MESN™. The 488/488 result spans all five attention types. The structured notation effect is not tied to any specific attention mechanism — it operates at the level of how information is encoded in the residual stream, upstream of the attention computation itself.

[Chart: Attention head activation, typical context vs MESN™]
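For readers who want the shape of the measurement itself, a minimal direct-logit-attribution probe in nnsight might look like the sketch below. It uses GPT-2 purely because its module names (transformer.h, ln_f, lm_head) are widely known; the study's models name these differently, the prompt and target token are placeholders, and saved-proxy semantics vary across nnsight versions:

```python
# Minimal DLA probe sketch with nnsight -- illustrative, not the study's code.
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2")    # stand-in checkpoint
prompt = "<one side of a matched stimulus pair>"  # placeholder
target_id = model.tokenizer(" answer")["input_ids"][0]  # placeholder target token

with model.trace(prompt):  # remote=True would run on NDIF hardware instead
    # The attention block's write into the residual stream at layer 8.
    attn_out = model.transformer.h[8].attn.output[0]
    # Project that write straight to logits through the final LayerNorm and
    # unembedding (the usual DLA approximation, since LayerNorm is nonlinear).
    dla = model.lm_head(model.transformer.ln_f(attn_out))[0, -1, target_id].save()

# Saved proxies hold concrete tensors after the trace (older nnsight versions
# expose them via `.value`).
print(float(dla))  # repeat for the paired prose stimulus; the gap is the advantage
```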

Architecture families tested

Family             Models  Attention      Mean DLA  Best Model
Qwen 2.5           8       GQA            +15.4%    Qwen 2.5 32B Base (+24.2%)
Qwen 3             3       GQA            +18.2%    Qwen3 14B (+19.6%)
Qwen 3.5           7       Linear+GQA     +9.9%     Qwen 3.5 9B Base (+15.0%)
DS-R1 Distill      4       GQA            +14.8%    DS-R1 Qwen 32B (+20.8%)
Llama 3.1          4       GQA            +13.0%    Llama 3.1 70B Base (+23.2%)
Mistral/Ministral  6       GQA            +11.5%    Mistral 7B Base (+15.2%)
Mixtral            3       GQA/MoE        +6.6%     Mixtral 8×7B (+11.2%)
Phi                2       MHA            +11.2%    Phi-4 14B (+11.3%)
GLM-4              2       MHA            +9.1%     GLM-4.5 Air 9B (+15.7%)
Moonlight          2       MLA/MoE        +11.2%    Moonlight 16B Base (+12.6%)
Gemma 4            8       Heterogeneous  +0.3%     Gemma 4 E2B (+3.3%)
Gemma 2            3       GQA            −2.9%     Gemma 2 9B (+0.1%)
Cohere/Aya         2       GQA            +4.2%     Aya Expanse 32B (+8.1%)

Complexity amplification

The monotonic increase from Phase 2 holds and sharpens across the expanded model set. Short stimuli (~+6–9%) → medium (~+8–12%) → long (~+10–14%) → extended (~+13–17%). MESN™ matters most where context is most complex — the advantage grows as task difficulty increases.

[Chart: DLA advantage and compression by complexity tier]
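The monotonicity claim is simple to state as a check. The numbers below are the midpoints of the ranges quoted above, not raw study data:

```python
# Tier-monotonicity check on the quoted range midpoints (not raw study data).
tier_means = {"short": 0.075, "medium": 0.10, "long": 0.12, "extended": 0.15}
ordered = ["short", "medium", "long", "extended"]
values = [tier_means[t] for t in ordered]
assert values == sorted(values), "DLA advantage should rise with complexity"
print(" -> ".join(f"{t}: {v:+.1%}" for t, v in zip(ordered, values)))
```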

Base vs instruction-tuned models

Across all 15 matched base/instruct pairs — up from 10 in Phase 2 — base models show stronger DLA signal. The pattern is consistent: RLHF and instruction tuning narrow the model's cognitive repertoire, reducing its responsiveness to structured notation. The new pairs (including Qwen 3.5 and Gemma 4 variants) confirm the effect is not family-specific.

[Chart: DLA advantage, base vs instruction-tuned (matched pairs)]

The Gemma question

Gemma models present an apparent paradox. Gemma 2 shows negative mean DLA (−2.9%) and Gemma 4 shows near-zero (+0.3%), yet both families contribute to the 488/488 family-direction result. Every head specialization family still shows a positive direction — the magnitude is simply small.

This paradox motivated the DLA+MLP investigation: if attention heads show weak DLA advantage but the family-direction signal persists, the effect may be partially routed through MLP layers rather than attention alone. The two-pathway hypothesis — that structured notation engages both attention and feedforward circuits — is the subject of ongoing work.
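The decomposition behind that hypothesis is straightforward to sketch: each block writes an attention term and an MLP term into the residual stream, so the final logit splits linearly across those writes (modulo the LayerNorm approximation noted earlier). The sketch below extends the earlier probe under the same caveats: GPT-2 stand-in module names, placeholder prompt and target id, and an illustration of the idea rather than the study's implementation:

```python
# Sketch: split direct logit attribution between the attention and MLP
# pathways. GPT-2 stand-in module names; prompt and target id are placeholders.
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2")
prompt, target_id = "<stimulus text>", 3280  # placeholders

attn_dla, mlp_dla = [], []
with model.trace(prompt):
    for i in range(12):  # GPT-2 small has 12 blocks
        block = model.transformer.h[i]
        ln, unembed = model.transformer.ln_f, model.lm_head
        a = unembed(ln(block.attn.output[0]))[0, -1, target_id]  # attention write
        m = unembed(ln(block.mlp.output))[0, -1, target_id]      # MLP write
        attn_dla.append(a.save())
        mlp_dla.append(m.save())

# Compare each pathway's total direct contribution to the target logit
# (older nnsight versions expose saved values via `.value`).
print("attention pathway:", sum(float(x) for x in attn_dla))
print("mlp pathway:      ", sum(float(x) for x in mlp_dla))
```

If the two-pathway hypothesis is right, models like Gemma 2 should show a larger share of the MESN™ effect in the MLP sum than the GQA families do.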