Two-Pathway Attribution Studies

While direct logit attribution (DLA) studies measure how structured notation shapes attention-head behavior, the DLA+MLP track measures both computational pathways simultaneously: attention heads and MLP (feed-forward) layers. This dual measurement reveals that transformer models do not have a single response to Metaphori Engine™ Structured Notation (MESN™); they have architecture-dependent routing patterns.
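The decomposition relies on the residual stream being a sum of per-layer attention and MLP outputs, so each pathway's direct contribution to a target logit can be read off by projecting onto the unembedding direction. Below is a minimal sketch of that split, assuming pre-extracted per-layer pathway outputs with layer norm already folded in; the function name, shapes, and toy data are illustrative, not the study's code.

```python
import numpy as np

def pathway_attribution(attn_out, mlp_out, logit_dir):
    """Split direct logit attribution across the two pathways.

    attn_out, mlp_out: (n_layers, d_model) arrays holding each layer's
        contribution to the final residual stream from the attention and
        MLP pathways (assumes layer norm has already been folded in).
    logit_dir: (d_model,) unembedding direction for the target token.

    Returns the total projection of each pathway onto the logit
    direction, i.e. how hard each pathway pushes the target logit.
    """
    attn_contrib = attn_out @ logit_dir  # (n_layers,) per-layer attention DLA
    mlp_contrib = mlp_out @ logit_dir    # (n_layers,) per-layer MLP attribution
    return attn_contrib.sum(), mlp_contrib.sum()

# Toy shapes only (12 layers, d_model=768); not real model activations.
rng = np.random.default_rng(0)
attn = rng.normal(size=(12, 768))
mlp = rng.normal(size=(12, 768))
direction = rng.normal(size=768)
print(pathway_attribution(attn, mlp, direction))
```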

At a glance: 54 models with dual-pathway attribution; 3 routing patterns discovered; MLP signal magnitude roughly 35× that of attention; pathway correlation r = 0.080 (effectively uncorrelated).

Why MLP Matters

MLP layers carry approximately 35× more signal magnitude than attention heads in absolute terms, yet attention heads show a 10× stronger differential response to structured notation. The metaphor: MLPs are the highway; attention heads are the steering wheel. MESN™ engages the steering mechanism, but in some architectures the highway itself responds differently.
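To make the highway/steering-wheel contrast concrete: raw magnitude and differential response are different statistics, and one pathway can dominate the first while barely registering on the second. A hedged sketch follows; the function, variable names, and toy numbers are our own, not the study's methodology.

```python
import numpy as np

def magnitude_vs_differential(attn_mesn, attn_prose, mlp_mesn, mlp_prose):
    """Contrast raw signal size with differential response.

    Each argument is an (n_prompts,) array of pathway attributions under
    MESN or prose inputs. These metric definitions are illustrative
    assumptions, not the study's published methodology.
    """
    # Raw magnitude: how big is each pathway's signal, regardless of input?
    magnitude_ratio = (np.abs(np.r_[mlp_mesn, mlp_prose]).mean()
                       / np.abs(np.r_[attn_mesn, attn_prose]).mean())
    # Differential response: how much does each pathway shift, MESN vs prose?
    attn_diff = attn_mesn.mean() - attn_prose.mean()
    mlp_diff = mlp_mesn.mean() - mlp_prose.mean()
    return magnitude_ratio, attn_diff, mlp_diff

# Toy data: MLP signals are large under both inputs; attention shifts with MESN.
rng = np.random.default_rng(1)
print(magnitude_vs_differential(
    attn_mesn=rng.normal(0.10, 0.02, 100), attn_prose=rng.normal(0.01, 0.02, 100),
    mlp_mesn=rng.normal(3.0, 0.5, 100), mlp_prose=rng.normal(3.0, 0.5, 100),
))
```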

Three Routing Patterns

Across 54 models, three distinct routing patterns emerge, each representing a fundamentally different way that transformer architectures process MESN™ input. A decision-rule sketch follows the list.

Dual-Positive (16 models)

Both pathways favor MESN™. Attention heads and MLP layers independently contribute positive signal toward structured notation outputs.

Attention-Dominant (26 models)

Attention heads favor MESN™, while MLPs favor prose. The attention pathway overrides the MLP preference: the steering wheel wins.

MLP-Compensating (3 models, all Gemma)

MLPs compensate for a negative attention signal. The Gemma family routes structured notation processing primarily through feed-forward layers rather than attention heads, a response unique to that architecture.
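Here is a minimal reconstruction of how the three patterns fall out of the two signed pathway effects. The zero thresholds and the fourth catch-all label are our own assumptions, not the study's published criterion.

```python
def routing_pattern(attn_effect: float, mlp_effect: float) -> str:
    """Classify a model from the signed differential response of each
    pathway to MESN vs prose (positive = favors MESN). The zero
    thresholds and the catch-all label are our reconstruction, not
    the study's published decision rule.
    """
    if attn_effect > 0 and mlp_effect > 0:
        return "dual-positive"       # both pathways favor MESN
    if attn_effect > 0 and mlp_effect < 0:
        return "attention-dominant"  # steering wheel overrides highway
    if attn_effect < 0 and mlp_effect > 0:
        return "mlp-compensating"    # Gemma-style rerouting through MLPs
    return "dual-negative"           # neither pathway favors MESN

print(routing_pattern(+15.2, -25.5))  # Ministral-style signs: attention-dominant
```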

Why This Matters

DLA-only studies capture at most half the picture. A model with strong positive DLA may have a strongly negative MLP response: Ministral 3 14B pairs a +15.2% DLA advantage with a −25.5% MLP response. Conversely, Gemma's negative DLA is not a failure to respond; it is a rerouting through MLPs.
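A worked illustration with the Ministral figures quoted above; treating the two pathway effects as simply additive is our simplifying assumption, not a claim from the study.

```python
# Ministral 3 14B, per the text: DLA alone looks strongly pro-MESN,
# but the MLP pathway pulls the other way. Adding the two percentages
# is a hypothetical combination for illustration only.
dla, mlp = +15.2, -25.5                      # percent advantage per pathway
print(f"DLA-only view: {dla:+.1f}%")         # +15.2% -- looks pro-MESN
print(f"Both pathways: {dla + mlp:+.1f}%")   # -10.3% -- the sign flips
```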

Understanding both pathways changes which models appear to respond and how strongly. Single-pathway measurement creates a distorted map of how structured notation actually moves through transformer computation.