Two-Pathway Attribution Studies
While the DLA studies measure how structured notation shapes attention-head behavior, the DLA+MLP track measures both computational pathways simultaneously: attention heads and MLP (feed-forward) layers. This dual measurement reveals that transformer models don't have a single response to Metaphori Engine™ Structured Notation (MESN™); they have architecture-dependent routing patterns.
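A minimal sketch of what "measuring both pathways" means in practice, assuming the per-layer attention and MLP outputs have already been captured from the residual stream (e.g., via forward hooks). The function names and array layout are illustrative assumptions, not the study's actual tooling.

```python
import numpy as np

def two_pathway_attribution(attn_outs, mlp_outs, logit_direction):
    """Sum each pathway's per-layer residual-stream writes projected onto
    a target logit direction.

    attn_outs, mlp_outs : (n_layers, d_model) arrays of per-layer outputs
    logit_direction     : (d_model,) unembedding direction for the token(s)
                          of interest
    """
    attn_attr = attn_outs @ logit_direction   # (n_layers,) attention contributions
    mlp_attr = mlp_outs @ logit_direction     # (n_layers,) MLP contributions
    return float(attn_attr.sum()), float(mlp_attr.sum())

# Differential response: run the same measurement on an MESN prompt and a
# matched prose prompt, then difference the per-pathway totals.
def differential_response(mesn_attrs, prose_attrs):
    attn_delta = mesn_attrs[0] - prose_attrs[0]
    mlp_delta = mesn_attrs[1] - prose_attrs[1]
    return attn_delta, mlp_delta
```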
Why MLP Matters
MLP layers carry approximately 35× more signal magnitude than attention heads in absolute terms, yet attention heads show a 10× stronger differential response to structured notation. The metaphor: MLPs are the highway; attention heads are the steering wheel. MESN™ engages the steering mechanism, but in some architectures the highway itself also responds differently.
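A toy calculation makes the distinction concrete: raw magnitude and differential response are independent axes. The 35× and 10× ratios come from the text above; the underlying values are invented purely for illustration.

```python
# Illustrative only: the raw values are made up; the ratios mirror the text.
mlp_magnitude, attn_magnitude = 35.0, 1.0   # absolute signal (the highway)
attn_diff, mlp_diff = 0.10, 0.01            # MESN-vs-prose differential (the steering)

print(f"magnitude ratio (MLP/attn):    {mlp_magnitude / attn_magnitude:.0f}x")  # 35x
print(f"differential ratio (attn/MLP): {attn_diff / mlp_diff:.0f}x")            # 10x
```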
Three Routing Patterns
Across 54 models, three distinct routing patterns emerge, each representing a fundamentally different way that transformer architectures process MESN™ input (a sign-based classifier sketch follows the three patterns below).
Dual-Positive (16 models)
Both pathways favor MESN™. Attention heads and MLP layers independently contribute positive signal toward structured notation outputs.
Attention-Dominant (26 models)
Attention heads favor MESN™, while MLPs favor prose. The attention pathway overrides the MLP preference; the steering wheel wins.
MLP-Compensating (3 models, all Gemma)
MLPs compensate for a negative attention signal. The Gemma family routes structured-notation processing primarily through feed-forward layers rather than attention heads, an architectural response unique in this study.
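The three patterns reduce to sign combinations of the two differential responses. The sketch below thresholds at exactly zero, which is an assumption made for illustration; the actual study may use significance tests rather than raw signs.

```python
def routing_pattern(attn_delta: float, mlp_delta: float) -> str:
    """Label a model's routing pattern from its two differential responses.

    attn_delta / mlp_delta: MESN-minus-prose attribution for each pathway.
    """
    if attn_delta > 0 and mlp_delta > 0:
        return "dual-positive"       # both pathways favor MESN
    if attn_delta > 0 and mlp_delta <= 0:
        return "attention-dominant"  # steering wheel overrides the highway
    if attn_delta <= 0 and mlp_delta > 0:
        return "mlp-compensating"    # Gemma-style rerouting through MLPs
    return "other"                   # falls outside the three named patterns

print(routing_pattern(0.152, -0.255))  # Ministral 3 14B figures -> "attention-dominant"
```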
Why This Matters
DLA-only studies capture at most half the picture. A model with a strongly positive DLA can show a strongly negative MLP response: Ministral 3 14B posts a +15.2% DLA advantage alongside a −25.5% MLP response. Conversely, Gemma's negative DLA is not a failure to respond; it is a rerouting through MLPs.
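As a worked example of how the two views diverge, the snippet below combines the Ministral 3 14B figures from the text. Summing the two deltas into a single net score is an illustrative simplification, not the study's reported metric.

```python
dla_delta, mlp_delta = 0.152, -0.255  # Ministral 3 14B figures from the text

print(f"DLA-only view:   {dla_delta:+.1%}")              # +15.2%, looks strongly positive
print(f"Two-pathway net: {dla_delta + mlp_delta:+.1%}")  # -10.3%, the sign flips
```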
Understanding both pathways changes which models appear to respond and how strongly. Single-pathway measurement creates a distorted map of how structured notation actually moves through transformer computation.