DLA+MLP Study — Phase 1
Two-Pathway Analysis: Attention vs MLP Signal Routing
Jasdeep Jaitla · 2026 · 54 models · simultaneous dual-pathway attribution
Abstract
We present the first systematic comparison of how Metaphori Engine™ Structured Notation (MESN™) signal routes through both computational pathways in transformer models: attention heads (direct logit attribution, DLA) and MLP layers (feed-forward attribution). By measuring both simultaneously across 54 models, we find that the structured-notation response is a dual-pathway system with architecture-dependent routing.
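To make the measurement concrete, here is a minimal sketch of dual-pathway attribution in the style of TransformerLens. The model, prompt, and target token are placeholders, and the final-LayerNorm folding that a faithful DLA applies is omitted; this illustrates the technique, not the study's exact pipeline.

```python
# Minimal dual-pathway attribution sketch (TransformerLens-style).
# Model, prompt, and target token are illustrative placeholders.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in model
tokens = model.to_tokens("The structured notation maps concept A to")
logits, cache = model.run_with_cache(tokens)

# Unembedding direction whose dot product with a residual-stream write
# gives that component's contribution to the target token's logit.
target_id = model.to_single_token(" B")
logit_dir = model.W_U[:, target_id]  # shape: (d_model,)

pos = -1  # attribute at the final token position
dla, mlp = 0.0, 0.0
for layer in range(model.cfg.n_layers):
    # Attention pathway: the attention block's write to the residual stream.
    dla += cache["attn_out", layer][0, pos] @ logit_dir
    # MLP pathway: the feed-forward block's write to the residual stream.
    mlp += cache["mlp_out", layer][0, pos] @ logit_dir

# Note: a faithful DLA would also fold in the final LayerNorm
# (e.g. via cache.apply_ln_to_stack); omitted to keep the sketch short.
print(f"attention pathway: {dla:.3f}, MLP pathway: {mlp:.3f}")
```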
The magnitude gap
| Pathway | Mean Advantage | Models Positive | Mean Absolute Magnitude |
|---|---|---|---|
| DLA (Attention) | +8.4% | 48/54 (89%) | ~22 |
| MLP (Feed-forward) | −1.8% | 23/54 (43%) | ~779 |
| Combined | +6.5% | 33/54 (61%) | — |
Despite carrying roughly 35× more raw signal (~779 vs. ~22 mean absolute magnitude), MLPs show a slightly negative mean advantage of −1.8%. The attention pathway is smaller but more discriminating. Think of MLPs as providing the base signal (the highway) and attention heads as providing the steering (differential routing): MESN™ engages the steering mechanism more strongly.
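The exact normalization behind the percent advantages is not spelled out above; one plausible reading, sketched below with illustrative numbers, is a percent difference in total pathway attribution between structured-notation and prose prompts. It shows how a pathway can carry a much larger base signal while posting a smaller, or negative, relative advantage.

```python
# Assumed form of the per-pathway "advantage" metric: percent difference in
# total attribution between structured-notation and prose prompts. The
# study's exact normalization is not specified; this is one plausible reading.
def pathway_advantage(structured: float, prose: float) -> float:
    """Percent advantage of the structured prompt over the prose control."""
    return 100.0 * (structured - prose) / abs(prose)

# Illustrative numbers only: a small base signal can show a clear positive
# advantage while a far larger base signal shows a slight negative one.
print(pathway_advantage(structured=21.0, prose=19.4))    # ~ +8.2%
print(pathway_advantage(structured=765.0, prose=779.0))  # ~ -1.8%
```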
Pathway independence
DLA and MLP advantages are statistically uncorrelated across models (Pearson r = 0.080, p = 0.56): knowing a model's DLA advantage tells you essentially nothing about its MLP advantage. The two pathways behave as genuinely independent channels, which means DLA-only studies capture at most half the picture.
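The check itself is an ordinary correlation test. A sketch, using stand-in data in place of the study's 54 per-model advantage scores:

```python
# Independence check across models: correlate per-model DLA and MLP
# advantages. The arrays below are stand-in data, not the study's scores.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
dla_adv = rng.normal(8.4, 10.0, size=54)   # stand-in per-model DLA advantages
mlp_adv = rng.normal(-1.8, 20.0, size=54)  # stand-in per-model MLP advantages

r, p = pearsonr(dla_adv, mlp_adv)
print(f"Pearson r = {r:.3f}, p = {p:.2f}")  # study reports r = 0.080, p = 0.56
```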
Three routing patterns
| Pattern | Count | DLA Direction | MLP Direction | Interpretation |
|---|---|---|---|---|
| Dual-Positive | 16 | + | + | Both pathways favor MESN™ |
| Attention-Dominant | 26 | + | − | Attention favors MESN™, MLPs favor prose |
| MLP-Compensating | 3 | − | + | MLPs compensate for negative attention |
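Since the patterns are defined by the sign of each pathway's advantage, classification reduces to a four-way sign test. The three named patterns cover 45 of the 54 models; the remaining nine presumably fall in the unnamed dual-negative quadrant. A sketch (the treatment of exact zeros is an arbitrary choice here):

```python
def routing_pattern(dla_adv: float, mlp_adv: float) -> str:
    """Classify a model by the sign of each pathway's advantage."""
    if dla_adv >= 0 and mlp_adv >= 0:
        return "Dual-Positive"       # both pathways favor MESN
    if dla_adv >= 0:
        return "Attention-Dominant"  # attention +, MLP -
    if mlp_adv >= 0:
        return "MLP-Compensating"    # attention -, MLP + (the Gemma pattern)
    return "Dual-Negative"           # neither favors MESN (not in the table)

print(routing_pattern(-1.5, 12.1))   # Gemma 4 E4B IT -> "MLP-Compensating"
```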
Dual-positive models
The 16 models where both pathways favor MESN™ show the highest combined signal. Llama 3.1 70B Base leads with +23.2% DLA and +24.5% MLP, for a +47.7% combined advantage. The DS-R1-Distill family is entirely dual-positive (all four models), suggesting that distillation preserves the MLP response. Mistral 7B Base reaches +29.0% combined.
| Model | DLA | MLP | Combined |
|---|---|---|---|
| Llama 3.1 70B Base | +23.2% | +24.5% | +47.7% |
| DS-R1-Distill Qwen 14B | +15.0% | +29.4% | +44.4% |
| DS-R1-Distill Qwen 32B | +18.4% | +15.0% | +33.4% |
| DS-R1-Distill Qwen 7B | +10.5% | +21.9% | +32.4% |
| Mistral 7B Base | +15.1% | +13.9% | +29.0% |
| Qwen 2.5 14B Base | +22.9% | +4.6% | +27.5% |
The Gemma resolution
The MLP-Compensating pattern is exclusively Gemma. Gemma's negative DLA is not a failure to respond — it's a rerouting. Structured notation signal travels through MLP layers instead of attention heads.
| Model | DLA | MLP | Combined |
|---|---|---|---|
| Gemma 4 E4B IT | −1.5% | +12.1% | +10.6% |
| Gemma 2 27B | −5.9% | +4.3% | −1.6% |
| Gemma 4 31B IT | −6.8% | +2.3% | −4.5% |
Gemma 4 E4B IT is “rescued” by MLP: its −1.5% DLA would classify it as a negative responder, but +12.1% MLP lifts the combined signal to +10.6%.
Combined signal reshuffles the leaderboard
Adding MLP attribution changes which models appear to respond most strongly. Models with strong DLA but strongly negative MLP drop dramatically:
- Aya Expanse 32B: +8.1% DLA, −46.1% MLP = −38.0% combined
- Ministral 3 14B: +15.2% DLA, −25.5% MLP = −10.3% combined
A DLA-only study would call these positive responders. The full picture is more complex.
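Because the combined score is simply the sum of the two advantages, re-ranking is mechanical once both pathways are measured. A sketch using the figures quoted above:

```python
# Combined score = DLA advantage + MLP advantage; re-rank once both are known.
models = {
    "Llama 3.1 70B Base": (23.2, 24.5),
    "Aya Expanse 32B":    (8.1, -46.1),
    "Ministral 3 14B":    (15.2, -25.5),
}

leaderboard = sorted(
    ((name, dla + mlp) for name, (dla, mlp) in models.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, combined in leaderboard:
    print(f"{name}: {combined:+.1f}%")
# Llama 3.1 70B Base: +47.7%
# Ministral 3 14B: -10.3%
# Aya Expanse 32B: -38.0%
```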
Implications
Two-pathway analysis reveals that structured notation doesn't simply “activate attention heads” — it creates architecture-dependent routing patterns that involve the entire forward pass. Future work should always measure both pathways simultaneously.