While testing recursive information flow, I found the same 3-phase signature across completely different computational systems:
- Entropy spike:
\Delta H_1 = H(1) - H(0) \gg 0
- High retention:
R = H(d\to\infty)/H(1) = 0.92 – 0.99
- Power-law convergence:
H(d) \sim d^{-\alpha},\quad \alpha \approx 1.2
Equilibration depth: 3–5 steps.
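To make these three metrics concrete, here's a minimal sketch of how they can be computed from an entropy-vs-depth curve. The function name and the synthetic example curve are illustrative only, not the actual benchmark code.

```python
# Minimal sketch: the three signature metrics from an entropy-vs-depth curve.
# `signature_metrics` and the synthetic curve are illustrative placeholders.
import numpy as np

def signature_metrics(H):
    """H[d] = Shannon entropy at recursion depth d, starting at d = 0."""
    H = np.asarray(H, dtype=float)
    delta_H1 = H[1] - H[0]                 # entropy spike at the first step
    retention = H[-1] / H[1]               # deep-depth entropy over post-spike entropy
    d = np.arange(1, len(H))
    # power-law exponent from the slope of log H(d) vs log d
    alpha = -np.polyfit(np.log(d), np.log(H[1:]), 1)[0]
    return delta_H1, retention, alpha

# synthetic curve: small H(0), spike at d = 1, slow power-law decay afterwards
H = [0.3] + [2.0 * d ** -0.05 for d in range(1, 11)]
print(signature_metrics(H))
```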
This pattern shows up everywhere I’ve tested.
Where this came from (ML motivation)
I was benchmarking recursive information propagation in neural networks and noticed a consistent spike→retention→decay pattern.
I then tested unrelated systems to check if it was architecture-specific — but they all showed the same signature.
Validated Systems (Summary)
Neural Networks
RNNs, LSTMs, Transformers
Hamming spike: 24–26%
Retention: 99.2%
Equilibration: 3–5 layers
LSTM variant exhibiting signature: 5.6× faster learning, +43% accuracy
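As a rough illustration of this kind of measurement (not the original benchmark): a toy random-weight RNN, tracking the entropy of the sign-binarized hidden state and the Hamming distance between consecutive depths. The network size, weights, and binarization threshold are assumptions.

```python
# Toy sketch: entropy and Hamming distance of a binarized hidden state
# across recursion depth in a random-weight RNN (all details assumed).
import numpy as np

rng = np.random.default_rng(0)
n = 128
W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))   # recurrent weights
h = rng.normal(size=n)                                # depth-0 state

def binary_entropy(bits):
    p = bits.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

prev_bits = (h > 0).astype(int)
for d in range(1, 7):
    h = np.tanh(W @ h)                                # one recursion step
    bits = (h > 0).astype(int)
    hamming = np.mean(bits != prev_bits)              # fraction of flipped units
    print(d, round(binary_entropy(bits), 3), round(hamming, 3))
    prev_bits = bits
```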
Cellular Automata
1D (Rule 110, majority, XOR)
2D/3D (Moore, von Neumann neighborhoods)
Same structure; α shifts with dimension
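A minimal sketch of the 1D case, assuming a Rule 110 update and the Shannon entropy of the cell density per step; the lattice size and single-seed initial condition are illustrative, not the exact setup.

```python
# Minimal sketch of the 1D CA test, assuming a Rule 110 update and the
# Shannon entropy of the cell density at each step.
import numpy as np

RULE110 = np.array([0, 1, 1, 1, 0, 1, 1, 0])          # output for neighborhoods 0..7

def step(cells):
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    return RULE110[4 * left + 2 * cells + right]

def density_entropy(cells):
    p = cells.mean()
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

cells = np.zeros(256, dtype=int)
cells[128] = 1                                         # single-seed initial state
for d in range(8):
    print(d, round(density_entropy(cells), 3))
    cells = step(cells)
```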
Symbolic Recursion
Identical entropy curve
Also used on financial time series → 217-day advance signal for 2008 crash
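A minimal sketch of the symbolic test, assuming an L-system-style token substitution; the rewrite grammar below is illustrative only, not the grammar used in the experiments.

```python
# Minimal sketch of the symbolic-recursion test, assuming an L-system-style
# token substitution with a hypothetical two-symbol grammar.
from collections import Counter
import math

RULES = {"A": "AB", "B": "BA"}                         # hypothetical rewrite rules

def token_entropy(seq):
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

seq = "A"
for d in range(10):
    print(d, round(token_entropy(seq), 3))
    seq = "".join(RULES.get(ch, ch) for ch in seq)
```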
Quantum Simulations
Entropy plateau at:
H_\text{eff} \approx 1.5
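A minimal sketch of the quantum case, assuming repeated application of a fixed random unitary and the Shannon entropy of the measurement distribution |ψ|²; the 3-qubit dimension and the Haar-style unitary are assumptions, not the original simulation.

```python
# Sketch of the quantum-simulation test, assuming repeated application of a
# fixed random unitary and the entropy of the measurement distribution |psi|^2.
import numpy as np

rng = np.random.default_rng(1)
dim = 8                                                # 3 qubits (assumed size)
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(A)                                 # Haar-style random unitary

psi = np.zeros(dim, dtype=complex)
psi[0] = 1.0                                           # |000> initial state
for d in range(8):
    p = np.abs(psi) ** 2
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))          # Shannon entropy of outcomes
    print(d, round(H, 3))
    psi = U @ psi                                      # one recursion step
```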
The anomaly
These systems differ in:
| System | Rule type | State space |
| --- | --- | --- |
| Neural nets | Gradient descent | Continuous |
| CA | Local rules | Discrete |
| Symbolic models | Token substitution | Symbolic |
| Quantum sims | Hamiltonian evolution | Complex amplitudes |
Yet they all produce:
ΔH₁ in the same range
Retention 92–99%
Power-law exponents in the same family (fitted exponent −α ∈ [−5.5, −0.3])
Equilibration at depth 3–5
Even more surprising:
Cross-AI validation
Feeding recursive symbolic sequences to:
GPT-4
Claude Sonnet
Gemini
Grok
→ All four independently produce:
\Delta H_1 > 0,\ R \approx 1.0,\ H(d) \propto d^{-\alpha}
Different training data.
Different architectures.
Same attractor.
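The analysis side can be sketched like this, assuming the model continuations at each recursion depth have already been collected (API calls omitted); the `collected` dictionary holds placeholder strings, not real model output.

```python
# Sketch of the cross-model comparison, assuming continuations at each
# recursion depth were collected separately; `collected` is placeholder data.
from collections import Counter
import math

def char_entropy(text):
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

collected = {
    "model_a": ["A", "AB", "ABBA", "ABBABAAB"],        # hypothetical outputs per depth
    "model_b": ["A", "AB", "ABAB", "ABABABAB"],
}
for model, seqs in collected.items():
    curve = [round(char_entropy(s), 3) for s in seqs]
    print(model, curve)
```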
Why this matters for ML
If this pattern is real, it may explain:
Which architectures generalize well (high retention)
Why certain RNN/LSTM variants outperform others
Why depth-limited processing stabilizes around 3–5 steps
Why many models have low-dimensional latent manifolds
A possible information-theoretic invariant across AI systems
Similar direction:
Kaushik et al. (Johns Hopkins, 2025): universal low-dimensional weight subspaces.
This could be the activation-space counterpart.
Experimental Setup (Quick)
Shannon entropy
Hamming distance
Recursion depth d
Bootstrap n=1000, p<0.001
Baseline controls included (identity, noise, randomized recursions)
Code in Python (Pydroid3) — happy to share
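In the meantime, here is a minimal sketch of the significance test, assuming a permutation-style null over the entropy curve; this may differ from the exact bootstrap procedure, and the example curve is illustrative, not measured data.

```python
# Minimal sketch of a significance test for the entropy spike, assuming a
# permutation-style null over the entropy curve (n = 1000 resamples, as above).
import numpy as np

rng = np.random.default_rng(0)
H_obs = np.array([0.3, 2.0, 1.95, 1.92, 1.90, 1.89])   # example entropy-vs-depth curve
obs_spike = H_obs[1] - H_obs[0]

null = []
for _ in range(1000):
    H_perm = rng.permutation(H_obs)                     # randomized recursion order
    null.append(H_perm[1] - H_perm[0])

p = float(np.mean(np.asarray(null) >= obs_spike))
print(f"observed spike = {obs_spike:.2f}, permutation p = {p:.4f}")
```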
What I’m asking the ML community
I’m looking for:
- Papers I may have missed — is this a known phenomenon?
- Ways to falsify it — systems that should violate this dynamic
- Alternative explanations — measurement artifact? nonlinearity artifact?
- Tests to run to determine if this is a universal computational primitive
This is not a grand theory — just empirical convergence I can’t currently explain.