I’ve been running a set of continual learning experiments across 12 multimodal tasks (vision, speech, and text), and I managed to build an architecture that essentially eliminates catastrophic forgetting, even without replay.
The key turned out to be a combination of the following (rough sketch after the list):
- Dynamic expert expansion (grow only when new distributions appear)
- Task embeddings for conditioning shared components
- A lightweight retrieval memory
- Small task-specific heads for stable readout
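To make that concrete, here is a minimal PyTorch sketch of how those four pieces could fit together. This is an illustration only, not the code in the repo: names like `ModularCLModel`, `Expert`, `remember`, and the naive averaging router are assumptions, and a real version would need proper routing, memory pruning, and training loops.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One expandable expert: a small MLP over the shared hidden space."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)


class ModularCLModel(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, max_tasks: int = 12):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)        # shared trunk
        self.task_embed = nn.Embedding(max_tasks, hidden_dim)  # conditions shared components
        self.experts = nn.ModuleList([Expert(hidden_dim)])     # grown on new distributions
        self.heads = nn.ModuleDict()                           # small task-specific readouts
        self.memory = []                                       # lightweight retrieval memory: (key, value) pairs

    def add_expert(self):
        self.experts.append(Expert(self.encoder.out_features))

    def add_head(self, task_id: int, num_classes: int):
        self.heads[str(task_id)] = nn.Linear(self.encoder.out_features, num_classes)

    def remember(self, key: torch.Tensor, value: torch.Tensor):
        # Store detached 1-D hidden states; a real system would cap and prune this buffer.
        self.memory.append((key.detach(), value.detach()))

    def retrieve(self, query: torch.Tensor, k: int = 4) -> torch.Tensor:
        # Cosine nearest-neighbour lookup over stored keys; returns the mean of the top-k values.
        if not self.memory:
            return torch.zeros_like(query)
        keys = torch.stack([m[0] for m in self.memory])        # (N, hidden)
        vals = torch.stack([m[1] for m in self.memory])        # (N, hidden)
        sims = F.cosine_similarity(query.unsqueeze(1), keys.unsqueeze(0), dim=-1)  # (B, N)
        topk = sims.topk(min(k, keys.size(0)), dim=-1).indices
        return vals[topk].mean(dim=1)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        h = self.encoder(x) + self.task_embed(torch.tensor(task_id, device=x.device))
        h = h + self.retrieve(h)                                # retrieved memory as an additive hint
        # Naive routing: average all expert outputs; a learned router would replace this.
        h = torch.stack([e(h) for e in self.experts]).mean(dim=0)
        return self.heads[str(task_id)](h)                      # assumes add_head(task_id, ...) was called
```

The intended flow would be `add_head(t, n)` when a new task arrives, `add_expert()` when an expansion trigger fires, and `remember(h, h)` for a small subset of hidden states per task.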
With this setup, retention remained almost perfectly stable across the full task sequence. Earlier tasks showed no accuracy collapse even after many training stages, and performance stayed consistent as new tasks came in.
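If you want to sanity-check retention claims like this, the usual bookkeeping is an accuracy matrix over (training stage, task) plus an average-forgetting score. Here's a generic sketch of that metric (standard formulation, not code lifted from the repo; `acc` and `average_forgetting` are just illustrative names):

```python
import numpy as np


def average_forgetting(acc: np.ndarray) -> float:
    """acc[i, j] = accuracy on task j measured after training through stage i, shape (T, T).

    Returns the mean drop from each earlier task's best accuracy to its final accuracy;
    a value near 0.0 means no observable forgetting across the sequence.
    """
    T = acc.shape[0]
    final = acc[-1, : T - 1]                   # final accuracy on every task except the last
    best = acc[: T - 1, : T - 1].max(axis=0)   # best accuracy each earlier task ever reached
    return float((best - final).mean())
```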
Some highlights from the results:
- Zero observable catastrophic forgetting across all 12 tasks
- Experts expanded only when necessary, matching new distribution shifts (see the trigger sketch after this list)
- The shared latent space stayed coherent across modalities
- Intrinsic signals (e.g., prediction error) boosted stability during training but weren’t needed at inference
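To give a flavour of what "expand only when necessary" can mean in practice, here is a hypothetical prediction-error trigger. It's a simplified stand-in rather than the exact criterion I used; `should_expand`, `baseline_loss`, and `margin` are illustrative names and parameters, and it assumes a model with the `forward(x, task_id)` signature sketched above.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def should_expand(model, batch_x, batch_y, task_id, baseline_loss: float, margin: float = 0.5) -> bool:
    """Return True if the incoming batch looks out-of-distribution for the current experts.

    baseline_loss: running average loss on data the existing experts already fit well.
    margin: slack before expansion fires (a tunable hyperparameter; the default here is arbitrary).
    Assumes the head for task_id has already been registered.
    """
    logits = model(batch_x, task_id)
    loss = F.cross_entropy(logits, batch_y).item()
    return loss > baseline_loss * (1.0 + margin)
```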
For anyone interested in digging into the evaluation pipeline, I’ve packaged the experiment logs, model checkpoints, and a safe inference script here:
🔗 GitHub (Reproducibility / Results)
https://github.com/nkundinezayv/CORA-ContinualLearning
(It's not the full training implementation, but it’s enough to verify the results and understand the evaluation flow.)
I’m sharing this mainly to compare observations with others working on continual or modular learning.
Has anyone explored dynamic expansion or large-scale modular CL setups?
I’d love to hear about bottlenecks, failure modes, or architecture designs that worked well for you.