Modern large language models learn vast amounts of knowledge, yet when we try to teach them something new, they tend to forget what they already know. This challenge, known as catastrophic forgetting, limits how flexibly models can adapt over time. Meta AI’s Sparse Memory Finetuning introduces a clever, lightweight solution: instead of updating the entire model, it selectively updates only the few memory slots that truly matter.
This approach rethinks how neural memory works. Traditional finetuning touches millions of parameters, creating interference between new and old knowledge. Sparse memory finetuning, by contrast, works like adjusting the bookmarks inside a vast library: it touches only the pages that need revision, while the rest of the library remains untouched.
The Intuition Behind Sparse Memory Finetuning
At the heart of this method lies the memory layer: a parametric component inside the model that stores contextual knowledge. During training, each input token interacts with only a small subset of memory slots, selected by how well their keys match the token’s query. The insight here is that not every slot needs to be updated for every new learning event.
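To make this concrete, here is a minimal NumPy sketch of the idea, not the paper's actual implementation: a bank of memory slots, a top-k key lookup that selects the few slots a query interacts with, and a gradient step applied only to those accessed slots. All names (`num_slots`, `top_k`, `sparse_update`) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of sparse memory updates; sizes and names are assumptions.
rng = np.random.default_rng(0)
num_slots, dim, top_k = 1024, 64, 8

keys = rng.normal(size=(num_slots, dim))    # one key per memory slot
values = rng.normal(size=(num_slots, dim))  # the trainable memory values

def lookup(query):
    """Return the indices of the top-k slots whose keys best match the query."""
    scores = keys @ query
    return np.argsort(scores)[-top_k:]

def sparse_update(query, grad, lr=0.1):
    """Apply a gradient step only to the slots the query actually accessed."""
    idx = lookup(query)
    values[idx] -= lr * grad[idx]  # every other slot is left untouched
    return idx

query = rng.normal(size=dim)
grad = rng.normal(size=(num_slots, dim))  # stand-in for a backprop gradient
before = values.copy()
touched = sparse_update(query, grad)

changed = np.where(np.any(values != before, axis=1))[0]
print(len(changed))  # prints 8: only the top_k accessed slots were modified
```

Of the 1024 slots, only the 8 selected by the lookup change; the rest of the memory, and whatever old knowledge it encodes, is untouched by this training step, which is the mechanism behind reduced interference.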
Learn more about Continual Learning via Sparse Memory Finetuning (Paper Review)
