[P] A “foveated” memory layer for LLM agents: +46.7pp accuracy at 256-token context (open-source)

Hi all! I’ve been experimenting with long-term memory for LLM agents under small context budgets, and ended up building a “foveated” memory layer inspired by how the eye focuses.

Landing page / demo / repo:

https://fractal-glyph-tape.vercel.app/

Instead of the usual RAW-TRUNCATE (“take the last N tokens”), the system:

  • Stores conversations as phrase families → glyphs (Mandarin chars used as symbols only) in a structured address space (world / region / tri_path / depth / time_slice).
  • Uses a foveated policy under a fixed token budget (see the code sketch after this list):
    • ~30% of tokens on early setup turns (goals/constraints),
    • ~30% on semantically relevant past turns (w.r.t. the current question),
    • ~40% on recent turns for local coherence.
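
To make that concrete, here is a minimal sketch of both pieces. The address record just mirrors the fields named above; the policy function assumes a pluggable `relevance(question, turn)` scorer and uses a crude whitespace token count. All names and the exact greedy strategy are illustrative, not the repo's actual code:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class GlyphAddress:
    # Structured address space from the description above.
    world: str
    region: str
    tri_path: str
    depth: int
    time_slice: int

def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer -- illustration only.
    return len(text.split())

def select_context(
    turns: List[str],
    question: str,
    budget: int,
    relevance: Callable[[str, str], float],
) -> List[str]:
    """Greedy foveated selection: ~30% of the budget on the earliest
    turns, ~30% on question-relevant turns, ~40% on recent turns."""
    early_budget = int(budget * 0.3)
    relevant_budget = int(budget * 0.3)
    recent_budget = budget - early_budget - relevant_budget  # ~40%

    chosen = set()

    def take(candidate_indices, limit):
        used = 0
        for i in candidate_indices:
            cost = count_tokens(turns[i])
            if i in chosen or used + cost > limit:
                continue
            chosen.add(i)
            used += cost

    # 1) Early setup turns (goals/constraints), chronological order.
    take(range(len(turns)), early_budget)
    # 2) Past turns most similar to the current question.
    by_relevance = sorted(range(len(turns)),
                          key=lambda i: relevance(question, turns[i]),
                          reverse=True)
    take(by_relevance, relevant_budget)
    # 3) Most recent turns, for local coherence.
    take(reversed(range(len(turns))), recent_budget)

    # Emit survivors in chronological order.
    return [turns[i] for i in sorted(chosen)]
```

The three passes share one `chosen` set, so a turn that is both early and relevant is only paid for once.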

Then I benchmarked it on synthetic multi-turn dialogs where the final question depends on information planted early in the conversation, with the middle padded with filler turns.
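
Concretely, each episode is shaped something like this (my own minimal reconstruction of that setup; the planted-fact format and counts here are invented for illustration, not the repo's generator):

```python
import random

def make_episode(n_filler: int = 30) -> dict:
    """Dialog where the fact needed for the final question is planted
    in turn 1, then buried under filler turns."""
    code = str(random.randint(1000, 9999))
    turns = [f"User: My project code is {code}. Please remember it."]
    turns += [f"User: Unrelated filler turn {i}, just chit-chat."
              for i in range(n_filler)]
    return {"turns": turns,
            "question": "What is my project code?",
            "answer": code}
```

Under RAW-TRUNCATE at a 256-token budget the planted turn falls out of the window, while the foveated policy's early slice keeps it.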

Result (150 episodes, synthetic):

  • At a 256-token budget:
    • RAW-TRUNCATE: 26.7% answer accuracy
    • Foveated (Fractal Glyph Tape): 73.3% answer accuracy (+46.7 percentage points at the same token budget)
  • At 512+ tokens (enough to include the full conversation in this setup), both methods converge at 73.3%, as expected.

So this is not a claim of SOTA on BEAM/MEMTRACK/etc., and it’s on synthetic data for now. It is a concrete, open-source prototype showing that a simple, budget-aware, early+relevant+recent policy can significantly beat naive truncation in the tight-budget regime, and match it when budgets are large.

What’s included:

  • Fractal/glyph memory service (FastAPI + SQLite) with write / read APIs (example call after this list)
  • Foveated context selection policy
  • Agent demo wired to this memory layer
  • Benchmark scripts + PHASE-5-RESULTS.md with setup and numbers
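
For the memory service, usage is along these lines. Caveat: the endpoint paths and payload fields below are assumptions for illustration only; check the repo for the actual routes and schemas:

```python
import requests

BASE = "http://localhost:8000"  # assumed local FastAPI service

# Hypothetical write: store a turn into the glyph address space.
requests.post(f"{BASE}/write", json={
    "world": "demo",
    "region": "chat-1",
    "text": "User: my deadline is Friday.",
})

# Hypothetical read: fetch a foveated context for the current question.
resp = requests.post(f"{BASE}/read", json={
    "world": "demo",
    "region": "chat-1",
    "question": "When is my deadline?",
    "token_budget": 256,
})
print(resp.json())
```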

I’d be interested in feedback on:

  • How this compares to query-aware compression / retrieval you’ve tried
  • Whether it’s worth running on standard benchmarks (BEAM, MEMTRACK, etc.)
  • Any obvious failure modes I should test for before claiming more than “beats naive truncation on this benchmark”
