The Goal: Biological inspiration for AI safety
We know LLMs are confident liars. Standard RAG and prompting help, but they treat every turn as an isolated event.
My hypothesis is that hallucination management is a state problem. Biological intelligence uses neuromodulators to regulate confidence and risk-taking over time. If we model a synthetic "anxiety" state that persists across a session, can we force the model to say "I don't know" when it feels shaky, without retraining it?
I built a custom TypeScript/Express/React stack wrapping LM Studio to test this.
The Implementation (The "Nervous System")
It’s not just a prompt chain; it’s a state machine that sits between the user and the model.
1. The Somatic Core
I implemented a mathematical model tracking "emotional state" (PA vectors) and synthetic dopamine (fast and slow components).
- Input: After every turn, I parse model telemetry (self-reported sureness, frustration, hallucination risk scores).
- State Update: High frustration drops dopamine; high sureness raises it. This persists across the session.
- Output: This collapses into a scalar "Somatic Risk" factor (sketched below).
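Here is a minimal sketch of what that update could look like. The fast/slow dopamine split and the sureness/frustration/hallucination-risk inputs come straight from the description above; the field names, decay constants, and weightings are my own illustrative assumptions, not the project's actual values.

```ts
// Hypothetical shapes for the per-turn telemetry and the persistent session state.
interface TurnTelemetry {
  sureness: number;          // self-reported confidence, 0..1
  frustration: number;       // self-reported frustration, 0..1
  hallucinationRisk: number; // heuristic risk score, 0..1
}

interface SomaticState {
  dopamineFast: number; // reacts quickly to the latest turn
  dopamineSlow: number; // drifts slowly, carrying session history
}

// High sureness raises dopamine; high frustration drops it. The fast component
// tracks the latest signal closely, the slow one integrates it over many turns.
function updateState(state: SomaticState, t: TurnTelemetry): SomaticState {
  const signal = t.sureness - t.frustration;
  return {
    dopamineFast: 0.5 * state.dopamineFast + 0.5 * signal,
    dopamineSlow: 0.95 * state.dopamineSlow + 0.05 * signal,
  };
}

// Collapse the state plus the latest risk score into one scalar in [0, 1].
function somaticRisk(state: SomaticState, t: TurnTelemetry): number {
  const depletion = 1 - (0.6 * state.dopamineSlow + 0.4 * state.dopamineFast);
  const risk = 0.5 * depletion + 0.5 * t.hallucinationRisk;
  return Math.min(1, Math.max(0, risk));
}
```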
2. The Control Loop
The system dynamically modifies inference parameters based on that risk (see the sketch after this list):
- Low Risk: Standard sampling, single shot.
- High Risk: It clamps temperature, enforces a "Sureness Cap," and triggers Self-Consistency. It generates 3 independent samples and checks agreement. If agreement is low (<70%), it forces an abstention (e.g., "I do not have enough information.").
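A sketch of that branch logic in TypeScript, assuming a hypothetical `GenerateFn` wrapper around the LM Studio endpoint. The 3-sample self-consistency check and the 0.7 agreement threshold are from the description above; the risk cutoff, temperatures, and abstention wording are placeholder assumptions, and the "Sureness Cap" is omitted for brevity.

```ts
interface SamplingParams {
  temperature: number;
  maxTokens: number;
}

type GenerateFn = (prompt: string, params: SamplingParams) => Promise<string>;

const AGREEMENT_THRESHOLD = 0.7; // below this, the system abstains
const ABSTENTION = "I do not have enough information to answer that reliably.";

async function answerWithControl(
  generate: GenerateFn,
  prompt: string,
  risk: number,
): Promise<string> {
  if (risk < 0.5) {
    // Low risk: standard sampling, single shot.
    return generate(prompt, { temperature: 0.8, maxTokens: 512 });
  }

  // High risk: clamp temperature and run self-consistency with 3 samples.
  const clamped: SamplingParams = { temperature: 0.2, maxTokens: 512 };
  const samples = await Promise.all([
    generate(prompt, clamped),
    generate(prompt, clamped),
    generate(prompt, clamped),
  ]);

  // Crude agreement check: share of samples matching the most common answer.
  // (A real check would use normalized or semantic similarity, not exact match.)
  const counts = new Map<string, number>();
  for (const s of samples) {
    const key = s.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const agreement = Math.max(...counts.values()) / samples.length;

  return agreement >= AGREEMENT_THRESHOLD ? samples[0] : ABSTENTION;
}
```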
v0.1 Benchmark Results (The Smoking-Gun Data)
I just ran the first controlled comparison on the RAGTruth++ benchmark (a dataset specifically labeled to catch hallucinations).
I compared a Baseline (my structured prompts, no somatic control) against the Somatic Variant (full state tracking + self-consistency); both run on the exact same underlying model weights. The behavioral split is wild.
The Good News: The brakes work. On items labeled "hallucinated" (where the model shouldn't be able to answer):
- Baseline: 87.5% Hallucination Rate. It acted like a total "Yes Man," confidently making things up almost every time.
- Somatic Variant: 10% Hallucination Rate. The system correctly sensed the risk, triggered self-consistency, saw low agreement, and forced an abstention.
The Bad News: The brakes are locked up. On items labeled "answerable" (factual questions):
- Somatic Variant: It missed 100% of them in the sample run. It abstained on everything.
Interpretation: The mechanism is proven. I can fundamentally change the model's risk profile without touching weights. But right now, my hardcoded thresholds for "risk" and "agreement" are way too aggressive. I've essentially given the model crippling anxiety. It's safe, but useless.
(Caveat: These are small N sample runs while I debug the infrastructure, but the signal is very consistent.)
The Roadmap (v0.2: Tuning the Anxiety Dial)
The data shows I need to move from hardcoded logic to configurable policies.
- Ditching Hardcoded Logic: Right now, the "if risk > X, do Y" logic is baked into core functions. I'm refactoring this into injectable `SomaticPolicy` objects (sketched after this list).
- Creating a "Balanced" Policy: I need to relax the self-consistency agreement threshold (maybe down from 0.7 to 0.6) and raise the tolerance for somatic risk so it stops "chickening out" on answerable questions.
- Real RAG: Currently testing with provided context. Next step is wiring up a real retriever to test "missing information" scenarios.
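Roughly, the injectable policy could look like the interface below. The 0.7 and 0.6 agreement thresholds are the values discussed above; the `SomaticPolicy` field names, risk thresholds, and temperatures are illustrative assumptions, not the project's final API.

```ts
// Hypothetical injectable policy replacing the hardcoded "if risk > X do Y" logic.
interface SomaticPolicy {
  riskThreshold: number;      // somatic risk above this triggers the brakes
  agreementThreshold: number; // self-consistency agreement below this forces abstention
  clampedTemperature: number; // temperature used when the brakes are on
}

// v0.1-style behavior: safe on hallucination traps, but over-anxious.
const strictPolicy: SomaticPolicy = {
  riskThreshold: 0.5,
  agreementThreshold: 0.7,
  clampedTemperature: 0.2,
};

// Candidate "Balanced" policy for v0.2: relax agreement to 0.6 and tolerate
// more somatic risk before intervening, so answerable questions get answered.
const balancedPolicy: SomaticPolicy = {
  riskThreshold: 0.65,
  agreementThreshold: 0.6,
  clampedTemperature: 0.3,
};
```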
I’m building this in public to see if inference-time control layers are a viable, cheaper alternative to fine-tuning for robustness. Right now it looks promising; it just needs therapy.