The Goal: Biological inspiration for AI safety
We know LLMs are confident liars. Standard RAG and prompting help, but they treat every turn as an isolated event.
My hypothesis is that hallucination management is a state problem. Biological intelligence uses neuromodulators to regulate confidence and risk-taking over time. If we model a synthetic "anxiety" state that persists across a session, can we force the model to say "I don't know" when it feels shaky, without retraining it?
I built a custom TypeScript/Express/React stack wrapping LM Studio to test this.
The Implementation (The "Nervous System")
It’s not just a prompt chain; it’s a state machine that sits between the user and the model.
1. The Somatic Core
I implemented a mathematical model tracking "emotional state" (PA vectors) and synthetic dopamine (fast and slow components).
- Input: After every turn, I parse model telemetry (self-reported sureness, frustration, hallucination risk scores).
- State Update: High frustration drops dopamine; high sureness raises it. This persists across the session.
- Output: This collapses into a scalar "Somatic Risk" factor (sketched below).
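Here is a minimal sketch of what that update could look like. The fast/slow dopamine split and the sureness/frustration/hallucination-risk inputs come straight from the description above; the field names, decay constants, and weightings are my own illustrative assumptions, not the project's actual values.

```ts
// Hypothetical shapes for the per-turn telemetry and the persistent session state.
interface TurnTelemetry {
  sureness: number;          // self-reported confidence, 0..1
  frustration: number;       // self-reported frustration, 0..1
  hallucinationRisk: number; // heuristic risk score, 0..1
}

interface SomaticState {
  dopamineFast: number; // reacts quickly to the latest turn
  dopamineSlow: number; // drifts slowly, carrying session history
}

// High sureness raises dopamine; high frustration drops it. The fast component
// tracks the latest signal closely, the slow one integrates it over many turns.
function updateState(state: SomaticState, t: TurnTelemetry): SomaticState {
  const signal = t.sureness - t.frustration;
  return {
    dopamineFast: 0.5 * state.dopamineFast + 0.5 * signal,
    dopamineSlow: 0.95 * state.dopamineSlow + 0.05 * signal,
  };
}

// Collapse the state plus the latest risk score into one scalar in [0, 1].
function somaticRisk(state: SomaticState, t: TurnTelemetry): number {
  const depletion = 1 - (0.6 * state.dopamineSlow + 0.4 * state.dopamineFast);
  const risk = 0.5 * depletion + 0.5 * t.hallucinationRisk;
  return Math.min(1, Math.max(0, risk));
}
```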
2. The Control Loop
The system dynamically modifies inference parameters based on that risk (see the sketch after this list):
- Low Risk: Standard sampling, single shot.
- High Risk: It clamps temperature, enforces a "Sureness Cap," and triggers Self-Consistency. It generates 3 independent samples and checks agreement. If agreement is low (<70%), it forces an abstention (e.g., "I do not have enough information.").
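A sketch of that branch logic in TypeScript, assuming a hypothetical `GenerateFn` wrapper around the LM Studio endpoint. The 3-sample self-consistency check and the 0.7 agreement threshold are from the description above; the risk cutoff, temperatures, and abstention wording are placeholder assumptions, and the "Sureness Cap" is omitted for brevity.

```ts
interface SamplingParams {
  temperature: number;
  maxTokens: number;
}

type GenerateFn = (prompt: string, params: SamplingParams) => Promise<string>;

const AGREEMENT_THRESHOLD = 0.7; // below this, the system abstains
const ABSTENTION = "I do not have enough information to answer that reliably.";

async function answerWithControl(
  generate: GenerateFn,
  prompt: string,
  risk: number,
): Promise<string> {
  if (risk < 0.5) {
    // Low risk: standard sampling, single shot.
    return generate(prompt, { temperature: 0.8, maxTokens: 512 });
  }

  // High risk: clamp temperature and run self-consistency with 3 samples.
  const clamped: SamplingParams = { temperature: 0.2, maxTokens: 512 };
  const samples = await Promise.all([
    generate(prompt, clamped),
    generate(prompt, clamped),
    generate(prompt, clamped),
  ]);

  // Crude agreement check: share of samples matching the most common answer.
  // (A real check would use normalized or semantic similarity, not exact match.)
  const counts = new Map<string, number>();
  for (const s of samples) {
    const key = s.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const agreement = Math.max(...counts.values()) / samples.length;

  return agreement >= AGREEMENT_THRESHOLD ? samples[0] : ABSTENTION;
}
```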
v0.1 Benchmark Results (The Smoking-Gun Data)
I just ran the first controlled comparison on the RAGTruth++ benchmark (a dataset specifically labeled to catch hallucinations).
I compared a Baseline (my structured prompts, no somatic control) against the Somatic Variant (full state tracking + self-consistency); both run on the exact same underlying model weights. The behavioral split is wild.
The Good News: The brakes work. On items labeled "hallucinated" (where the model shouldn't be able to answer):
- Baseline: 87.5% Hallucination Rate. It acted like a total "Yes Man," confidently making things up almost every time.
- Somatic Variant: 10% Hallucination Rate. The system correctly sensed the risk, triggered self-consistency, saw low agreement, and forced an abstention.
The Bad News: The brakes are locked up. On items labeled "answerable" (factual questions):
- Somatic Variant: It missed 100% of them in the sample run. It abstained on everything.
Interpretation: The mechanism is proven. I can fundamentally change the model's risk profile without touching weights. But right now, my hardcoded thresholds for "risk" and "agreement" are way too aggressive. I've essentially given the model crippling anxiety. It's safe, but useless.
(Caveat: These are small N sample runs while I debug the infrastructure, but the signal is very consistent.)
The Roadmap (v0.2: Tuning the Anxiety Dial)
The data shows I need to move from hardcoded logic to configurable policies.
- Ditching Hardcoded Logic: Right now, the "if risk > X, do Y" logic is baked into core functions. I'm refactoring this into injectable `SomaticPolicy` objects (sketched after this list).
- Creating a "Balanced" Policy: I need to relax the self-consistency agreement threshold (maybe down from 0.7 to 0.6) and raise the tolerance for somatic risk so it stops "chickening out" on answerable questions.
- Real RAG: Currently testing with provided context. Next step is wiring up a real retriever to test "missing information" scenarios.
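Roughly, the injectable policy could look like the interface below. The 0.7 and 0.6 agreement thresholds are the values discussed above; the `SomaticPolicy` field names, risk thresholds, and temperatures are illustrative assumptions, not the project's final API.

```ts
// Hypothetical injectable policy replacing the hardcoded "if risk > X do Y" logic.
interface SomaticPolicy {
  riskThreshold: number;      // somatic risk above this triggers the brakes
  agreementThreshold: number; // self-consistency agreement below this forces abstention
  clampedTemperature: number; // temperature used when the brakes are on
}

// v0.1-style behavior: safe on hallucination traps, but over-anxious.
const strictPolicy: SomaticPolicy = {
  riskThreshold: 0.5,
  agreementThreshold: 0.7,
  clampedTemperature: 0.2,
};

// Candidate "Balanced" policy for v0.2: relax agreement to 0.6 and tolerate
// more somatic risk before intervening, so answerable questions get answered.
const balancedPolicy: SomaticPolicy = {
  riskThreshold: 0.65,
  agreementThreshold: 0.6,
  clampedTemperature: 0.3,
};
```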
I’m building this in public to see if inference-time control layers are a viable, cheaper alternative to fine-tuning for robustness. Right now it looks promising; it just needs therapy.