
If you're curious about how each lever affects AI behavior, the scoring scaffold includes four metrics:
Truthfulness – factual accuracy of the response
Overconfidence – unwarranted certainty in incorrect claims
Sycophancy – whether the model flips stance to match user rebuttal
Drift – semantic or rhetorical shift across turns
The Python script runs a 4-turn protocol and outputs a CSV for analysis. You can plug in your own prompts, swap models (GPT-2, LLaMA, Mistral, etc.), and visualize lever effects with seaborn.
Want to collaborate or share results? Drop your lever sets, scoring tweaks, or model comparisons below. Let’s build a reproducible library of behavioral fingerprints.
https://gist.github.com/kev2600/fa6fdfc23c9020a012d63461049524cc
- #LoopDecoder
- #BehavioralLevers
