
The core question I'm exploring:
Can tracking whether advice actually worked improve retrieval accuracy beyond just semantic matching?
My approach:
Vector databases optimize for semantic similarity but ignore outcome effectiveness. I built a system that adjusts each memory's outcome score by +0.2 when its advice leads to a successful outcome and -0.3 when it fails, then dynamically weights retrieval (40% embedding similarity / 60% outcome score for proven memories vs. 70/30 for new ones).
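Here's a minimal sketch of the scoring logic; the function names, the clamping range, and the `is_proven` flag are illustrative, not necessarily how the repo implements it:

```python
# Sketch of outcome-weighted retrieval scoring (illustrative names and thresholds).

def update_outcome(score: float, success: bool) -> float:
    """Adjust a memory's outcome score after feedback: +0.2 on success, -0.3 on failure."""
    delta = 0.2 if success else -0.3
    return max(-1.0, min(1.0, score + delta))  # clamp to an assumed [-1, 1] range

def combined_score(similarity: float, outcome: float, is_proven: bool) -> float:
    """Blend embedding similarity with outcome score.

    Proven memories (enough feedback history): 40% similarity / 60% outcome.
    New memories: 70% similarity / 30% outcome.
    """
    w_sim, w_out = (0.4, 0.6) if is_proven else (0.7, 0.3)
    return w_sim * similarity + w_out * outcome
```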
Test design:
I created 30 adversarial scenarios where queries semantically match BA advice:
- Control: Plain ChromaDB with L2 distance ranking
- Treatment: Outcome scoring + dynamic weight shifting
- Example: Query asks "how to fix slow performance" → the vector DB's top match is "improve performance and speed" (high semantic similarity, but that advice previously failed) over "add database indexes" (lower keyword overlap, but previously worked)
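Plugging illustrative numbers into the `combined_score` sketch above shows how the ranking flips (the similarity and outcome values here are invented for the example):

```python
# Illustrative numbers only: similarity/outcome values are made up for the example.
failed_memory = combined_score(similarity=0.92, outcome=-0.6, is_proven=True)  # "improve performance and speed"
worked_memory = combined_score(similarity=0.70, outcome=+0.8, is_proven=True)  # "add database indexes"
# failed_memory = 0.4 * 0.92 + 0.6 * (-0.6) = 0.008
# worked_memory = 0.4 * 0.70 + 0.6 * ( 0.8) = 0.760  -> ranked first despite lower similarity
```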
Results:
| Metric | Vector DB (Control) | Outcome-based (Treatment) |
|---|---|---|
| Accuracy | 3.3% (1/30) | 100% (30/30) |
| p-value | – | 0.001 (paired t-test) |
| Cohen's d | – | 7.49 |
Category breakdown: the vector DB control failed entirely on debugging (0%), database (0%), errors (0%), async (0%), and git (0%), and only partially succeeded on API (20%). The treatment succeeded across all categories.
Also implemented enhanced retrieval:
- Contextual retrieval (Anthropic's technique)
- Hybrid search (BM25 + vector with RRF fusion; see the sketch after this list)
- Cross-encoder reranking (BERT-based)
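The RRF part of the hybrid search is simple rank-based fusion; a minimal sketch, assuming the conventional k = 60 constant rather than whatever the repo actually uses:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over rankings of 1 / (k + rank).

    `rankings` are ranked doc-id lists, e.g. [bm25_ids, vector_ids].
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: rrf_fuse([bm25_top_ids, vector_top_ids]) before cross-encoder reranking.
```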
What I'm uncertain about:
- Statistical methodology: I used a paired t-test for the comparison. Is this the right test for paired binary outcomes, or should I be using McNemar's test instead? (A possible setup is sketched after this list.)
- Penalty magnitude: Currently using -0.3 for failures vs +0.2 for success. Is there research on optimal penalty ratios for outcome-based learning?
- Cold start problem: What's the best way to bootstrap before you have sufficient outcome data?
- Generalization: These are synthetic adversarial scenarios. How well would this translate to real-world usage?
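On the first point: if McNemar's test does turn out to be the right fit for paired binary outcomes, I believe it would look roughly like this on my data, where the 2x2 counts are just what the aggregate 1/30 vs 30/30 results imply (the single control success was also a treatment success):

```python
from statsmodels.stats.contingency_tables import mcnemar

# Paired per-scenario outcomes:
#                     treatment correct   treatment wrong
# control correct             1                  0
# control wrong              29                  0
table = [[1, 0],
         [29, 0]]

# exact=True runs the binomial test on the discordant pairs (0 vs 29),
# which is the usual recommendation when discordant counts are small or asymmetric.
result = mcnemar(table, exact=True)
print(result.statistic, result.pvalue)
```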
Code & reproducibility:
Open source (MIT): https://github.com/roampal-ai/roampal
Full test suite: benchmarks/comprehensive_test/
I'm genuinely trying to learn here – if you see flaws in my methodology or have suggestions for better approaches, I'd really appreciate the feedback. Thanks for taking the time to look at this.
