[P] Outcome-based learning vs vector search: 100% vs 3.3% accuracy on adversarial queries (p=0.001) – looking for feedback on approach


I've been experimenting with outcome-based learning for AI agent memory and got some interesting results, but I'm fairly new to this space and would really appreciate feedback from people with more experience.

The core question I'm exploring:

Can tracking whether advice actually worked improve retrieval accuracy beyond just semantic matching?

My approach:

Vector databases optimize for semantic similarity but ignore whether retrieved advice actually worked. I built a system that adjusts each memory's outcome score by +0.2 after a successful outcome and -0.3 after a failure, then dynamically weights retrieval (40% embedding similarity / 60% outcome score for proven memories, vs 70/30 for new ones).
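Roughly, the scoring works like the sketch below (simplified, with illustrative names and thresholds rather than the repo's actual API):

```python
# Simplified sketch of outcome-weighted retrieval scoring.
# Names and the "proven" threshold are illustrative, not the repo's actual code.

def update_outcome_score(score: float, success: bool) -> float:
    """Asymmetric update: +0.2 on a successful outcome, -0.3 on a failure."""
    return score + (0.2 if success else -0.3)

def retrieval_score(embedding_sim: float, outcome_score: float, n_outcomes: int) -> float:
    """Blend semantic similarity with the memory's running outcome score.

    Proven memories (enough recorded outcomes) weight outcomes more heavily
    (40/60); new memories lean on embedding similarity (70/30).
    """
    if n_outcomes >= 3:  # illustrative cutoff for "proven"
        w_sim, w_out = 0.4, 0.6
    else:
        w_sim, w_out = 0.7, 0.3
    return w_sim * embedding_sim + w_out * outcome_score
```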

Test design:

I created 30 adversarial scenarios where the query semantically matches bad advice:

  • Control: Plain ChromaDB with L2 distance ranking
  • Treatment: Outcome scoring + dynamic weight shifting
  • Example: Query asks "how to fix slow performance" → the vector DB matches "improve performance and speed" (high semantic similarity but previously failed) over "add database indexes" (lower keyword overlap but previously worked)
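Each scenario boils down to a query, two stored memories with prior outcomes, and an expected winner. A minimal illustration (field names and score values are made up for clarity, not the exact test-suite schema):

```python
# One adversarial scenario, roughly as the test frames it (illustrative fields/values).
scenario = {
    "query": "how to fix slow performance",
    "memories": [
        # Semantically close to the query, but previously failed.
        {"text": "improve performance and speed", "outcome_score": -0.6},
        # Lower keyword overlap, but previously worked.
        {"text": "add database indexes", "outcome_score": 0.4},
    ],
    "expected_top_result": "add database indexes",
}
```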

Results:

  • Accuracy – Vector DB (control): 3.3% (1/30); outcome-based (treatment): 100% (30/30)
  • Paired t-test: p = 0.001
  • Cohen's d: 7.49

Category breakdown: the vector DB baseline failed completely on debugging (0%), database (0%), errors (0%), async (0%), and git (0%), and only partially succeeded on API queries (20%). The treatment succeeded across all categories.

I also implemented enhanced retrieval:

  • Contextual retrieval (Anthropic's technique)
  • Hybrid search (BM25 + vector with RRF fusion – see the RRF sketch after this list)
  • Cross-encoder reranking (BERT-based)
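For the hybrid step, the fusion is reciprocal rank fusion (RRF); here is a minimal sketch, assuming the standard formulation with k = 60 rather than the repo's exact code:

```python
# Minimal reciprocal rank fusion over a BM25 ranking and a vector-search ranking.
from collections import defaultdict

def rrf_fuse(bm25_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document ids by summing 1 / (k + rank) per list."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```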

What I'm uncertain about:

  1. Statistical methodology: I used a paired t-test for the comparison. Is that the right test for paired binary outcomes, or should I be using McNemar's test instead? (A sketch of how I'd set up McNemar's test follows this list.)
  2. Penalty magnitude: Currently using -0.3 for failures vs +0.2 for success. Is there research on optimal penalty ratios for outcome-based learning?
  3. Cold start problem: What's the best way to bootstrap before you have sufficient outcome data?
  4. Generalization: These are synthetic adversarial scenarios. How well would this translate to real-world usage?
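On question 1, this is how I'd set up McNemar's test on the paired per-scenario results, assuming statsmodels is available (the 2x2 table comes straight from the accuracy numbers above):

```python
# McNemar's test on paired binary outcomes (assumes statsmodels is installed).
from statsmodels.stats.contingency_tables import mcnemar

# Rows: control (correct, incorrect); columns: treatment (correct, incorrect).
# From the results: 1 scenario both systems got right, 29 that only the
# treatment got right, 0 that only the control got right, 0 that both got wrong.
table = [[1, 0],
         [29, 0]]
result = mcnemar(table, exact=True)  # exact binomial test on the 29 discordant pairs
print(result.statistic, result.pvalue)
```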

Code & reproducibility:

Open source (MIT): https://github.com/roampal-ai/roampal

Full test suite: benchmarks/comprehensive_test/

I'm genuinely trying to learn here – if you see flaws in my methodology or have suggestions for better approaches, I'd really appreciate the feedback. Thanks for taking the time to look at this.
