Why your MARL agents suck in the real world (and how to fix it)

Ever trained multi-agent AI in self-play? You end up with agents that play brilliantly with each other but are totally brittle. They overfit to their partner's weird quirks and fail the moment you pair them with a new agent (or a human).


A new post about Rational Policy Gradient (RPG) tackles this "self-sabotage."

The TL;DR:

  • Problem: Standard self-play trains each agent to be the best response to its partner's current policy. This leads to brittle, co-adapted strategies.
  • Solution (RPG): Train the agent to be a robust best response to its partner's future rational policy.
  • The Shift: It's like changing the goal from "How do I beat what you're doing now?" to "What's a good general strategy, assuming you'll also act rationally?" (see the toy sketch below).
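The post doesn't give pseudocode, so here's a toy sketch of the *idea* only, not RPG's actual algorithm. It's a 2-action coordination game, and the maximin-over-rational-conventions objective is my own illustrative stand-in for "best response to a rational partner":

```python
# Toy sketch of the idea, NOT the RPG algorithm itself (the post doesn't
# spell one out). Coordination game: payoff 1 if both players pick the
# same action, else 0.
import numpy as np

PAYOFF = np.eye(2)  # PAYOFF[i, j] = 1 iff the two actions match

def best_response(partner_policy):
    """Greedy best response to a fixed partner action distribution."""
    expected = PAYOFF @ partner_policy       # expected payoff of each action
    return np.eye(2)[np.argmax(expected)]    # one-hot greedy policy

# Standard self-play: lock onto the partner's current quirk.
partner = np.array([0.9, 0.1])               # partner slightly favors action 0
selfplay_policy = best_response(partner)     # -> always action 0 ("handshake")

# Rational-partner objective (illustrative stand-in): score against every
# policy that is itself a best response to some convention.
rational_partners = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def robust_value(policy):
    """Worst-case expected payoff against any rational partner."""
    return min(policy @ PAYOFF @ p for p in rational_partners)

print(robust_value(selfplay_policy))          # 0.0 -- fails vs. the other convention
print(robust_value(np.array([0.5, 0.5])))     # 0.5 -- hedges across conventions
```

The point: the self-play agent bets everything on one convention, while the robust objective rewards strategies that hold up against any rational partner. RPG presumably gets there with policy gradients rather than a brute-force min, but that's beyond this toy.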

This method forces agents to learn robust, generalized policies. Tested on Hanabi (a notoriously hard co-op benchmark), it produced agents that are far more robust and can successfully cooperate with a diverse set of new partners.
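For context, "cooperates with new partners" is usually measured with cross-play: pair each trained agent with partners it never trained with and average the scores. A minimal sketch, where `play_episode` and the agent/partner objects are hypothetical stand-ins for your actual Hanabi (or other co-op) setup:

```python
# Minimal cross-play evaluation sketch. `play_episode`, the agents, and
# the partners are hypothetical stand-ins; wire in a real env rollout.
import random
import statistics

def cross_play_matrix(agents, partners, play_episode, episodes=100):
    """scores[i][j] = mean return of agents[i] paired with partners[j].

    A robust agent has a uniformly strong row; a co-adapted one only
    scores well against the partner it trained with.
    """
    return [[statistics.mean(play_episode(a, p) for _ in range(episodes))
             for p in partners]
            for a in agents]

# Dummy rollout so the sketch runs end to end; replace with a real episode.
dummy_episode = lambda agent, partner: random.random()
print(cross_play_matrix(["rpg", "selfplay"], ["bot_a", "bot_b"], dummy_episode))
```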

It stops agents from learning "secret handshakes" and forces them to learn the actual game instead. Pretty smart fix for a classic MARL headache.
