We Need More Benchmarks Like This, Not Fewer (featuring ChatGPT)


This research company has just launched MAPs, an enterprise simulator game and accompanying research paper that together form a new benchmark for evaluating AI agents (including ChatGPT, Gemini, Claude, and others) on long-horizon planning, world modelling, and strategic decision-making in stochastic, dynamic environments.
