Cost is 1.6x that of GPT-5, but still lower than Sonnet 4.5.
Gemini 3 takes exceptionally many steps to iterate on a task, significantly more than GPT-5, only flattening at >100 steps (but Sonnet 4.5 is higher still).
By varying the maximum number of steps you allow your agent, you can trade off resolution rate against cost. Gemini 3 is more cost-efficient than Sonnet 4.5, but much less so than GPT-5 (or GPT-5-mini).
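To make the step-limit trade-off concrete, here is a minimal sketch of how such a curve could be computed from finished trajectories. Everything in it is an assumption for illustration: the `Run` record, the `evaluate_at_limit` helper, the constant cost per step (real costs grow with context length), and the made-up numbers. None of it is taken from mini-swe-agent or from our results. A run that needs more steps than the limit is counted as unresolved and is billed only up to the limit.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical agent trajectory (illustrative fields only)."""
    steps: int            # steps the agent took before submitting
    cost_per_step: float  # assumed constant cost per step, in USD
    resolved: bool        # whether the final patch resolved the task


def evaluate_at_limit(runs, max_steps):
    """Return (resolution_rate, mean_cost) as if every run were cut off at max_steps."""
    resolved = 0
    total_cost = 0.0
    for run in runs:
        # Cost accrues only for the steps actually allowed.
        used = min(run.steps, max_steps)
        total_cost += used * run.cost_per_step
        # A run only counts as resolved if it finished within the limit.
        if run.resolved and run.steps <= max_steps:
            resolved += 1
    return resolved / len(runs), total_cost / len(runs)


# Made-up data, not real benchmark numbers.
runs = [
    Run(steps=20, cost_per_step=0.01, resolved=True),
    Run(steps=60, cost_per_step=0.01, resolved=True),
    Run(steps=120, cost_per_step=0.01, resolved=True),
    Run(steps=150, cost_per_step=0.01, resolved=False),
]

for limit in (25, 50, 100, 150):
    rate, cost = evaluate_at_limit(runs, limit)
    print(f"max_steps={limit:>3}  resolved={rate:.0%}  mean cost=${cost:.2f}")
```

Raising the limit never lowers the resolution rate in this sketch, but the mean cost keeps rising, which is why a model that needs many steps ends up further right on the cost axis.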
You can browse all agent trajectories/logs in your web browser here: https://docent.transluce.org/dashboard/3641b17f-034e-4b36-aa66-471dfed837d6
Full leaderboard ("bash only"): https://www.swebench.com/ (about to be updated)
All comparisons were performed with mini-swe-agent, a bare-bones agent that uses only bash and the same scaffold & prompts for all models, for an apples-to-apples comparison. It also comes with a Claude Code-style CLI if you want to try it or reproduce our numbers: https://github.com/SWE-agent/mini-swe-agent/