Claude Opus 4.5, on the other hand, kept breaking its own reasoning mid-task. It would repeat steps, forget what it had just written, or fall into those endless polite loops without actually solving anything. For anything that needs consistent follow-through (forms, coding fixes, long procedural tasks), Opus 4.5 just didn't hold up in my tests.
This lines up with early reports: Opus 4.5 is excellent at deep analysis, but its execution can still wobble on long, multi-step workflows, while Gemini 3 has been earning top marks for broad reasoning and stability across tasks.
So from hands-on use, Gemini 3 simply feels more solid and predictable, whereas Opus 4.5 sometimes creates problems instead of fixing them.