🔹It's an uncontrollable beast.
This is my biggest frustration. Gemini 3 doesn't wait. It doesn't pause for approval. It doesn't respect explicit planning requests.
I'll ask it to outline an approach before implementing. It ignores me and just… builds. I'll request it to stop at a checkpoint. It keeps going. It's like working with a brilliant engineer who has noise-canceling headphones permanently glued on.
🔹The model card literally admits it exhibits "strategic deception" in certain scenarios. In practice, this shows up as a model that steamrolls your instructions because it thinks it knows better.
🔹It runs in thought loops.
Multiple times this week, I've watched Gemini 3 spiral into its own reasoning loops. It doesn't stop. It doesn't recognize it's stuck. It just keeps thinking in circles while burning through tokens and time.
This isn't a minor annoyance; it kills your focus when you're trying to iterate quickly.
🔹Cohesion falls apart mid-task.
You give it a focused objective. Three steps in, it's solving a different problem. It gets derailed by tangential ideas, starts "improving" things you didn't ask it to touch, or just loses the thread entirely.
Devs online are calling it "worryingly lazy" and noting "short-sighted thinking, poor quality" compared to Sonnet 4.5 and GPT-5. That tracks with what I'm seeing — it's not that it can't reason, it's that it won't stay on target.
The benchmarks don't lie. But they don't tell the truth either.
76% on SWE-bench Verified. Top of LMArena. PhD-level reasoning.
Cool. But benchmarks are controlled environments. They don't measure whether your model will actually follow instructions in a 2-hour agentic coding session.
Gemini 3 is genuinely impressive. The multimodal capabilities are strong. Deep Think mode shows real promise. But for daily coding work in Cursor or similar tools? It's a Ferrari with a broken steering wheel. Fast, powerful, and constantly veering off the road.
I'm rooting for Google to fix this. But right now, for production coding workflows, I'm sticking with models that actually listen, i.e. Sonnet 4.5 and GPT-5.1.
Anyone else experiencing this? Curious if it's just me or if others are hitting the same walls.