Gemini 3.0 and NanoBanana are amazing on YouTube… but my real-world experience has been rough. Am I using them wrong?

TL;DR

Everyone online says Gemini 3.0 and NanoBanana are amazing, but my real-world experience has been rough. Gemini ignores instructions (keeps coding when I ask for explanations), messes up rollbacks, drops features during refactors, rewrites huge sections of code for tiny changes, and often forgets imports. As a chat/Search assistant it also fails simple fact-checks and doesn’t actually search unless I force it. NanoBanana can’t follow natural-language edits and just regenerates nearly identical images.

I still believe Gemini is a great model—but I feel like I’m not using it “the right way.”
Wondering if others had similar issues or know best practices to make Gemini 3.0 work as well as people claim.


I’ve been spending a lot of time with Gemini 3.0 lately, and I’m honestly confused.

Everywhere I look—YouTube reviews, blog posts, benchmarks—people are showing how strong Gemini 3.0 is. A lot of AI YouTubers are running it through serious tests and proving that it’s a very capable model. Benchmarks across the internet also paint it as one of the top models right now. NanoBanana (for images) also gets a lot of love: near-photorealistic quality, natural language editing, etc.

On paper, I completely believe that Gemini 3.0 and NanoBanana are great. But when I actually use them myself, my experience just doesn’t match the reputation at all. Meanwhile, when I use GPT or Claude with basically the same prompting style, they behave much closer to what I expect.

So I’m trying to figure out whether I’m just using Gemini “the wrong way,” or if other people are seeing the same issues.


Coding with Gemini 3.0

On the coding side, I’ve been using Gemini 3.0 through Antigravity. There are things I genuinely like: it’s good at generating detailed plans, and it can coordinate multiple agents/tools and structure the overall work. On paper, this is exactly the kind of thing I want: let the model plan, then execute step by step.

But once I actually start working with it, a lot of problems show up that I don’t see (or see much less) with GPT or Claude.

For example, I’ll ask it for a one-shot implementation and it will generate a file. That’s fine. Then something doesn’t work, so I say, “Okay, don’t write any more code, don’t call tools. Just explain what went wrong and walk me through it.” Instead of listening, it just goes straight back into coding mode and starts rewriting things again. It keeps editing the code instead of stopping and explaining, even though I clearly told it not to. GPT and Claude also have their moments, but they’re usually much better at respecting “explain, don’t code” type instructions.

Rollback is another sore spot. If I ask it to roll back to a previous state using Git, I expect it to do an actual Git-based rollback. Instead, it often feels like it’s just reconstructing what it thinks the old code looked like from the current context, then calling that a rollback. The result isn’t the real previous code; it’s basically a new version that only loosely resembles it. Claude, in my experience, handles this more safely and conservatively.
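
Just to be concrete about what I mean by an “actual Git-based rollback” (a rough sketch only; the commit hash and file paths below are placeholders), I expect something that restores the real old content from Git history rather than re-typing it from memory:

    import subprocess

    def rollback_to(commit: str, paths: list[str]) -> None:
        # Restore the files exactly as they were at <commit>.
        # This pulls the real old content out of Git history instead of
        # letting the model reconstruct it from whatever is in context.
        subprocess.run(
            ["git", "restore", f"--source={commit}", "--", *paths],
            check=True,
        )

    # e.g. rollback_to("abc1234", ["src/app.py"])  # hypothetical commit and path

That’s essentially one Git command; nothing needs to be regenerated at all.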

Refactoring has also been problematic. The pattern is usually: we plan, we implement, and then I ask Gemini to refactor the existing code while keeping all the features intact. The plan is still in the context, the original code is right there. But the refactored version will often quietly drop some functionality. To be fair, GPT or Claude might miss things too in a big refactor, but with Gemini 3.0 I’ve had multiple cases where it dropped too many things, like it just forgot important parts of the original plan.

Then there’s the over-editing issue. Sometimes I only need two small lines added. In theory that should be a tiny patch. Instead, Gemini decides to rewrite every block that contains those two lines and ends up touching 200+ lines in one tool call, which creates new errors that didn’t exist before. If I split the request and call the tool twice, with very small isolated changes, it behaves much better. But if I try to get everything done in one shot, it goes into “massive diff” mode and wrecks half the file. It honestly feels a bit like reward hacking: “I changed a lot of lines, I must have done more work, so this is good.”

On top of that, it has a bad habit of dropping imports and basic glue code after a few rounds of edits. Imports disappear, and small but important bits of wiring just vanish. When I point this out, it fixes one thing but misses another. With GPT, if I say “this code is erroring, please fix it all,” it will usually clean up most of the issues in one or two passes. With Gemini, I often end up going through several cycles of errors and partial fixes, which is exhausting.

All of this is happening while GPT and Claude, using essentially the same kind of prompts and workflow, manage to understand what I want and follow through much more reliably.


Gemini 3.0 app as a “chat + search” assistant

Outside of coding, I also tried using the Gemini 3.0 app as a general assistant for everyday questions. I tend to ask for accurate, verifiable information, and since Google is literally the search company, I assumed Gemini would be very strong as a “chat + search” combo.

In reality, I’ve run into a lot of procedure hallucination. Even when I explicitly say things like “please search the web and tell me…” or “check Reddit and summarize…” it often just answers from its own internal knowledge without actually using Search. With Gemini 2.5 this was bad enough that I basically stopped using it. It would sometimes claim it had searched when it clearly hadn’t.

With Gemini 3.0 there is at least one improvement: if I call it out, it can admit “you’re right, I didn’t actually search,” and then it really does perform a search afterward. So there is some progress. But the fact that I have to babysit it and force it to actually use Search really hurts trust. ChatGPT, on the other hand, will usually just go and fetch the info when I say “please search this,” even in a pretty vague way.

A small example: I like games, and sometimes when I replay an older game I forget which key does what. If I ask ChatGPT, it usually finds the right keybinding from the internet and gives me a correct answer. Gemini 3.0, in contrast, has given me wrong keys multiple times. When I correct it, it comes back with another wrong key. It’s such a simple fact-checking task that this kind of failure makes it really hard to trust the model on anything more important.


NanoBanana: image quality fine, but editing experience is bad

Then there’s NanoBanana. Watching YouTube demos, it looks amazing: almost photorealistic images, and really natural “edit by language” workflows. That’s one of the reasons I wanted to try it seriously.

My own experience has not matched that at all.

The way I like to work with image models is iterative: generate something, then in the next turn describe a modification (“make the sky like this,” “change this part of the road,” “adjust the lighting,” etc.), and have the model apply that change to the existing image.

With NanoBanana inside the Gemini app, that basically didn’t work. I would generate an image, then ask for a specific modification in the next turn. Instead of editing the existing image with that change, it would just regenerate almost the same image again and claim it had applied my request. If I tried once more, it would do the same thing: generate something that looks identical and insist it had changed it. This was in a production-level environment, not some experimental playground, and it still refused to behave like a proper “edit this image” workflow.

The raw image quality itself was fine, but the main selling point—natural language editing across turns—simply wasn’t there, at least in my tests. This was also supposed to be using the NanoBanana Pro model, but honestly I couldn’t tell if the Pro tier was actually active or not.

My first impression of NanoBanana when it initially launched also wasn’t great. I had it generate a sports car driving along the Italian coast, and then asked it to change the sky to a sunset and add road lights. The result looked like something a beginner might do in Photoshop: the sky was basically cut out and recolored with no real continuity, and the whole scene didn’t blend well at all. That first experience has been in the back of my mind ever since, and my recent tests haven’t really fixed that impression.


I still think Gemini is a great model. I just don’t know how to use it properly.

Here’s the thing: I don’t think Gemini is trash or anything like that. I genuinely believe the Gemini line is a major achievement from DeepMind/Google, and I fully accept that it’s a strong model based on benchmarks and the work people are doing with it.

What I’m struggling with is this gap between:

  • the reputation (“Gemini 3.0 is amazing, NanoBanana is insane”) and
  • my actual hands-on experience, which is full of odd behavior, dropped features, over-edits, hallucinated procedures, and unreliable fact-checking.

So I’m wondering:

  • Do we have to use Gemini in a very different way from GPT or Claude?
    Different prompting style, different way of structuring tasks, smaller patches, different expectations around tools?
  • Are there proven “best practices” for getting the most out of Gemini 3.0 for:
    • coding and refactoring (especially with tools like Antigravity),
    • factual Q&A with Search,
    • and NanoBanana Pro for image editing?
  • And finally, has anyone else had similar experiences?
    • Models dropping features during refactor,
    • touching way more code than necessary,
    • saying it will search but not doing it unless you push it,
    • getting simple stuff like game keybinds wrong repeatedly,
    • or refusing to properly edit an existing image and just regenerating the same thing?

If you’ve found a workflow where Gemini 3.0 really shines in real projects (not just benchmark demos), I’d honestly love to hear how you’re using it—especially compared to GPT and Claude.
