Both on the web app and when "Grounding with Google Search" is enabled in AI Studio or the API, the model gets access to a tool called google:search. You'd think that, with access to a world-class search engine, the model could comprehensively investigate a topic, but that's far from reality.
The Google search integration is a complete mess that actively sabotages Gemini by choking it with a bunch of snippets instead of letting it read actual content like every other LLM provider on the planet.
Here's an example of what the tool gives the model when it searches for "platypus facts":
```
[SearchResults(query="platypus facts", results=[PerQueryResult(index='1.1', snippet='9 Interesting <b>platypus facts</b> | WWF Australia: (2024-04-10) 1. Platypuses are venomous. They might look cute and cuddly but come across a male platypus in mating season and you\'ll be in for a painful shock.\n...\n(2024-04-10) The platypus is an iconic Australian mammal...', source_title='wwf.org.au', url='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHNg7PSLRYuSjOBOD9c_cflXpDFWHSjp8JT9sk-l0RvBihzxPrHShhqA_cU5X-gkNVpzMQEkdFCDRmot6RbYTVXPA1ssJoLketh0wResHmnhF8KI5CT_xUN-Zf6WX29WRFDkPlDjNV_6-uZs0cU3wVO'), PerQueryResult(...
```
First, I don't agree with handing the model rigidly structured output for what is inherently unstructured data. Second, it makes no sense to include HTML tags like <b></b> in the response; the model speaks Markdown, not HTML, so why feed it pseudo-HTML?
But the most glaring issue is that the model is kneecapped: it CANNOT open a website from its search results to read anything beyond the snippet it's given. That's fine for basic queries, but for multi-step research it renders the model incapable of investigating anything thoroughly. For example, if you ask it for the schema of a specific API it doesn't have in its knowledge, it can search for that API, but much of the schema will be omitted from the snippets. Since it can't actually read the website, the only way to ascertain the rest of the schema is guesswork.
For reference, OpenAI feeds its models something like this:
```
Horses on Venus: Myth, Mirage, or Meteorology? (<a href="https://www.interplanetary-equestrians.org/horses-on-venus">https://www.interplanetary-equestrians.org/horses-on-venus</a>)
[wordlim: 120] Published: 2 days ago; The idea of "Venusian horses" began as a misinterpretation of atmospheric radio echoes recorded by early orbiters…
Imaginary Creatures of the Inner Solar System (<a href="https://galacticfieldguide.example/venus/imaginary-horses">https://galacticfieldguide.example/venus/imaginary-horses</a>)
[wordlim: 200] Content type: text/html; 14 Feb 2022 — In speculative xenobiology, "horses on Venus" are often depicted as translucent, buoyant organisms…
```
Notice how it returns semi-structured text rather than a rigid schema?
OpenAI also gives its models the capability to open a specific link; the page is parsed and returned to the model in a Markdown-ish format.
To compound all of this, you're literally unable to see anything related to the model's search queries in the Gemini web app, and in the API you only get a list of the search queries used after the response is complete. You have no visibility into where each query occurred within the chain-of-thought, which is crucial when you're trying to judge how comprehensive the model's search efforts were. For example: "Did the model search for XYZ, find only half the picture, then search for the other half? Or did the model just search the web to tick a box and return half-assed results?"
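To illustrate, this is roughly the entire visibility the API offers (a minimal sketch with the google-genai Python SDK; the model name is a placeholder, and the grounding fields follow the documented GroundingMetadata shape):

```python
# Minimal sketch using the google-genai Python SDK. The model name is a
# placeholder; the grounding fields follow the documented metadata shape.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What is the latest Gemini model?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# All you get back is a flat, after-the-fact list of query strings:
# no position within the reasoning trace, no grouping into individual calls.
print(response.candidates[0].grounding_metadata.web_search_queries)
```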
To top it all off, the model clearly was not fine-tuned with effective web search in mind. For such a large model, its extreme tendency to rely on internal knowledge when faced with a recency-focused task is just baffling.
For example, when I asked it "what is the latest gemini model?", it searched for "latest google gemini model november 2025", "Gemini 2.0 release date rumors November 2025", and "Gemini 1.5 Pro updates November 2025". We can all see the issue: for a time-sensitive question, it jumps the gun with hyper-targeted queries built on stale assumptions instead of starting broad.
In fact, this applies to Gemini in so many other areas. For example, in agentic coding, it's extremely eager and will completely refactor your codebase despite instructions to only modify a single file.
A model like GPT-5.1, which clearly has had a better SFT/RLHF pipeline than Gemini 3 for tool calling, shows much more maturity: when I asked it the same question, it searched for "latest Google Gemini model November 2025" and '"announces" "Gemini" model October November 2025'.
You'd think that the Deep Research feature on the Gemini app would solve some of these pain points, but it doesn't, and it brings plenty of its own.
The Deep Research feature STILL uses the same shitty web search logic that only returns snippets, meaning it still has the same architectural limitation of not being able to read a specific website's contents. Therefore, the whole purpose of "deeper" research is completely negated because more snippet confetti ≠ better results.
Additionally, the system prompt for Deep Research is UTTERLY GARBAGE. I've never seen a system that so blatantly and repeatedly ignores instructions. If you tell it to organize the document a certain way, it just won't. Ask it, as many times as you like, not to add an intro and a conclusion to the document; it will add them anyway. The rest is better left unsaid.
Let's look at an example:
I asked the Deep Research feature (on Gemini 3 Pro) to give me a comprehensive technical specification for implementing an OpenAI API wrapper. I was extremely explicit: no intro or conclusion, just the implementation details. I needed JSON schemas, exact request/response examples, streaming formats, error handling, authentication headers, etc. I literally said "give me A LOT of JSON examples" and "this should be comprehensive enough to fully serve as a single source of truth to implement this interface with no external sources."
What did I get? A fucking thesis paper titled "The Architectural Evolution of Agentic Intelligence: A Deep Dive into the OpenAI Responses API" complete with an Executive Summary and a Conclusion section. It gave me exactly what I told it not to give me.
The entire document is full of this pretentious bullshit. It talks about an "inflection point" in AI development and the "burgeoning field of Agentic AI." It uses "ontology" to describe a basic API object model. "Locus of control." "Cognitively robust." "Heterogeneous Output Items." It describes how the API works as "Mechanism of Action" like it's a pharmaceutical drug. There's a section about "The Fragmentation of Multimodality" when all I needed was "here's how to send a PDF as inline data in a request." Another one called "Computer Use: The Frontier of Agency" that says absolutely nothing.
Where are the JSON examples? I asked for implementation details and got vague descriptions. It mentions structured outputs exist but doesn't show me a single actual request. It says there are different SSE event types for streaming but doesn't give me the shape of those events. It talks about encrypted reasoning but where's the actual parameter I need to set? I asked for exact authentication headers and base URLs. I got tables with headers like "The Taxonomy of Response Items" instead.
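For the record, the kind of concrete artifact I was asking for fits in a dozen lines. Something as simple as this, a minimal illustrative Responses API call (the model name and payload are placeholders; OpenAI's docs remain the source of truth), would have been worth more than the entire report:

```python
# The flavor of concrete example I asked for: a minimal, exact request
# against the Responses API (illustrative; defer to OpenAI's actual docs).
import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"model": "gpt-4.1", "input": "Say hello.", "stream": False},
    timeout=30,
)
print(resp.json())
```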
The whole thing is 90% fluff about why stateful APIs are important and 10% hand-waving at technical details. I can't implement anything from this. I asked for a comprehensive guide and got literally nothing of actual use.
It researched 72 sources—it had to have more than enough material to give me what I asked for. All it had to do was distill that into actual implementation details I could use, but instead it decided to waste my time with garbage.
This isn't a one-off problem either. Every single prompt I give Deep Research comes back with the same academic paper structure. It doesn't matter how explicitly you tell it what you want. The system prompt clearly just forces it to write these pseudo-intellectual essays regardless of what you actually ask for.
The planning system is also utter trash and limits the model significantly. The model has a huge tendency to rely on its internal knowledge when creating research plans rather than approaching queries with appropriate uncertainty. When you ask about something recent, it will confidently scaffold out a plan based on what it knew before its training cutoff, filling in specific entity names, version numbers, and technical details that may have completely changed since then.
Say you ask about a niche API that got a major overhaul last month. Instead of planning "search broadly for the latest documentation, then investigate specific endpoints based on what's found," it will generate a plan like "look up the authentication flow for version 2.3, find the (deprecated) webhook format, investigate the (legacy) response structure." It's operating on stale assumptions and then executing that flawed plan with confidence, completely missing the actual current state of things because it never ran a broad query to begin with.
This rigidity compounds the problem because later research steps often depend on discoveries made in earlier ones. You need the flexibility to pivot when you find something unexpected. By locking the model into a predetermined sequence of specific searches, you're preventing it from adapting its approach based on what it actually finds.
The most frustrating part is that the model doesn't need this hand-holding. It's perfectly capable of doing adaptive, freeform research. OpenAI and Anthropic don't force their models through these rigid planning hoops because they trust the model to dynamically adjust its search strategy as it learns more (note: Anthropic does plan upfront of sorts because its orchestrator spawns parallel subagents, but that orchestrator can conduct preliminary research before spawning them).
Even if Google wants to keep this planning system, at least give the planning model the ability to conduct preliminary research, so it has a general idea of what it's about to investigate instead of formulating a single-source-of-truth plan from outdated knowledge.
After all, Gemini 3 is still a Preview model, so many of these tool-calling issues will likely be ironed out in the final release (this is Google's first "proper" model built for the world of agents). The web search limitation, however, is purely architectural and desperately needs to be reworked:
– Allow the model to search and get web snippets, but also allow it to retrieve the full Markdown content of a webpage (see the sketch after this list). Google basically owns the internet; a simple webpage → Markdown conversion is not akin to boiling the ocean.
– Surface web search requests within API responses so it's easy to see where in a model's reasoning trace it searched the web, and how many individual web calls it produced.
– Try to train out the model's tendency to launch hyper-specific queries on time-sensitive or niche topics; instead, teach it to run a broad preliminary investigation before firing off targeted search queries.
– Allow us to add our own tools in tandem with the Google Search tool. Currently, enabling Google Search blocks custom tools in the same request (the exact combination sketched below), which is severely limiting.
– Completely overhaul the Deep Research system prompt: drop the hard requirement of an academic report and keep it only as a default the user's prompt can override. Deep Research should not be mandated to write reports; it should be treated as an agent with deeper search capabilities that can accomplish anything regular Gemini can, just with more source-based backing.
– Completely overhaul the Deep Research planning phase: either a) allow the model to conduct preliminary research, b) explicitly instruct the model not to put any specifics into the research plan that the user didn't explicitly provide, or c) remove it completely; since Gemini doesn't employ a subagent-based approach for Deep Research, a plan is, by all means, unnecessary.
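To make the fetch-tool and custom-tool points concrete, here's a rough sketch of what such a tool could look like declared through the google-genai SDK. The fetch_page name and schema are hypothetical, my invention rather than any real Gemini surface, and the final config is exactly the built-in-search-plus-custom-tool combination the API currently rejects:

```python
# Hypothetical fetch tool declared via the google-genai SDK. The name
# "fetch_page" and its schema are illustrative, not a real Gemini tool.
from google.genai import types

fetch_page = types.FunctionDeclaration(
    name="fetch_page",
    description="Retrieve a URL's full contents, converted to Markdown.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "url": types.Schema(type=types.Type.STRING),
            "max_chars": types.Schema(
                type=types.Type.INTEGER,
                description="Optional cap on the returned Markdown length.",
            ),
        },
        required=["url"],
    ),
)

# The combination I actually want: built-in search plus a custom tool.
# As of this writing, the API refuses configs that mix the two.
config = types.GenerateContentConfig(
    tools=[
        types.Tool(google_search=types.GoogleSearch()),
        types.Tool(function_declarations=[fetch_page]),
    ],
)
```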
For me, the most important change is giving the model a dedicated tool to fetch the contents of a specific website. Gemini is the de facto "long context window" model; letting it fetch full websites would let us truly exploit that extremely impressive context window and its coherence/recall strength.
The frustrating reality is that this isn't even hard to implement. I've personally built web search tools that allow models to genuinely search the web and read page content effectively. Solutions for HTML-to-Markdown conversion already exist (like Turndown and html-to-markdown-rs), and building a custom implementation for a company of Google's scale would be trivial.
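For instance, the core of the page-to-Markdown step is a handful of lines (a minimal sketch assuming the html2text Python library; Turndown or html-to-markdown-rs would fill the same role in Node or Rust):

```python
# A minimal sketch of the page-to-Markdown step, assuming the html2text
# library. Real-world use would add caching, robots handling, and sanitizing.
import urllib.request

import html2text


def fetch_as_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its contents converted to Markdown."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    converter = html2text.HTML2Text()
    converter.ignore_images = True  # keep the output token-lean for the model
    converter.body_width = 0        # don't hard-wrap lines
    return converter.handle(html)


print(fetch_as_markdown("https://example.com")[:500])
```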
I hope to see these issues addressed soon.