GPT is lazy when Gemini did the full job. As a student, Deep Research + the UI are the only things keeping me on Plus.

I love ChatGPT. They’re pioneers, and thanks to it I’ve been able to learn medicine; it explains tons of things to me every day. But I’m really starting to feel like this is the end for me. Maybe I’m the one who doesn’t know how to use it properly. Let me share a use case:

TL;DR: I tried extracting data from scanned student questionnaires (checkboxes + comments) using both Gemini and ChatGPT, with the Word template provided. Both make some checkbox-reading mistakes, which I can accept. The problem is ChatGPT stopped early and only extracted 4/27 responses after multiple attempts, yet responded as if the job was complete instead of clearly stating its limits. Gemini mostly followed the requested format and processed the full set. This lack of transparency is making it hard to justify paying $20/month for ChatGPT (I mainly keep it for Deep Research).

Prompt used in both models (translated into English here):
https://chatgpt.com/share/693bdb0e-f480-800f-a572-e0a4249b6528

Both models make some errors with the checkboxes, but… that’s OK.

Results: ChatGPT 5.2 Thinking (in a Project)
https://chatgpt.com/share/693bda79-df4c-800f-99ce-bd93f0681a8c

https://preview.redd.it/dwm9hcdivq6g1.png?width=2170&format=png&auto=webp&s=3ffb362abc42234aa901d98df2543dca133698e8

Results: Gemini 3 Advanced Thinking:
(It did the whole table; not all of it is shown in the screenshot)

https://preview.redd.it/bwjopvoqvq6g1.png?width=553&format=png&auto=webp&s=b48f89dbf9fcf465b385ca1a4b8d02c49d588d51

Context:

I have scanned PDF questionnaires filled out by middle school students. They include checkboxes (often messy: faint marks, ambiguous ticks, blue pen, etc.) and a few free-text comment fields. To help the model, I also provide the Word version of the questionnaire so it knows the exact structure and the expected answer options. In both cases I manually validate the output afterward, so checkbox-recognition errors are understandable given the scan quality.
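One workaround I’m considering is splitting the big scan into one small PDF per student before uploading, so each request only has to read a single questionnaire instead of the whole batch. A rough sketch with pypdf (the filename and the pages-per-questionnaire count are just placeholders for my setup, not something either model requires):

```python
from pypdf import PdfReader, PdfWriter

SOURCE_PDF = "questionnaires_scanned.pdf"  # placeholder filename
PAGES_PER_STUDENT = 2                      # assumption: each questionnaire spans 2 scanned pages

reader = PdfReader(SOURCE_PDF)
total_pages = len(reader.pages)

# Write one small PDF per student so each model request only sees a single questionnaire.
for student_idx, start in enumerate(range(0, total_pages, PAGES_PER_STUDENT), start=1):
    writer = PdfWriter()
    for page_num in range(start, min(start + PAGES_PER_STUDENT, total_pages)):
        writer.add_page(reader.pages[page_num])
    with open(f"student_{student_idx:02d}.pdf", "wb") as f:
        writer.write(f)
```

No idea yet whether that fixes the early stopping, but at least a failure would then only cost one student’s worth of re-running instead of the whole set.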

Where it becomes a real issue is the difference in behavior between Gemini and ChatGPT. Gemini mostly followed the instructions and produced the expected data format (as described in the prompt), even if some checkbox reads were wrong in a way that’s understandable.

ChatGPT, on the other hand, stopped partway through. After several attempts, it eventually produced an output after about 7 minutes, but only for the first 4 students… while the dataset contains 27 questionnaires (and the prompt implicitly asked to process everything).


I can accept hard limits (time, PDF size, page count, etc.). What I don’t understand is the lack of transparency: instead of clearly saying “I can only process X pages / X students, here’s where I’m stopping,” it responds as if the work is complete and validated. In the end you get something that looks finished but isn’t, which makes it unreliable.
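Since the output can look finished without actually being complete, the only defense I’ve found is a quick sanity check before trusting it. Something like this, assuming the extracted table gets exported to a CSV with one row per student (the filename is made up):

```python
import csv

EXPECTED_STUDENTS = 27
RESULTS_CSV = "extracted_responses.csv"  # hypothetical export of the model's table

with open(RESULTS_CSV, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# The model may answer as if the job is done even when it only covered part of
# the dataset, so count the rows explicitly instead of trusting its summary.
if len(rows) < EXPECTED_STUDENTS:
    print(f"Incomplete extraction: {len(rows)}/{EXPECTED_STUDENTS} students. Re-run the missing ones.")
else:
    print(f"All {EXPECTED_STUDENTS} students present.")
```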

For the record, I’ve been a ChatGPT user since the beginning and it has helped me a lot (especially for medical school). But since Gemini 3 came out, it feels like the gap has narrowed, or even flipped, for this kind of task. Right now, the only reason I keep paying for ChatGPT ($20/month) as a student is Deep Research. If that didn’t exist, I’d probably have canceled already, especially since Gemini is free.

I’d appreciate feedback: is this a prompting issue, a known limitation with PDF extraction, or just model-to-model variability (or load-related behavior)?
