Vertex AI Gemini fails on large (200+ page) PDFs — need full-document JSON extraction WITHOUT chunking

Hey everyone,
I’m running into a major limitation with Google Vertex AI (Gemini 2.5) while processing PDF inspection reports.

My system takes different PDF files uploaded by users every day (each file is unique and used only once). For each PDF, Gemini should generate a structured JSON array of objects with fields like title, trade, page numbers, image indexes, etc.

This works flawlessly for normal reports (~80–120 pages).
But when the PDF is large (200–250+ pages) — especially image-heavy — Gemini breaks:

  • Returns empty or partial JSON
  • Ignores most pages
  • Produces invalid schema output
  • Sometimes seems to “not see” the document at all

Important details:

  • PDFs are often small in file size (e.g., 100–300 KB), but page count is huge (goes from 1 to 400 pages pdf files)
  • The files contain many images/photos
  • I pass the PDF via inlineData or JSONL for batch prediction
  • Using strict responseSchema with responseMimeType: application/json
  • Using models like gemini-2.5-flash and gemini-2.5-pro

Critical requirement:

We strongly prefer NOT to chunk the PDF into smaller pieces.
The client wants:

  • single upload
  • One single model call
  • One single JSON array containing all issues
  • No manual or automatic chunk splitting if possible

Chunking creates other difficulties (duplicate objects across chunks, page offsets, merging logic) which we’re trying to avoid unless absolutely necessary.

Client needs:

They want to upload any inspection report (even 300+ pages), and get reliable full-document structured JSON from Vertex AI in one go.

Looking for solutions / advice:

  • Is there a Gemini/Vertex model with larger multimodal context than Flash/Pro?
  • Any recommended architecture to process VERY large PDFs without chunking?
  • Any way to handle image-heavy multi-page reports using Vision models efficiently?
  • Is this a known limitation of Gemini’s multimodal window?
  • Any roadmap for long-context Vision models on Vertex AI?

Thanks, any insights would be super helpful!

Leave a Reply