Benchmark for LLMs reading images?

Is there a good benchmark for how well LLMs can read images? For example, on tasks such as counting cars or identifying their shapes and models.
