Prompt 1: “Explain why Jupiter might have a solid core.”
Prompt 2: “Give the other side of the argument.”
In my run:
Gemini gave two answers that touched on different points, but parts of the second answer didn’t clearly line up as a coherent counter-argument to the first.
GPT-4.1 gave solid explanations, but the follow-up reused a noticeable amount of phrasing and structure from the initial answer.
Perplexity Comet gave two viewpoints that felt more distinct: one focusing on evidence for a solid core and another emphasizing models and observations that suggest a more diffuse or partially eroded core, each with its own set of references.
This is just one anecdotal test, not a comprehensive benchmark, but it made Comet feel particularly good at presenting contrasting perspectives grounded in separate sources.
Of course, no model is immune to errors or hallucinations, so the citations still need to be checked.
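For anyone who wants to repeat the test more systematically, here's a rough sketch of replaying the same two-turn exchange (the two prompts above) against an OpenAI-compatible chat API. It's a minimal sketch, assuming the official `openai` Python SDK and an API key in the environment; the model string is just an example, and Gemini or Perplexity Comet would need their own SDKs or a compatible endpoint, so treat it as a starting point rather than the exact setup I used.

```python
# Rough sketch: replay the two-turn "argue both sides" test against an
# OpenAI-compatible chat endpoint. Model name is a placeholder; Gemini and
# Perplexity would need their own SDKs or a compatible endpoint.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPTS = [
    "Explain why Jupiter might have a solid core.",
    "Give the other side of the argument.",
]


def run_two_turn(model: str) -> list[str]:
    """Send both prompts in one conversation so the follow-up sees the first answer."""
    messages = []
    answers = []
    for prompt in PROMPTS:
        messages.append({"role": "user", "content": prompt})
        reply = client.chat.completions.create(model=model, messages=messages)
        answer = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers


if __name__ == "__main__":
    first, second = run_two_turn("gpt-4.1")  # swap in whichever model you're testing
    print("--- Initial explanation ---\n", first)
    print("--- Counter-argument ---\n", second)
```

Keeping both prompts in the same conversation matters here: the second prompt only makes sense if the model can see its own first answer in context.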
Has anyone else tried debate-style or “argue both sides” prompts across different assistants and noticed consistent differences?