Hey r/PromptEngineering! I just launched PromptLens — a tool to compare prompts side-by-side across different LLMs (OpenAI, Anthropic, Google, etc.).
You can:
- Run A/B tests between prompts
- Compare models and outputs
- Upload datasets + run prompt evaluations at scale
- See win/loss analytics to know which prompt actually performs better
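For anyone curious what win/loss prompt A/B testing looks like under the hood, here's a minimal DIY sketch of the idea — this is **not** the PromptLens API, just an illustration with a stubbed model call and a toy judge (all names are hypothetical; swap in real LLM calls and an LLM-as-judge or labeled-dataset check for actual use):

```python
# Illustrative sketch only -- NOT the PromptLens API.
# Minimal prompt A/B test with win/loss tallying.
from collections import Counter

def fake_model(prompt: str, question: str) -> str:
    # Stand-in for a real LLM call (OpenAI, Anthropic, etc.).
    # Here the "answer" just echoes the prompt for demo purposes.
    return f"{question} :: {prompt}"

def score(answer: str) -> int:
    # Toy judge: prefer longer answers. Replace with an LLM-as-judge
    # or an exact-match check against a labeled dataset.
    return len(answer)

def ab_test(prompt_a: str, prompt_b: str, dataset: list[str]) -> Counter:
    # Run both prompts over every question and tally which wins.
    tally = Counter()
    for q in dataset:
        a = score(fake_model(prompt_a, q))
        b = score(fake_model(prompt_b, q))
        tally["A" if a > b else "B" if b > a else "tie"] += 1
    return tally

dataset = ["What is RAG?", "Define few-shot prompting.", "Explain CoT."]
results = ab_test(
    "Answer briefly.",
    "Answer briefly and cite one concrete example.",
    dataset,
)
print(dict(results))  # with this toy judge, the longer prompt wins: {'B': 3}
```

The hard parts a tool handles for you are the judge quality, statistical significance across large datasets, and running this against many models at once — but the core loop really is this simple.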
It’s free to try (no credit card): https://www.promptlens.io
Would love feedback from this community — what would you want to benchmark or test?