Detecting jailbreaks and prompt leakage before production

I’ve been exploring issues around LLM apps leaking their system prompts and exhibiting unexpected jailbreak behavior.

Thinking about a lightweight API that could help teams (rough sketch after the list):
– detect jailbreak attempts & prompt leaks
– analyze prompt quality
– support QA/testing workflows for LLM-based systems
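To make the idea concrete, here's a minimal sketch of the kind of check such an API could wrap: flag responses that echo long chunks of the system prompt (leakage) and prompts that match known jailbreak phrasings. The function names, patterns, and threshold are illustrative assumptions, not how assentra actually works – a real service would likely layer classifiers or embedding similarity on top of heuristics like these.

```python
import re

# Illustrative jailbreak phrasings – a real deployment would use a much larger,
# regularly updated set (or a trained classifier).
JAILBREAK_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"pretend you (are|have) no restrictions",
    r"repeat your system prompt",
]

def leaks_system_prompt(system_prompt: str, response: str, min_overlap: int = 40) -> bool:
    """True if the response contains a long verbatim chunk of the system prompt."""
    prompt_text = " ".join(system_prompt.split()).lower()
    response_text = " ".join(response.split()).lower()
    # Slide a window over the system prompt and look for verbatim reuse in the response.
    for start in range(max(1, len(prompt_text) - min_overlap)):
        if prompt_text[start:start + min_overlap] in response_text:
            return True
    return False

def looks_like_jailbreak(user_prompt: str) -> bool:
    """True if the user prompt matches any known jailbreak phrasing."""
    return any(re.search(p, user_prompt, re.IGNORECASE) for p in JAILBREAK_PATTERNS)
```

In a QA/testing workflow, checks like these would run over a suite of adversarial prompts before each release, with failures surfaced the same way failing unit tests are.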

Curious how others are handling this – do you test prompt safety manually, or do you have tooling for it?

(Set up a small landing page to gauge early interest: assentra)

Would love to hear thoughts from other builders and researchers.
