Made a GitHub awesome-list about AI evals, looking for contributions and feedback

By skyforbes, Nov 22, 2025

As AI grows in popularity, evaluating reliability in production environments will only become more important.

I've seen some general lists and resources that explore it from a research / academic perspective, but lately, as I build, I've become more interested in what is being used to ship real software.

It seems like a nascent area, but a crucial one for making sure these LLMs and agents aren't lying to our end users.

Looking for contributions, feedback, and tool / platform recommendations based on what has been working for you in the field.
