Lenny's Newsletter
Lenny's Reads
Building eval systems that improve your AI product


A practical guide to moving beyond generic scores and measuring what matters

If you’re a premium subscriber, add the private feed to your podcast app at add.lennysreads.com.

In this episode, we dive into the fast-emerging discipline of AI evaluation with Hamel Husain and Shreya Shankar, creators of AI Evals for Engineers & PMs, the top-grossing course on Maven.

After training 2,000+ PMs and engineers across 500+ companies, Hamel and Shreya reveal their complete playbook for building evaluations that actually improve your AI product: moving beyond vanity dashboards to a system that drives continuous improvement.

Listen now: YouTube | Apple | Spotify

In this episode, you’ll learn:

  • Why most AI eval dashboards fail to deliver real product improvements

  • How to use error analysis to uncover your product’s most critical failure modes

  • The role of a “principal domain expert” in setting a consistent quality bar

  • Techniques for transforming messy error notes into a clean taxonomy of failures

  • When to use code-based checks vs. LLM-as-a-judge evaluators

  • How to build trust in your eva…
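To make the code-based checks vs. LLM-as-a-judge distinction from the list above concrete, here is a minimal sketch. The function names, the PII regex, and the judge criterion are all hypothetical illustrations, not anything from the episode: a code-based check is deterministic and cheap, suited to properties you can verify programmatically, while an LLM-as-a-judge evaluator (shown here only as a prompt builder, since the actual model call depends on your provider) handles subjective qualities.

```python
import re

# Code-based check: deterministic and cheap. Good for failure modes you can
# verify programmatically, e.g. a response leaking an email address
# (a hypothetical failure mode for illustration).
def contains_no_email(response: str) -> bool:
    """Return True if the response contains no email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", response) is None

# LLM-as-a-judge: for qualities a regex can't capture (tone, helpfulness).
# This only builds the prompt; the model call itself is left to whatever
# client your stack uses.
def build_judge_prompt(response: str, criterion: str) -> str:
    return (
        f"Evaluate the response below against this criterion: {criterion}\n"
        "Answer only PASS or FAIL.\n\n"
        f"Response:\n{response}"
    )

print(contains_no_email("Contact me at bob@example.com"))   # → False
print(contains_no_email("Happy to help with your order."))  # → True
```

The rule of thumb the course articulates is to reach for code-based checks first, since they are free to run and never drift, and reserve LLM judges for criteria that genuinely require judgment.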

This post is for paid subscribers