If you’re a premium subscriber, add the private feed to your podcast app at add.lennysreads.com.
In this episode, you’ll learn:
Why most AI eval dashboards fail to deliver real product improvements
How to use error analysis to uncover your product’s most critical failure modes
The role of a “principal domain expert” in setting a consistent quality bar
Techniques for transforming messy error notes into a clean taxonomy of failures
When to use code-based checks vs. LLM-as-a-judge evaluators (see the first sketch after this list)
How to build trust in your evals with human-labeled ground-truth datasets
Why binary pass/fail labels outperform Likert scales in practice
Evaluation strategies for complex systems: multi-turn conversations, RAG pipelines, and agentic workflows
How CI safety nets and production monitoring work together to create a flywheel of continuous product improvement (see the CI sketch after this list)
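For listeners who want a concrete feel for the evaluator styles mentioned above, here is a minimal Python sketch (not the guests’ actual code): a deterministic code-based check, an LLM-as-a-judge prompt forced to a binary pass/fail verdict rather than a Likert score, and a simple agreement check against human-labeled ground truth. The helper names, prompt wording, and data fields are illustrative assumptions.

```python
# A minimal sketch, not the episode's implementation: two evaluator styles with
# binary pass/fail verdicts, plus a judge-vs-human agreement check.
# `call_llm`, the prompt wording, and the field names are illustrative assumptions.
import json
import re


def call_llm(prompt: str) -> str:
    """Stub: wire this to your model provider's client."""
    raise NotImplementedError


def no_email_leak(output: str) -> bool:
    """Code-based check: deterministic and cheap, best for objective rules
    (here, 'the reply must not contain an email address')."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", output) is None


def llm_judge(question: str, output: str) -> bool:
    """LLM-as-a-judge for subjective criteria; forced to a binary verdict
    so labels stay easy to audit, aggregate, and compare over time."""
    prompt = (
        "Grade the assistant's answer against the question.\n"
        f"Question: {question}\nAnswer: {output}\n"
        'Respond with JSON only: {"pass": true or false, "reason": "..."}'
    )
    return bool(json.loads(call_llm(prompt))["pass"])


def judge_agreement(labeled: list[dict]) -> float:
    """Trust check: fraction of human-labeled examples where the judge's
    verdict matches the ground-truth 'human_pass' label."""
    hits = sum(
        llm_judge(ex["question"], ex["output"]) == ex["human_pass"] for ex in labeled
    )
    return hits / max(len(labeled), 1)
```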
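And a similarly hedged sketch of the CI safety-net idea: a test that fails the build when the pass rate on a curated, versioned eval set drops below a threshold. The file path, field names, and 0.90 threshold are placeholders, not the setup described in the episode.

```python
# A minimal sketch of a CI eval gate: regressions block the merge when the
# pass rate on a golden eval set falls below a threshold. `load_eval_set`,
# `run_pipeline`, and the threshold are illustrative assumptions.
import json
from pathlib import Path

PASS_RATE_THRESHOLD = 0.90


def load_eval_set(path: str = "evals/golden.jsonl") -> list[dict]:
    """Each JSONL line holds e.g. {'question': ..., 'must_not_contain': ...}."""
    return [json.loads(line) for line in Path(path).read_text().splitlines() if line]


def run_pipeline(question: str) -> str:
    """Stub for the LLM application under test."""
    raise NotImplementedError


def test_eval_pass_rate():
    """Run in CI (e.g. via pytest) so quality regressions fail the build."""
    examples = load_eval_set()
    results = [
        ex["must_not_contain"] not in run_pipeline(ex["question"]) for ex in examples
    ]
    pass_rate = sum(results) / max(len(results), 1)
    assert pass_rate >= PASS_RATE_THRESHOLD, f"Eval pass rate fell to {pass_rate:.1%}"
```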