Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

🎙️ I tested GPT-5.3 Codex vs. Claude Opus 4.6 on real code, shipping 44 PRs in five days to see which AI model actually works best

Claire Vo

Feb 11, 2026

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines—creative development versus code review—and share how I’m thinking about combining them to build a more effective AI engineering stack.

Listen on YouTube, Spotify, or Apple Podcasts

What you’ll learn:

The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
Why Codex excels at code review but struggles with creative, greenfield work
The surprising way Opus and Codex complement each other in a real-world engineering workflow
How to use Git features like worktrees to maximize productivity with AI coding assistants
Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)