Simon Willison in the article Introducing Showboat and Rodney, so agents can demo what they’ve built (published February 10, 2026):
The frontier models all understand that “red/green TDD” means they should write the test first, run it and watch it fail and then write the code to make it pass - it’s a convenient shortcut.
I find this greatly increases the quality of the code and the likelihood that the agent will produce the right thing with the smallest amount of prompts to guide it.