15 April 2025 11:00 - 11:30
I sleep. My agents improve
Everyone who builds with AI hits the same wall. You write a tool or a custom prompt, it works for a while, and then it stops getting better. You keep tweaking it, but you can't tell whether your changes actually help or you're just fooling yourself.
I had 15 of these tools on my laptop. One audits apps. One rewrites prompts. One checks patents. Each one was decent but stuck.
So I tried something weird. I rented a cloud computer for one night, loaded up all 15 tools, and set up a system where AI agents take turns testing changes against each other. One agent runs the old version of a tool. Another agent runs a new version. A third agent looks at both answers, doesn't know which is which, and picks the winner. Good changes get kept. Bad ones get thrown out. Then the loop runs again. And again.
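To make the shape of that loop concrete, here's a minimal sketch in Python. The names are hypothetical stand-ins, not my actual code: `propose_change` plays the agent that suggests a tweak, `run_tool` runs a version of a tool on a test case, and `judge` plays the blind evaluator that returns the index of the answer it prefers.

```python
import random

def improvement_loop(current_prompt, propose_change, run_tool, judge, test_cases, rounds=10):
    """Blind A/B loop: propose a change, run old and new versions on the same
    test cases, and let a judge pick winners without knowing which is which."""
    for _ in range(rounds):
        candidate = propose_change(current_prompt)          # agent 1: suggest a tweak
        wins = 0
        for case in test_cases:
            old_answer = run_tool(current_prompt, case)     # agent 2: old version
            new_answer = run_tool(candidate, case)          # agent 2: new version
            # Shuffle so the judge can't tell which answer came from which version.
            pair = [("old", old_answer), ("new", new_answer)]
            random.shuffle(pair)
            picked = pair[judge(case, pair[0][1], pair[1][1])][0]  # agent 3: blind pick
            if picked == "new":
                wins += 1
        # Keep the change only if the new version won a majority of cases.
        if wins > len(test_cases) / 2:
            current_prompt = candidate
    return current_prompt
```

The shuffle is the important bit: the judge never knows which answer came from the old version and which from the new one, so it can't drift toward rubber-stamping every change.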
I went to sleep. By morning the system had run 47 experiments. It cost me $10.
The results were not small. My patent tool went from catching 6 out of 10 issues to 9 out of 10. My prompt tool scored 40 percent better on quality. My app audit tool started catching gaps it used to miss.
In this session I'll walk through exactly how I set it up, what the agents looked like, what worked, what broke, and the actual before-and-after scores. If you've ever wanted your AI tools to get better without babysitting them all weekend, this is how.