Manish Chiniwalar's Station

Autoresearch for Build Times

Apr 17, 2026 · 5:48 · 1 article


Hosts

Ama
Marcus

Source Articles

Autoresearch isn’t just for training models (2026) - Shopify

shopify.engineering

Transcript

Marcus: So this engineer, right? Seven PM, still at his desk. His code, fifth time, failed its tests. Thirty minutes wasted, each time. He's gotta be pulling his hair out.

Ama: No, not exactly pulling his hair out over the code failing. It's the waiting. The thirty minutes. He realizes he's not fixing the bug, he's fixing the wait time itself. That's the real problem, the feedback loop.

Marcus: The feedback loop! See, I was thinking about the bug, but he's thinking meta. Okay, so what did he do, just like, stare at the screen harder?

Ama: He looks into this thing, Autoresearch. Something he'd seen from Andrej Karpathy, this big name in AI, for training models.

Marcus: Autoresearch? Like, AI researching itself? That sounds super advanced. So he just plugged it into his build system, and poof, faster builds?

Ama: Not exactly poof. Initially, he just told an agent, 'Fix this build time. Use Rust, swap libraries, whatever.' And it just... failed. Crashed. Did nothing helpful.

Marcus: See, that's what I'd expect! AI's great for some things, but a coding assistant just magically fixing my whole build process? Nah.

Ama: But he found the trick. The key wasn't asking it to just 'fix' the problem in one go. It was about putting the AI in a loop, yeah? And giving it one single metric to optimize. In his case, build time.

Marcus: One metric? So not, 'make the code better,' but 'make the build time smaller.' That's a different game.

Ama: Exactly. So the loop goes like this: measure the baseline, then the AI forms a hypothesis, tests it. If it runs faster, keep that change. If it crashes or runs slower, discard it. Then repeat, repeat, repeat.
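The loop Ama describes is essentially single-metric hill climbing. A minimal sketch in Python (the article doesn't publish code, so the helper names `measure`, `propose`, `apply_change`, and `revert` are hypothetical stand-ins for "time the build", "agent forms a hypothesis", and "apply/undo a change"):

```python
def autoresearch_loop(measure, propose, apply_change, revert, iterations=50):
    """Optimize one metric: keep a change only if it measurably improves it.

    measure()       -> float, the metric (e.g. build time in seconds);
                       may raise if the build crashes
    propose()       -> a candidate change (the agent's hypothesis)
    apply_change(c) -> apply candidate c to the build setup
    revert(c)       -> undo candidate c
    """
    best = measure()  # baseline build time
    for _ in range(iterations):
        change = propose()          # agent forms a hypothesis
        apply_change(change)
        try:
            t = measure()           # run the build, time it
        except Exception:
            revert(change)          # crashed: discard the change
            continue
        if t < best:
            best = t                # faster: keep the change
        else:
            revert(change)          # slower: discard, try again
    return best
```

The key design choice Marcus picks up on next: because every candidate is either kept or fully reverted against one number, the loop can grind through hundreds of "boring" hypotheses a human would never bother testing.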

Marcus: It's like... it's doing science experiments on the build system! And it doesn't get bored. Because a human would try a few things and be like, 'Alright, this is good enough.'

Ama: That's it. This engineer, he found things humans wouldn't bother with. Like, the visual regression tests were running the full component pipeline, but then Storybook would recompile from source anyway. Pure waste.

Marcus: Oh, so like, double work! And the AI found that? No human would spend their sprint looking for that kind of tiny inefficiency, man. It's too boring.

Ama: It also found that a TypeScript transform was processing all 580 component files, when only about a hundred needed it. Suddenly, the build was 65% faster.

Marcus: Sixty-five percent! From just letting an AI poke around and iterate? That's wild. You know what this reminds me of?

Ama: What?

Marcus: My friend, right? He got obsessed with his sleep score. Like, he bought this tracker, and he wanted to get his score up. So he created a spreadsheet, and every night he'd change one variable.

Ama: One variable?

Marcus: Yeah! One night, he'd cut caffeine at 3 PM. Next night, 2 PM. Then he'd try a colder room, or a warmer room. New pillow. He was basically running a manual 'autoresearch' loop on his own life, optimizing that sleep score.

Ama: Ah. I see. That's actually... clever, no? I wouldn't have thought to connect it like that.

Marcus: Right? It's the same principle! You take a process, you define a single metric, and you just iterate. Humans just do it slower, with less data.

Ama: But Marcus, this sounds like a massively over-engineered solution for a problem a skilled senior developer could solve in an afternoon, no? Is this just using AI for its own sake?

Marcus: No, no, no! That's the whole point. A skilled developer could solve it, sure. But would they? Would they sit there for days, making tiny, incremental changes, testing every single permutation to get 65% faster? No way. They got deadlines, they got feature work.

Ama: But the intuition of a human, the experience... a senior developer would know where to look first, where the likely bottlenecks are. This AI is just blindly testing.

Marcus: Blindly testing until it finds something a human wouldn't have even considered! And it doesn't get bored. It doesn't get distracted. It just keeps grinding. That's the power, Ama.

Ama: It's power, yes, but is it efficiency? To set up such a complex system for what could be a quick fix with human insight. What if it removes something essential, an ugly hack that's critical?

Marcus: It discards those! The loop only keeps a change if it improves the metric and doesn't crash anything. And yeah, the article said it sometimes finds ugly hacks, but the engineer just throws those out and keeps the good stuff. It's still a tool, not a replacement.

Ama: Still, it feels like... using a sledgehammer to crack a nut, sometimes. For simple problems, it seems like a lot of overhead.

Marcus: But what if that 'nut' is actually a hundred tiny nuts, all bolted together, and you don't even know where they are? And no human wants to spend their career undoing those bolts?

Ama: I suppose... I see your point about the boredom. But the elegance of a simple fix, you know? Sometimes that's better than an AI loop.

Marcus: Maybe. But how long until this kind of 'autoresearch' is just a standard feature, though? Like, you just right-click a slow function in your IDE and say 'optimize this,' and an AI agent just... does it?

Ama: I'm Ama.

Marcus: And I'm Marcus. This has been Manish Chiniwalar's Station.
