Stop Prompting. Write Loops.
For a few weeks in June 2026 my feed said the same sentence a hundred different ways: “I don’t prompt the agent anymore — I write loops that prompt it for me.” Boris Cherny, who runs Claude Code at Anthropic, said it on stage. Peter Steinberger said it sharper and got 6.5M views. Then a thousand accounts repackaged it into “loop engineering is the new prompt engineering” threads.
Most of it is noise. About 90% of the content is one idea wearing different hats. But the idea underneath is real and worth getting right, so here’s the version with the marketing stripped out.
The actual shift
Prompt engineering optimized a single turn. You craft the perfect message, the model answers, you read it, you craft the next one. You are the loop — the slow part that sleeps at night.
Loop engineering takes you out of the inner cycle. You stop typing turns and you write the program that types them: discover the next unit of work, hand it to the agent, verify the result, persist what changed, decide whether to continue. The human designs and supervises the machine. The machine does the keystrokes.
That’s it. Everything else is detail. But the details are where it works or doesn’t.
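As a rough sketch of that cycle in Python: the names `discover`, `run_agent`, `verify`, and `persist` are placeholders invented for illustration, and the agent call is a plain callable rather than a real CLI invocation. The point is only the shape of the program that replaces you in the inner cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    completed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def run_loop(
    discover: Callable[[], Optional[str]],
    run_agent: Callable[[str], str],
    verify: Callable[[str, str], bool],
    persist: Callable[[str, str], None],
    max_turns: int = 10,
) -> LoopResult:
    """Drive the cycle until there is no work left or max_turns is reached."""
    result = LoopResult()
    for _ in range(max_turns):        # 5. decide whether to continue (hard cap)
        task = discover()             # 1. discover the next unit of work
        if task is None:
            break                     # nothing left to do
        output = run_agent(task)      # 2. hand it to the agent
        if verify(task, output):      # 3. verify the result
            persist(task, output)     # 4. persist what changed
            result.completed.append(task)
        else:
            result.failed.append(task)
    return result
```

Passing the four steps in as functions keeps the harness testable with stubs before any real agent is attached. The `max_turns` cap is the crudest possible stop condition; a loop you leave running needs more than that.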
Memory lives in files, not in the chat
The counter-intuitive part: each turn runs in a fresh, empty context window. The agent remembers nothing from the last iteration. That’s a feature — it stops context rot and keeps every turn sharp. Continuity comes from the filesystem instead. A CLAUDE.md holds standing context so the agent never re-asks. A spec folder holds the immutable source of truth. A fix_plan.md is the todo list the loop owns — pick the top item, do it, check it off, append anything new it discovered. Git history is the diff between turns.
This is the “Ralph loop” Geoffrey Huntley popularized:
    while :; do cat PROMPT.md | claude -p; done
He built an entire programming language this way over three months. The cleverness isn’t the while; it’s that state on disk plus a fresh context every turn is more robust than one long, drifting conversation.
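To make the fix_plan.md cycle concrete, here is a minimal sketch of the three operations the loop needs: pick the top item, check it off, append new work. It assumes the plan is a Markdown checklist with `- [ ]` and `- [x]` prefixes, which is my assumption and not a documented format; the helper names are hypothetical too.

```python
from pathlib import Path
from typing import Optional

# Assumed checklist markers; the real file format may differ.
OPEN, DONE = "- [ ] ", "- [x] "


def next_item(plan: Path) -> Optional[str]:
    """Return the first unchecked item, or None when the plan is drained."""
    for line in plan.read_text().splitlines():
        if line.startswith(OPEN):
            return line[len(OPEN):]
    return None


def check_off(plan: Path, item: str) -> None:
    """Mark the first matching open item as done."""
    lines = plan.read_text().splitlines()
    for i, line in enumerate(lines):
        if line == OPEN + item:
            lines[i] = DONE + item
            break
    plan.write_text("\n".join(lines) + "\n")


def append_items(plan: Path, items: list[str]) -> None:
    """Append newly discovered work to the bottom of the plan."""
    with plan.open("a") as f:
        for item in items:
            f.write(OPEN + item + "\n")
```

Because every turn reads and rewrites the file from scratch, nothing here depends on what the previous turn had in memory, which is the whole idea.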
The loop is easy. The guardrails are the product.
Here’s the line everyone skips. An unguarded loop is a billing incident. The engineering is the stop conditions:
- Completion sentinel. The agent prints a token only when the gate is genuinely green. The harness greps for it.
- Cost cap. Sum the spend each turn, abort over budget. Non-negotiable.
- Stall detection. If the working tree hasn’t changed for N turns, the model is spinning on a stuck error. Stop.
- Sandboxing. Run in a git worktree or container so a bad run can’t touch main.
- Escalate, don’t merge. Loops open draft PRs. A human pulls the trigger.
Strip these out and you don’t have a loop, you have a way to spend money in your sleep.
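Three of those stop conditions (sentinel, cost cap, stall detection) fit in one small object. This is a sketch under stated assumptions: the `LOOP_COMPLETE` token is a made-up sentinel, the per-turn cost is whatever your harness reports, and `tree_hash` is any fingerprint of the working tree that you compute yourself. Sandboxing and draft-PR escalation live outside the process and are not shown.

```python
from dataclasses import dataclass
from typing import Optional

SENTINEL = "LOOP_COMPLETE"  # hypothetical token; pick your own


@dataclass
class Guard:
    budget_usd: float
    max_stalled_turns: int = 3
    spent_usd: float = 0.0
    stalled_turns: int = 0
    last_tree_hash: Optional[str] = None

    def check(self, output: str, turn_cost_usd: float, tree_hash: str) -> Optional[str]:
        """Return a stop reason, or None if the loop may run another turn."""
        self.spent_usd += turn_cost_usd

        # Count consecutive turns that left the working tree unchanged.
        if tree_hash == self.last_tree_hash:
            self.stalled_turns += 1
        else:
            self.stalled_turns = 0
        self.last_tree_hash = tree_hash

        # The sentinel must stand alone on a line, so prose that merely
        # mentions the token does not end the loop.
        if any(line.strip() == SENTINEL for line in output.splitlines()):
            return "complete"
        if self.spent_usd > self.budget_usd:
            return "over budget"
        if self.stalled_turns >= self.max_stalled_turns:
            return "stalled"
        return None
```

Call `check` once per turn and break on any non-None result. Only "complete" is a success; the other two reasons should page a human rather than retry.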
The shapes that matter
Once you have the harness, a “loop” is just three choices: a trigger, a task, and a gate. The useful ones fall into a handful of families:
- Task-driven — drive one thing to a passing gate (tests green), or build a whole feature from a spec overnight, or drain a labeled issue queue one PR at a time.
- Metric-driven — move a number. Optimize a benchmark and keep the change only if it improved. Climb test coverage to a target. Here the harness is the judge, not the model.
- Fan-out — one mechanical change across hundreds of files, N agents in parallel, each in fresh context.
- Event-driven — no task to finish; watch a signal (CI, health, logs) and fire an agent only when it trips.
- Critic — an agent that reviews and never builds.
- Beyond code — research a topic until a critic agent says it’s complete; refine content until a judge agent scores it past a bar. The gate is another agent, not a test.
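The metric-driven family is the easiest to get subtly wrong, so here is a minimal sketch of “keep the change only if it improved.” `propose` stands in for one agent turn and `measure` for the benchmark; both names are placeholders, and lower scores are assumed to be better.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def climb(
    state: S,
    propose: Callable[[S], S],
    measure: Callable[[S], float],
    turns: int,
) -> tuple[S, float]:
    """Keep a proposed change only when the harness measures an improvement.

    Lower is better here (for example, benchmark runtime).
    """
    best = measure(state)
    for _ in range(turns):
        candidate = propose(state)   # stands in for one agent turn
        score = measure(candidate)   # the harness judges, not the model
        if score < best:
            state, best = candidate, score
        # Otherwise the candidate is dropped; in a repo this is a revert.
    return state, best
```

The model never sees or reports the score. It only proposes, and the comparison happens in code it cannot talk its way around.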
How people actually run them
Past the hype, the field reports are consistent:
Builder and reviewer in separate contexts. The setup that keeps coming up: a maker writes, and a reviewer that only sees the diff critiques it. No shared chat, so the reviewer can’t absorb the author’s rationalizations. A fresh pair of eyes, mechanically enforced.
“Prove the test by breaking it.” The best loops don’t trust a green checkmark. One widely-shared setup first breaks the code to confirm the test actually fails, and refuses to run at all if it can’t prove its own setup is sound. Mutation-as-gate beats checkmark theatre.
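A toy version of that idea, assuming you can produce a deliberately broken variant of the code under test. The names `prove_then_run` and `UnsoundGate` are mine, not taken from the setup described above.

```python
from typing import Callable, TypeVar

Impl = TypeVar("Impl")


class UnsoundGate(Exception):
    """Raised when the test cannot tell working code from broken code."""


def prove_then_run(test: Callable[[Impl], bool], candidate: Impl, known_broken: Impl) -> bool:
    """Refuse to grade anything until the test has failed on broken code."""
    if test(known_broken):
        raise UnsoundGate("test passes on deliberately broken code")
    return test(candidate)


def add(a: int, b: int) -> int:
    return a + b


def broken_add(a: int, b: int) -> int:
    return a - b  # the deliberate break


def strong_test(f: Callable[[int, int], int]) -> bool:
    return f(2, 3) == 5


def weak_test(f: Callable[[int, int], int]) -> bool:
    return f(0, 0) == 0  # true for both add and broken_add
```

`prove_then_run(strong_test, add, broken_add)` returns True, while the same call with `weak_test` raises `UnsoundGate`, because a test that stays green on broken code proves nothing.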
Cron and Routines, not a babysat terminal. The durable setups fire from cron, a timer, or a cloud schedule, and report to Slack or a draft PR. The interactive loop is for iterating; the scheduled loop is for shipping.
For scale: practitioners cite roughly 4% of all public GitHub commits — about 135,000 a day — now coming from Claude Code. Treat the exact number as marketing. The direction is the point.
The one rule that picks good work
A loop is only as trustworthy as its gate. The candidates that work are the ones where “done” or “better” is a command that returns a verdict: tests pass, a benchmark number drops, the type checker is clean, a reviewer agent scores it 85 or higher. Where there’s no objective gate (taste, judgment, anything irreversible), keep yourself in the loop and make the agent draft, not commit.
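In practice “a command that returns a verdict” usually means an exit code. A minimal sketch, assuming the gate is any command that exits 0 on success; the `run_gate` name and the ten-minute default timeout are arbitrary choices of mine.

```python
import subprocess


def run_gate(command: list[str], timeout_s: float = 600) -> bool:
    """Run the gate command; exit code 0 is the only passing verdict."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A gate that hangs or cannot start is treated as red, never green.
        return False
    return completed.returncode == 0
```

The harness calls this itself with the project's own test or type-check command. What the agent says about the outcome never enters the decision.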
And the highest-leverage upgrade, the one that separates a toy from something you’d actually leave running: verify the verifier. Don’t let the agent grade its own homework. A test it wrote and a green checkmark it reports are claims. Confirm the test fails without the fix, run the gate in a clean environment, have a separate-context critic sign off. The loops that ship trust mechanical truth, not the model’s self-assessment.
The leverage didn’t move to people who can write a clever prompt. It moved to people who can design a system that’s safe to leave running. That’s a real engineering skill, and most of the people posting about it haven’t done it yet.