Figure: The Loop. TRIGGER (cron / webhook), DISCOVER (scan + triage), ISOLATE (worktree / sandbox), BUILD (maker agent), VERIFY (checker agent), SHIP (PR + notify), STATE (memory / progress).
Opinion Jul 4, 2026 | 6 min read

Loop Engineering: The Paradigm That Kills Prompt Engineering

You shouldn't be prompting your agents anymore. You should be designing systems that prompt them for you.

Tags: Loop Engineering, Agentic AI, Workflow, Systems Design, DevEx

Mohammed Imran Khan

Senior ML Engineer at Red Hat | Ex-Mercedes-Benz R&D

Every morning at 8:25, my Mac wakes itself up. A LaunchAgent fires. DeltaForge logs into my broker, connects a WebSocket, starts the trading engine, spawns an independent kill-switch process, and begins scanning for signals. By 9:15, it's placing trades. By 3:30, it squares off, generates a P&L report, and sends me a Slack message. I don't touch anything.

I built that system months ago. Scheduler triggers the engine. Engine proposes trades. Risk engine verifies. Kill switch watches from outside. State persists to disk. The system prompts itself. I just designed the loop.

Then I looked at my coding workflow and realized I was still the one typing prompts into AI tools. One turn at a time. Read the output. Type the next thing. The irony hit hard: I'd automated trading but was still manually operating my coding agents like it was 2024.

The fix for that irony has a name now: loop engineering.

01 // The Shift

The progression is clear in hindsight. Each layer absorbed the one below it:

2022–2024: Prompt Engineering. "What should I say to the model?"
2025: Context Engineering. "What information fills the window?"
Early 2026: Harness Engineering. "What environment does it run in?"
Mid 2026: Loop Engineering. "What system prompts itself?"

Peter Steinberger put it bluntly: you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents. Boris Cherny, who leads Claude Code at Anthropic, reportedly uninstalled his IDE after not opening it for a month. His job, as he described it, is to write loops.

The key distinction: a harness runs a fixed script for one session. A loop decides its own next action, spawns helpers, feeds itself, and runs on a timer. The harness supports you. The loop replaces you as the person who prompts.

The bottleneck shifted from "how do I phrase this?" to three different questions: who decides the next turn, what evidence is required before proceeding, and what does "done" mean?
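One way to make those three questions concrete is a small controller where each of them is an explicit, swappable piece. This is a minimal sketch, not a real agent API: `Evidence`, `LoopRun`, `run_loop`, and the three callables are hypothetical names, and the toy functions at the bottom stand in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    """What one turn produced, recorded so 'done' is judged on facts."""
    action: str
    tests_passed: bool


@dataclass
class LoopRun:
    history: List[Evidence] = field(default_factory=list)
    done: bool = False


def run_loop(
    decide_next: Callable[[List[Evidence]], str],  # who decides the next turn
    act: Callable[[str], Evidence],                # what evidence a turn must return
    is_done: Callable[[List[Evidence]], bool],     # what "done" means
    max_turns: int = 5,
) -> LoopRun:
    run = LoopRun()
    for _ in range(max_turns):
        action = decide_next(run.history)
        run.history.append(act(action))
        if is_done(run.history):
            run.done = True
            break
    return run


# Toy stand-ins (assumptions): the first attempt fails, the retry passes.
def decide_next(history: List[Evidence]) -> str:
    return "write_fix" if not history else "retry_fix"


def act(action: str) -> Evidence:
    return Evidence(action=action, tests_passed=(action == "retry_fix"))


def is_done(history: List[Evidence]) -> bool:
    return history[-1].tests_passed


result = run_loop(decide_next, act, is_done)
```

In a harness, the sequence of actions is fixed in advance. Here the next action depends on the recorded history, and the human's job moves into the three functions passed to `run_loop`.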

02 // Six Primitives of a Loop

Every production loop I've seen, whether it's shipping code or shipping trades, has the same six structural pieces. Not features. Primitives.

1. Automations

The heartbeat. Cron, webhooks, CI signals. The loop starts without you. If you have to press "go," it's not a loop. It's a script with a human dependency.

2. Isolation

Worktrees, sandboxes, containers. With isolation, two agents working the same repo can't collide on the same files. The second you run parallel agents without it, you've built a merge conflict factory.

3. Skills

Project knowledge on disk. Without skills, the loop re-derives your conventions from zero each run: a goldfish with compute. With skills, it compounds. That's the difference between a tool and a teammate.

4. Connectors

MCP, APIs, integrations. A loop that can only see the filesystem is a tiny loop. The one that opens PRs, links tickets, queries Sentry, and pings Slack? That's a loop that acts in your actual environment.

5. Sub-agents

Maker/checker. The model that wrote the code is too generous grading its own homework. A second agent (different model, different instructions) catches what the first talked itself into.

6. State

External memory: markdown files, databases, progress boards. The agent forgets between runs. The repo doesn't. State is the spine that lets tomorrow's run pick up where today's stopped.
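The State primitive is the easiest of the six to sketch. The example below is a minimal illustration under assumptions: `progress.json`, its two fields, and the functions `load_state`, `save_state`, and `run_once` are hypothetical and are not DeltaForge's actual format.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the previous run's state, or a fresh one on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"runs": 0, "open_findings": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write never
    leaves the state file half-written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_name, path)  # atomic when both paths share a filesystem


def run_once(path: Path, new_findings: list) -> dict:
    state = load_state(path)
    state["runs"] += 1
    # Carry unfinished work forward and add new findings without duplicates.
    for finding in new_findings:
        if finding not in state["open_findings"]:
            state["open_findings"].append(finding)
    save_state(path, state)
    return state


# Two consecutive runs against the same file: the second resumes from the first.
state_file = Path(tempfile.mkdtemp()) / "state" / "progress.json"
first = run_once(state_file, ["flaky test in ci"])
second = run_once(state_file, ["flaky test in ci", "lint failure"])
```

The second run does not rediscover the flaky test. It reads it from disk, which is what lets tomorrow's run pick up where today's stopped.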

Here's the part that surprised me: I already had all six in my trading system. APScheduler as automation. A separate process per component as isolation. Adaptive mode as skill memory. Slack as connector. Separate-process kill switch as the sub-agent checker. capital.json as state. I just didn't call it loop engineering.

deltaforge_as_loop.py
# DeltaForge - loop engineering before the term existed

class DailyScheduler:
    # AUTOMATION: LaunchAgent wakes Mac at 08:25
    # ISOLATION: separate process per component
    # STATE: capital.json + trades.db + HALT file

    def run_day(self):
        self.login()              # retry 3x, 30s between
        self.start_engine()       # MAKER: generates signals
        self.start_kill_switch()  # CHECKER: independent process
        self.start_monitor()      # CONNECTOR: Slack alerts

        while market_open():
            self.engine.tick()    # scan → risk → execute

        self.square_off_all()
        self.generate_report()    # STATE: persists to disk

loop.yaml - conceptual config
name: daily-triage
trigger: cron("0 9 * * MON-FRI")
isolation: worktree

steps:
  - discover:
      run: scan_ci_failures + open_issues + recent_commits
      write_to: state/triage.md

  - act:
      for_each: finding in state/triage.md
      spawn: implementer(finding)
      isolation: worktree

  - verify:
      agent: reviewer  # different model
      check: tests_pass AND lint_clean AND diff_reasonable

  - ship:
      if: verified
      run: open_pr + link_issue + notify_slack

guardrails:
  max_iterations: 3
  budget: $5/run
  halt_on: no_progress(2)

03 // What This Changes

Agents that compound

First run: cold start, re-derives everything from scratch. Tenth run: knows your conventions, your architecture, your edge cases. Skills plus state equals memory across sessions. This is the structural difference between a tool you use and a teammate you trust.

Analysis of 40,000+ GitHub repos shows CLAUDE.md and AGENTS.md adoption exploding in sequence. Teams are learning that the cost of teaching the agent once is paid back on every subsequent run. The loop compounds.
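"Skills plus state equals memory" can be as simple as reading two kinds of files before every run. The sketch below is an assumption about how a loop might assemble its context, not how any particular tool does it: `build_context` and the `state/progress.md` path are hypothetical, and only the file names AGENTS.md and CLAUDE.md come from the text above.

```python
import tempfile
from pathlib import Path

SKILL_FILES = ("AGENTS.md", "CLAUDE.md")


def build_context(repo_root: Path, task: str,
                  state_file: str = "state/progress.md") -> str:
    """Assemble one run's prompt: conventions, then prior progress, then the task."""
    sections = []
    for name in SKILL_FILES:
        skill_path = repo_root / name
        if skill_path.is_file():
            body = skill_path.read_text().strip()
            sections.append(f"## Conventions ({name})\n{body}")
    progress = repo_root / state_file
    if progress.is_file():
        body = progress.read_text().strip()
        sections.append(f"## Progress from earlier runs\n{body}")
    sections.append(f"## Task\n{task}")
    return "\n\n".join(sections)


# Cold start versus a repo that has been taught once.
repo = Path(tempfile.mkdtemp())
cold = build_context(repo, "fix the failing CI job")
(repo / "AGENTS.md").write_text("Use pytest. Never edit generated files.\n")
warm = build_context(repo, "fix the failing CI job")
```

The cold run sees only the task. Every run after the conventions file exists starts with it, which is the teach-once, reuse-every-run payoff.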

Maker/checker enables unattended trust

Stripe ships 1,300+ merged PRs per week with zero human-written code. Max 2 CI rounds. Cyclotron resolved 1,088 production Sentry issues in 30 days, saving 840 engineering hours. These numbers aren't theoretical. The systems behind them are running today.

The structural move that makes this possible: the verifier is a different model than the maker. Always. The agent that wrote the code must never be the one grading it.

"I built DeltaForge's kill switch as an independent process specifically so it survives if the trading engine hangs. Same principle. The checker can't depend on the maker's process staying healthy."

The 10x engineer is now a systems architect

The new 10x engineer doesn't write 10x the code. They build the system that writes it. Your job becomes: designing workflows, building verification infrastructure, maintaining repo health for agent consumption.

But here's the constraint nobody talks about: you are the GIL of your AI agents. Your review bandwidth is the actual bottleneck, not how many agents you can spawn. Ten parallel loops are worthless if you can't review what they produce. This is why the maker/checker split matters so much: it extends your review bandwidth by handling the obvious stuff before it reaches you.
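The maker/checker split can be sketched as two callables that share nothing but the task and the candidate output. Everything here is hypothetical (`Review`, `maker_checker`, and the toy maker and checker). In a real loop the two callables would wrap different models with different instructions, and the sketch does not model running them in separate processes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def maker_checker(
    task: str,
    maker: Callable[[str, str], str],        # (task, feedback) -> candidate patch
    checker: Callable[[str, str], Review],   # (task, patch) -> independent verdict
    max_rounds: int = 2,
) -> Optional[str]:
    """Return an approved patch, or None if the checker never signs off."""
    feedback = ""
    for _ in range(max_rounds):
        patch = maker(task, feedback)
        review = checker(task, patch)
        if review.approved:
            return patch
        feedback = review.feedback  # only the objection flows back to the maker
    return None


# Toy stand-ins (assumptions): the maker adds tests only after being told to.
def toy_maker(task: str, feedback: str) -> str:
    return "patch+tests" if feedback else "patch"


def strict_checker(task: str, patch: str) -> Review:
    if patch.endswith("+tests"):
        return Review(approved=True)
    return Review(approved=False, feedback="add tests")


approved = maker_checker("fix flaky test", toy_maker, strict_checker)
rejected = maker_checker("fix flaky test", toy_maker, strict_checker, max_rounds=1)
```

Only a non-`None` result would reach a human reviewer. The obvious rejection ("add tests") is settled between the two agents and never costs review bandwidth.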

04 // The Danger

Loops amplify. That's the feature and the failure mode. They amplify quality when designed well, and they amplify rot when designed poorly. Three specific ways this goes wrong:

Comprehension Debt

The faster the loop ships code you didn't write, the wider the gap between what exists and what you actually understand. The bill that hurts isn't the token bill. It's the afternoon six months from now when production breaks inside a system no human has read.

Cognitive Surrender

Two people can build the exact same loop and get opposite outcomes. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn't know the difference. You do. Designing the loop is the cure when done with judgment, and the accelerant of rot when done to avoid thinking.

Runaway Cost

Uber burned their entire 2026 AI budget in four months. A poorly-configured loop reportedly produced a $47,000 overnight bill. Gartner estimates agentic AI requires 5–30x more tokens per task than standard chatbot interactions.

The guardrail trinity is non-negotiable: a hard iteration cap, no-progress detection, and a spend ceiling. Without all three, what you're running isn't a loop. It's a fire hose pointed at your billing page.
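All three guardrails fit in one small wrapper. This is a minimal sketch under assumptions: `StepResult`, `Guardrails`, and `run_guarded` are hypothetical names, the defaults mirror the conceptual loop.yaml above, and "progress" is reduced to a marker string (for example a diff hash) that changes when the step achieved something.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    cost_usd: float
    progress_marker: str  # changes whenever real progress happens
    done: bool = False


@dataclass
class Guardrails:
    max_iterations: int = 3
    budget_usd: float = 5.0
    no_progress_limit: int = 2


def run_guarded(step: Callable[[int], StepResult], rails: Guardrails) -> str:
    """Run `step` until it finishes or a guardrail trips; return the reason."""
    spent = 0.0
    stalled = 0
    last_marker: Optional[str] = None
    for i in range(rails.max_iterations):       # hard iteration cap
        result = step(i)
        spent += result.cost_usd
        if result.done:
            return "done"
        # Checked after the step, so spend can overshoot by at most one step.
        if spent > rails.budget_usd:            # spend ceiling
            return "budget_exceeded"
        stalled = stalled + 1 if result.progress_marker == last_marker else 0
        if stalled >= rails.no_progress_limit:  # no-progress detection
            return "no_progress"
        last_marker = result.progress_marker
    return "max_iterations"


# Three failure modes, each stopped by a different guardrail.
stuck = run_guarded(lambda i: StepResult(0.1, "same"), Guardrails(max_iterations=10))
pricey = run_guarded(lambda i: StepResult(3.0, f"diff-{i}"), Guardrails())
capped = run_guarded(lambda i: StepResult(0.1, f"diff-{i}"), Guardrails())
```

Each guardrail catches a case the other two miss: the stuck loop is cheap and short, the pricey loop makes progress, and the capped loop is cheap and makes progress but never finishes.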

05 // Build the Loop. Stay the Engineer.

I built DeltaForge with a kill switch because I don't trust any system, including my own, to run without independent supervision. The same instinct applies here. The kill switch isn't a lack of confidence in the system. It's the proof that you understand the system well enough to know where it can fail.

"The leverage point moved. Prompting was about what you say. Loop engineering is about what you build. And what you build reveals whether you're using the loop to think faster, or to avoid thinking entirely."

The work didn't get easier. It got structural. You're no longer writing prompts. You're designing the system that makes prompts unnecessary. You're defining what "done" means, what evidence is required, who checks whom, and when to stop.

Next comes fleets: many loops coordinating, negotiating budgets, supervising each other. The skill after "write loops" will be "design the org chart of agents." But that's another post.

For now: build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.