Figure: MCP hierarchical agent architecture. L1: orchestrator (ORCH). L2: Code, Review, and Deploy supervisors, with two Judge nodes. L3: workers (Arch, Impl, Test, Sec, Perf, Style, Build, Mon, Rollback). MCP servers: GitHub, Filesystem, DB, CI/CD, Slack, Monitoring, Security Scanners.
Deep Dive Apr 29, 2026 | 18 min read

Agents All the Way Down

The Architecture Nobody Talks About - How Agent Swarms Actually Ship Code

Agentic AI Multi-Agent Systems MCP A2A Protocol LangGraph Software Engineering

Mohammed Imran Khan

Senior ML Engineer at Red Hat | Ex-Mercedes-Benz R&D

01 // The Death of Linear SDLC

Here's a number that should make every engineering leader pause: according to JetBrains' AI Pulse survey (January 2026), 90% of professional developers now use AI tools regularly at work. Claude Code's work adoption grew sixfold in under a year, to 18%. The shift is unmistakable.

But the real disruption isn't which tool developers use. It's that the SDLC as we've known it for decades is fundamentally breaking down. When agents can generate, validate, and deploy in parallel, the sequential pipeline ceases to make sense.

SDLC (Linear Pipeline) vs ADLC (Concurrent Loop)

Traditional SDLC

Requirements
Design
Develop
Test
Deploy

Sequential gates, one-way flow

Agentic ADLC

Intent Generate Validate Govern Deploy Observe

Concurrent modes, loops not gates

"ADLC is not SDLC with AI tools added. It is a different operating model - redesigned for a world where the primary execution unit is an agent, not a human."

- adlc.io

01

Concurrency over Sequencing

Six modes operate simultaneously. An agent can write code while another validates the previous output and a third monitors production.

02

Bets over Requirements

Planning favors hypotheses with resolution signals. Because agents make change cheap, you validate assumptions through rapid iteration.

03

Governance over Execution

When agents handle execution, the bottleneck shifts to governance - verifying outputs, enforcing policies, maintaining architectural coherence.
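
To make "concurrency over sequencing" concrete, here is a minimal sketch using Python's asyncio. The three coroutines are hypothetical stand-ins for the Generate, Validate, and Observe modes (the names and the queue-based handoff are assumptions for illustration, not part of any ADLC specification). The point is only that validation starts before generation finishes, while a monitor runs alongside both.

```python
import asyncio


async def generate(queue: asyncio.Queue, n_changes: int) -> None:
    """Hypothetical 'Generate' mode: emit candidate changes one at a time."""
    for i in range(n_changes):
        await asyncio.sleep(0)  # yield control, standing in for model latency
        await queue.put(f"change-{i}")
    await queue.put(None)  # sentinel: no more work


async def validate(queue: asyncio.Queue, validated: list[str]) -> None:
    """Hypothetical 'Validate' mode: check changes as they arrive."""
    while (change := await queue.get()) is not None:
        validated.append(change)


async def observe(ticks: list[int], stop: asyncio.Event) -> None:
    """Hypothetical 'Observe' mode: poll production until told to stop."""
    while not stop.is_set():
        ticks.append(len(ticks))
        await asyncio.sleep(0)


async def run_loop(n_changes: int = 3) -> tuple[list[str], int]:
    queue: asyncio.Queue = asyncio.Queue()
    validated: list[str] = []
    ticks: list[int] = []
    stop = asyncio.Event()
    monitor = asyncio.create_task(observe(ticks, stop))
    # Generate and Validate overlap instead of running as sequential gates.
    await asyncio.gather(generate(queue, n_changes), validate(queue, validated))
    stop.set()
    await monitor
    return validated, len(ticks)
```

Calling `asyncio.run(run_loop())` returns the validated changes plus the number of monitoring ticks that happened while the other two modes were working.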

Arthur.ai's take on ADLC adds the reliability dimension: the core of the lifecycle is an "Agent Development Flywheel" where live usage reveals failure modes, which grow eval suites, which ship improvements with regression-aware metrics.
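
One way to picture the flywheel is as an eval suite that only ever grows, plus a ship gate that compares a candidate against the current baseline. The sketch below is an illustration under assumptions: `EvalSuite`, `add_failure`, and `should_ship` are hypothetical names, and exact string match is a deliberately crude scoring rule.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalSuite:
    # Each case is (input, expected output)
    cases: list[tuple[str, str]] = field(default_factory=list)

    def add_failure(self, observed_input: str, expected: str) -> None:
        """A failure seen in live usage becomes a permanent regression case."""
        self.cases.append((observed_input, expected))

    def pass_rate(self, agent: Callable[[str], str]) -> float:
        if not self.cases:
            return 1.0
        passed = sum(agent(x) == want for x, want in self.cases)
        return passed / len(self.cases)


def should_ship(
    suite: EvalSuite,
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> bool:
    """Regression-aware gate: ship only if the candidate is no worse."""
    return suite.pass_rate(candidate) >= suite.pass_rate(baseline)
```

Every production failure added through `add_failure` raises the bar for the next release, which is the loop the flywheel describes.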

02 // Hierarchical Agent Architectures

If ADLC is the operating model, multi-agent architectures are the execution engine. Four dominant patterns have emerged:

Supervisor

Central planner routes to specialists

Router

Intent classifier picks one path

Pipeline

Fixed sequential stages

Swarm

Loose coordination, shared memory
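
Stripped of frameworks, the four patterns differ mainly in who decides what runs next. The sketch below treats an agent as any function from string to string; that type alias and all four function signatures are simplifying assumptions, not any library's API.

```python
from typing import Callable

Agent = Callable[[str], str]  # assumption: an agent maps text to text


def supervisor(
    plan: Callable[[str], list[str]], workers: dict[str, Agent], task: str
) -> list[str]:
    """Supervisor: a central planner names specialists; each handles the task."""
    return [workers[name](task) for name in plan(task)]


def router(
    classify: Callable[[str], str], paths: dict[str, Agent], task: str
) -> str:
    """Router: classify intent once, then follow exactly one path."""
    return paths[classify(task)](task)


def pipeline(stages: list[Agent], task: str) -> str:
    """Pipeline: fixed order, each stage consumes the previous output."""
    for stage in stages:
        task = stage(task)
    return task


def swarm(agents: list[Agent], task: str, rounds: int = 2) -> list[str]:
    """Swarm: agents read from and append to shared memory, no planner."""
    memory = [task]
    for _ in range(rounds):
        for agent in agents:
            memory.append(agent(memory[-1]))
    return memory
```

For example, `pipeline([str.strip, str.upper], " hi ")` runs both stages in order, while `router` would have picked only one of them.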

Hierarchical Nesting: Department → Team → Worker

L1: Orchestrator
L2: CodeSupervisor, ReviewSupervisor, DeploySupervisor
L3: Architect, Implement, Test, Security, Perf, Style, Build, Monitor, Rollback

This is already happening in production. AgentOrchestra demonstrates hierarchical "conductors" with MCP-oriented tool management. LangGraph Supervisor provides a library specifically for this. CrewAI achieves it with Process.hierarchical.
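
The Department → Team → Worker nesting is just a tree in which inner nodes delegate and leaves do the work. Here is a framework-free sketch; the `Node` class is hypothetical, and grouping three workers under each supervisor is my assumption based on the diagram's ordering.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

    def delegate(self, task: str, depth: int = 1) -> list[str]:
        """Leaves do the work; inner nodes fan the task out to children."""
        if not self.children:
            return [f"L{depth} {self.name}: {task}"]
        results: list[str] = []
        for child in self.children:
            results.extend(child.delegate(task, depth + 1))
        return results


# Assumed grouping: three workers per supervisor, in diagram order.
org = Node("Orchestrator", [
    Node("CodeSupervisor", [Node("Architect"), Node("Implement"), Node("Test")]),
    Node("ReviewSupervisor", [Node("Security"), Node("Perf"), Node("Style")]),
    Node("DeploySupervisor", [Node("Build"), Node("Monitor"), Node("Rollback")]),
])
```

`org.delegate("ship feature")` walks all three levels and returns one result line per L3 worker.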

03 // Agent Promotion & Demotion

In human organizations, promotion and demotion are formal HR processes. In multi-agent systems, they're dynamic routing decisions - probabilistic, continuous, and automated. Every delegation is a hiring decision. Every evaluation is a performance review. In milliseconds.

Agent Routing: Thompson Sampling Belief Updates

Figure: A new task enters a Thompson sampler (beliefs + exploration), which scores the candidates (Agent A: 0.87, Agent B: 0.62, Agent C: 0.23). A cross-model judge evaluates the result and updates the beliefs.

Bandit Routing

REDEREF

Thompson sampling over agent beliefs.

-28% tokens

-17% agent calls

-19% time-to-success

Judge Demotion

Cross-Model

Calibrated judges at handoff boundaries trigger re-routing on failure. Different model families catch different blind spots.

Topology

AdaptOrch

Task dependency DAG selects optimal topology. Promotes entire structural patterns, not individual agents.
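
To illustrate what "the dependency DAG selects the topology" can mean, here is a toy heuristic. It is emphatically not AdaptOrch's algorithm: the function name, the three labels, and the rules are assumptions chosen to keep the idea visible in a dozen lines.

```python
def pick_topology(deps: dict[str, set[str]]) -> str:
    """Pick a topology label from a map of subtask -> subtasks it depends on."""
    # No edges at all: every subtask can run in parallel.
    if all(not parents for parents in deps.values()):
        return "fan-out"
    # Count how many subtasks depend on each parent.
    children: dict[str, int] = {}
    for parents in deps.values():
        for parent in parents:
            children[parent] = children.get(parent, 0) + 1
    max_parents = max(len(parents) for parents in deps.values())
    max_children = max(children.values())
    # At most one parent and one child per node: the graph is a chain.
    if max_parents <= 1 and max_children <= 1:
        return "pipeline"
    # Branching and merging need someone to coordinate.
    return "hierarchical"
```

A diamond-shaped graph, where two subtasks share a parent and then merge, lands in the "hierarchical" bucket because something has to coordinate the join.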

agent_routing.py
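import random
from collections import defaultdict
from dataclasses import dataclass

Task = str  # placeholder task type for this sketch

@dataclass
class Agent:
    id: str
    model: str
    fallback_model: str
    require_citations: bool = False

@dataclass
class Verdict:
    passed: bool

class BetaDistribution:
    """Beta(alpha, beta) belief over an agent's success rate (minimal stand-in)."""
    def __init__(self, alpha: float, beta: float):
        self.alpha, self.beta = alpha, beta

    def sample(self) -> float:
        return random.betavariate(self.alpha, self.beta)

    def record_success(self) -> None:
        self.alpha += 1

    def record_failure(self) -> None:
        self.beta += 1

class AgentRouter:
    def __init__(self, candidates: list[Agent]):
        self.agents = {a.id: a for a in candidates}
        # Initialize beliefs with uniform priors
        self.beliefs = {a.id: BetaDistribution(1, 1) for a in candidates}
        self.failure_counts = defaultdict(int)

    def delegate(self, task: Task) -> Agent:
        # Thompson sampling: promote high-performers probabilistically
        scores = {
            agent_id: belief.sample()
            for agent_id, belief in self.beliefs.items()
        }
        return self.agents[max(scores, key=scores.get)]

    def on_result(self, agent_id: str, judge_verdict: Verdict):
        # Update beliefs based on judge evaluation
        if judge_verdict.passed:
            self.beliefs[agent_id].record_success()
            self.failure_counts[agent_id] = 0
            return
        self.beliefs[agent_id].record_failure()
        self.failure_counts[agent_id] += 1
        # Demotion: three consecutive failures trigger a model downgrade
        if self.failure_counts[agent_id] >= 3:
            self.demote(agent_id)

    def demote(self, agent_id: str):
        # Options: switch model, lower temperature, require evidence
        agent = self.agents[agent_id]
        agent.model = agent.fallback_model
        agent.require_citations = True
        self.beliefs[agent_id] = BetaDistribution(1, 5)  # pessimistic prior
        self.failure_counts[agent_id] = 0  # start a fresh streak after demotion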
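
To see why this kind of routing "promotes" strong agents without any explicit ranking step, it helps to run a small simulation. The sketch below is self-contained and makes two assumptions: each agent has a fixed hidden pass rate (I reuse 0.87, 0.62, and 0.23 from the diagram purely as example values), and a coin flip stands in for the judge verdict.

```python
import random


def simulate(true_rates: dict[str, float], rounds: int, seed: int = 0) -> dict[str, int]:
    """Count how often Thompson sampling picks each agent."""
    rng = random.Random(seed)
    # One [alpha, beta] pair per agent, starting from a uniform Beta(1, 1) prior
    beliefs = {name: [1, 1] for name in true_rates}
    picks = {name: 0 for name in true_rates}
    for _ in range(rounds):
        # Draw one sample from each belief and route to the highest draw
        samples = {n: rng.betavariate(a, b) for n, (a, b) in beliefs.items()}
        chosen = max(samples, key=samples.get)
        picks[chosen] += 1
        # Simulated judge verdict: pass with the agent's hidden rate
        passed = rng.random() < true_rates[chosen]
        beliefs[chosen][0 if passed else 1] += 1
    return picks
```

Running `simulate({"A": 0.87, "B": 0.62, "C": 0.23}, rounds=500)` should show traffic concentrating on the stronger agents while the weakest one still gets an occasional exploratory call.
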
04 // The Protocol Wars: MCP vs A2A

Two protocols have emerged as dominant - but they solve different problems at different layers.

Protocol Stack: MCP (Vertical) + A2A (Horizontal)

Figure: Agent A (your system), Agent B (partner org), and Agent C (3rd party) communicate across the A2A protocol layer. Below it, each agent reaches its own set of MCP servers (GitHub, Database, Slack / Analytics, CRM, Search / Payments, Storage, Monitoring).

A2A = Horizontal

Agent ↔ Agent

MCP = Vertical

Agent ↔ Tools

MCP is the USB standard for AI tools: one common connector between an agent and its tools. A2A enables cross-org agent collaboration through Agent Cards published at /.well-known/agent.json. Teams use both: MCP within a stack, A2A across trust boundaries.
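
Discovery over A2A starts with locating and reading a partner's Agent Card. The sketch below shows only that step, with no network call. The example card is invented, its field set is abbreviated, and the field names reflect my reading of the A2A spec, so check them against the current version before relying on them.

```python
import json
from urllib.parse import urljoin


def agent_card_url(base_url: str) -> str:
    """Build the well-known Agent Card location for an agent's host."""
    return urljoin(base_url, "/.well-known/agent.json")


# Illustrative card: values are invented and the field set is abbreviated.
EXAMPLE_CARD = json.loads("""
{
  "name": "review-agent",
  "description": "Reviews pull requests for security issues",
  "url": "https://agents.example.com/review",
  "skills": [{"id": "security-review", "name": "Security review"}]
}
""")


def supports(card: dict, skill_id: str) -> bool:
    """Check whether a card advertises a given skill id."""
    return any(s.get("id") == skill_id for s in card.get("skills", []))
```

A caller would fetch the JSON at `agent_card_url(...)`, then use something like `supports` to decide whether delegating to that agent makes sense at all.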

05 // Skills: The New Unit of Intelligence

Models have parameters. Agents have tools. But what makes an agent actually good at a specific task? Skills - versionable packages of procedural expertise centered on SKILL.md. Anthropic's engineering team and the Agent Skills spec formalize four layers:

L4

Policy / Guardrails

What must never happen - rules, allowlists, human approval gates

L3

Skill / Playbook

How to use primitives - SKILL.md, checklists, org conventions

L2

Tool / Integration

What primitives exist - MCP servers, REST APIs, DB bindings

L1

Data / Context

Ground truth - docs, RAG corpora, runbooks, schemas

1

Always On

Name + description

~100 tokens

2

On Relevance

Full SKILL.md

~2K tokens

3

On Subtask

references/ scripts/

as needed

The convergence: skills teach workflow; MCP/APIs supply interfaces. Skills are becoming dependencies, managed with the same rigor as npm packages.
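
The three loading tiers above amount to a context-budgeting rule: metadata is always present, the full SKILL.md loads when the skill looks relevant, and reference files load only when a subtask asks for them. Here is a sketch under assumptions: the `Skill` class and `build_context` are hypothetical, and matching the skill name against the task text is a naive stand-in for whatever relevance check a real agent performs.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # tier 1: always in context
    body: str         # tier 2: full SKILL.md text
    references: dict[str, str] = field(default_factory=dict)  # tier 3 files


def build_context(
    skills: list[Skill], task: str, wanted_refs: tuple[str, ...] = ()
) -> list[str]:
    """Assemble context, loading deeper tiers only when they are needed."""
    context: list[str] = []
    for skill in skills:
        context.append(f"{skill.name}: {skill.description}")  # tier 1
        if skill.name in task.lower():                        # naive relevance
            context.append(skill.body)                        # tier 2
            for ref in wanted_refs:                           # tier 3
                if ref in skill.references:
                    context.append(skill.references[ref])
    return context
```

With ten installed skills and one relevant task, only one SKILL.md body enters the context; the other nine cost a single description line each.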

06 // Why Multi-Agent Systems Break

Error Cascade: How a Single Bug Propagates Through the Agent Graph

Figure: CodeAgent introduces a subtle bug (origin, no check). TestAgent reports that tests pass (conformity bias). ReviewAgent replies LGTM (monoculture). The bug reaches production, where users find it (cascade).

Fix: Cross-model judges + edge validators + provenance tracking + narrow interfaces

Coordination

Partial observability, sync conflicts, non-deterministic replay.

Trust

No inherent trust. Same model family shares blind spots.

Error Cascades

Wrong intermediate becomes "ground truth" for downstream.
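
Two of the fixes named above, edge validators and provenance tracking, fit in a single handoff function. In this sketch every name (`Artifact`, `handoff`, `HandoffRejected`) is hypothetical, and the judges are plain predicates standing in for calls to models from different families.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Artifact:
    content: str
    provenance: list[str] = field(default_factory=list)  # who produced or approved it


class HandoffRejected(Exception):
    """Raised when a judge refuses to pass an artifact downstream."""


def handoff(
    artifact: Artifact, producer: str, judges: dict[str, Callable[[str], bool]]
) -> Artifact:
    """Validate at the edge before downstream agents treat output as truth."""
    trail = artifact.provenance + [f"produced:{producer}"]
    for judge_name, check in judges.items():
        if not check(artifact.content):
            raise HandoffRejected(f"{judge_name} rejected output from {producer}")
        trail.append(f"approved:{judge_name}")
    # Return a new artifact so the provenance trail is never edited in place
    return Artifact(artifact.content, trail)
```

Because the trail travels with the artifact, a bug found in production can be traced back to the agent that produced it and the judges that approved it.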

07 // The Developer's New Role

"The new paradigm is characterized by developers who are simultaneously users, creators, and governors of intelligent systems."

- IDC FutureScape - 70% of developers will partner with autonomous agents by 2030

Rising Skills

Task decomposition & delegation

Verification & evals discipline

Observability & agent tracing

Risk literacy (prompt injection)

Skill authoring & tool engineering

System design & architecture

Declining

Boilerplate code writing

Manual test case authoring

Configuration management

Documentation as afterthought

Solo debugging sessions

Exhaustive manual code review

08 // Building Your First Hierarchical Agent System

Theory is useful. Shipping is better. Five steps to your first hierarchical multi-agent system:

1

Define the Contract

Typed payloads between agents: what goes in, what comes out, what errors look like.

{
  "task": "implement_feature",
  "input": { "spec": "...", "context_files": ["..."] },
  "output": { "files_changed": ["..."], "tests_added": 3 },
  "constraints": { "max_tokens": 50000, "timeout_ms": 120000 }
}
2

Pick Topology from Task Structure

Fan-out for parallel. Pipeline for sequential. Hierarchical supervisor for complex quality gates. Start simple.

3

Attach Tools via MCP

Scoped access: ImplementAgent gets filesystem + git. SecurityAgent gets vulnerability scanners. Narrow access limits what a confused or compromised agent can touch.

4

Add a Routing Policy

Start static (explicit rules). Graduate to bandit-style beliefs (REDEREF) when you have usage data. No fine-tuning required.

5

Instrument and Gate

OpenTelemetry spans per agent/tool. Judges at handoffs. Kill switches. Always set a maximum recursion depth.
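
For step 1, the JSON contract becomes far more useful once it is parsed into types at the boundary. The sketch below validates the request half of the example payload using only the standard library; the class names and `parse_request` are my own, and a real system might reach for a schema library instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    max_tokens: int
    timeout_ms: int


@dataclass(frozen=True)
class TaskRequest:
    task: str
    spec: str
    context_files: tuple[str, ...]
    constraints: Constraints


def parse_request(payload: dict) -> TaskRequest:
    """Reject malformed payloads at the boundary, not deep inside an agent."""
    try:
        limits = Constraints(**payload["constraints"])
        request = TaskRequest(
            task=payload["task"],
            spec=payload["input"]["spec"],
            context_files=tuple(payload["input"]["context_files"]),
            constraints=limits,
        )
    except (KeyError, TypeError) as exc:
        # Missing keys or unexpected constraint fields end up here
        raise ValueError(f"invalid task payload: {exc!r}") from exc
    if limits.max_tokens <= 0 or limits.timeout_ms <= 0:
        raise ValueError("constraints must be positive")
    return request
```

A payload with a missing or misspelled key fails immediately with a `ValueError`, which is the "what errors look like" part of the contract.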

supervisor_graph.py - LangGraph Hierarchical Pattern
from typing import Literal, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict, total=False):
    task: str
    plan: str
    code: str
    review: str
    status: Literal["planning", "coding", "reviewing", "done"]


def supervisor(state: AgentState) -> dict:
    """Route to the next agent based on current state."""
    if not state.get("plan"):
        return {"status": "planning"}
    if not state.get("code"):
        return {"status": "coding"}
    if not state.get("review"):
        return {"status": "reviewing"}
    return {"status": "done"}


def route(state: AgentState) -> str:
    return state["status"]


# Placeholder workers: each fills in one field of the shared state.
# Swap these for LLM-backed agents in a real system.
def planner_agent(state: AgentState) -> dict:
    return {"plan": f"plan for: {state['task']}"}


def coder_agent(state: AgentState) -> dict:
    return {"code": "# code implementing the plan"}


def reviewer_agent(state: AgentState) -> dict:
    return {"review": "approved"}


# Build the graph
graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor)
graph.add_node("planner", planner_agent)
graph.add_node("coder", coder_agent)
graph.add_node("reviewer", reviewer_agent)

graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route, {
    "planning": "planner",
    "coding": "coder",
    "reviewing": "reviewer",
    "done": END,
})

# Each worker returns to supervisor for next routing decision
for node in ["planner", "coder", "reviewer"]:
    graph.add_edge(node, "supervisor")

app = graph.compile()
# Usage: result = app.invoke({"task": "add a health-check endpoint"})
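
Step 5's "instrument and gate" can be prototyped without any framework. The sketch below is plain standard-library Python, not LangGraph's API: `Gate`, `DepthExceeded`, and `Killed` are hypothetical names, and the in-memory `trace` list stands in for real tracing spans.

```python
import threading
from typing import Callable


class DepthExceeded(RuntimeError):
    """Raised when delegation nests deeper than the configured limit."""


class Killed(RuntimeError):
    """Raised when the kill switch has been flipped."""


class Gate:
    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.kill_switch = threading.Event()
        self.trace: list[tuple[str, int]] = []  # stand-in for tracing spans

    def run(self, agent_name: str, work: Callable, depth: int = 0):
        """Run one agent; `work` receives a delegate(name, fn) callback."""
        if self.kill_switch.is_set():
            raise Killed(f"kill switch set before {agent_name} started")
        if depth > self.max_depth:
            raise DepthExceeded(f"{agent_name} exceeded depth {self.max_depth}")
        self.trace.append((agent_name, depth))
        # Sub-delegations go back through the gate one level deeper
        return work(lambda name, fn: self.run(name, fn, depth + 1))
```

Because every delegation goes through `run`, an agent that keeps delegating to itself hits `DepthExceeded` instead of burning tokens forever, and setting `kill_switch` stops any new agent from starting.
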
// Conclusion: The Human in the Loop

The agentic development lifecycle isn't about replacing developers. It's about changing what development means. The developer of 2026 is a conductor of agent orchestras - decomposing problems, designing contracts, verifying outputs, and governing autonomous systems.

The infrastructure is maturing rapidly. MCP standardizes tool access. A2A enables cross-boundary collaboration. Frameworks like LangGraph, CrewAI, and AutoGen make orchestration accessible. Research on agent routing (REDEREF) and topology selection (AdaptOrch) brings rigor to what was previously ad-hoc. IDC predicts a 35% increase in AI governance spending by 2029.

$ echo "TL;DR"

The future of software engineering is not writing code. It's designing the systems that write, validate, and govern code. Master the ADLC, understand multi-agent topologies, and learn to build trust through evaluation - because the agents are already here.


Mohammed Imran Khan

Senior ML Engineer at Red Hat. Ex-Principal ML Engineer at Mercedes-Benz R&D Center. PhD Scholar. Building multi-agent systems and agentic AI platforms.
