AI orchestrating AI: how we refactored a complex system in 3.5 hours using agent coordination

Connor Turland · October 28, 2025

You need to refactor your webhook system. Three different tunnel implementations are competing for control. Deprecated code has tendrils everywhere. Simple changes require touching six files. Your team is afraid to modify anything.

The traditional answer: assign it to your senior engineer, hope they don't quit halfway through, and wait 2-3 weeks.

What we did instead: We gave it to an AI orchestrator. Not to write the code - to coordinate six specialized AI agents, each handling one piece of the refactor in parallel. Total time: 3.5 hours. Zero human intervention until final review.

This is orchestration. Not the enterprise buzzword - the actual capability of AI agents coordinating other AI agents to accomplish complex work.

The problem: a refactor too risky to attempt

Here's the situation. Our webhook system had evolved organically over 18 months:

  • Three competing tunnel implementations (ngrok, Cloudflare, custom proxy worker)
  • 5,554 lines of deprecated code (ndjson-client, proxy-worker) with references scattered across 59 files
  • Hardcoded request handlers mixed with server infrastructure
  • Tightly coupled verification logic spanning multiple packages

The ask: refactor this completely. Remove deprecated code. Convert to modern Fastify framework. Separate concerns. Don't break anything.

The problem: This wasn't six isolated tasks. It was six interdependent changes where order mattered, dependencies tangled, and one mistake could cascade across the entire system.

Traditional approach: Assign to senior engineer. Timeline: 2-3 weeks. Risk: High. Cost: One burned-out engineer.

What we actually did: AI orchestrating AI

We gave this to Cyrus acting as an orchestrator agent. Not to write all the code - to break down the work, delegate to specialized agents, verify their output, and coordinate the integration.

Here's what happened:

Hour 0:00 - Decomposition

The orchestrator agent analyzed CYPACK-248 and decomposed it into 6 atomic, sequential sub-issues:

  1. CYPACK-249: Remove ndjson-client and proxy-worker (5,554 lines to delete)
  2. CYPACK-250: Convert SharedApplicationServer to Fastify
  3. CYPACK-251: Create config-updater package, extract handlers
  4. CYPACK-252: Simplify CloudflareTunnelClient
  5. CYPACK-253: Rename linear-webhook-client to linear-event-transport
  6. CYPACK-254: Integrate all refactored components in EdgeWorker

Key insight: The orchestrator identified dependencies automatically. Each sub-issue built on the previous one. Sequential execution was the only safe path.

Hour 0:05 - Delegation begins

The orchestrator created the first sub-issue (CYPACK-249) in Linear with detailed acceptance criteria:

- Delete packages/ndjson-client/ directory completely
- Delete apps/proxy-worker/ directory completely
- Remove all NdjsonClient imports from EdgeWorker
- Update test mocks to remove NdjsonClient references
- Verify: 99 tests passing, TypeScript clean, build succeeds

Then it triggered a child agent session to handle this specific task.

Hour 0:45 - First child completes

Child Agent (CYPACK-249) working in isolated git worktree:

  • Deleted 5,554 lines of deprecated code
  • Removed ndjson-client package (27 files)
  • Removed proxy-worker app (17 files)
  • Updated EdgeWorker to use only LinearWebhookClient
  • Fixed all test mocks (7 test files)
  • Reported back: "99 tests passing, TypeScript clean, build succeeds"

Orchestrator response:

  1. Pulled child's worktree
  2. Verified independently: Ran tests, typecheck, build
  3. Merged changes into orchestrator branch
  4. Triggered next child: Created CYPACK-250 agent session
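The four steps above are the orchestrator's core loop, repeated for every phase. A minimal sketch of that loop in TypeScript (the `Phase` interface and its `pull`/`verify`/`merge` hooks are illustrative placeholders, not Cyrus internals):

```typescript
// The orchestrator's per-phase loop: pull the child's work, verify it
// independently, merge only on success, then move to the next phase.
// All three hooks are hypothetical placeholders for illustration.
interface Phase {
  issueId: string;
  pull: () => string;                // returns the child's worktree path
  verify: (path: string) => boolean; // tests + typecheck + build
  merge: (path: string) => void;
}

function runPhases(phases: Phase[]): string[] {
  const merged: string[] = [];
  for (const phase of phases) {
    const path = phase.pull();
    if (!phase.verify(path)) {
      // Stop the pipeline: a bad merge here would cascade into every
      // later phase, so nothing downstream gets triggered.
      break;
    }
    phase.merge(path);
    merged.push(phase.issueId);
  }
  return merged;
}
```

The key design choice is that a failed verification halts the whole chain rather than continuing with the remaining phases.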

Hour 1:15 - Fastify conversion complete

Child Agent (CYPACK-250) working in new worktree:

  • Converted SharedApplicationServer from http.createServer to Fastify
  • Removed 892 lines of pathname-based routing (if statements replaced with Fastify routes)
  • Added module registration system
  • Reported back: "99 tests passing, ready for merge"

Orchestrator: Verified, merged, and triggered CYPACK-251

Hour 1:50 - Config extraction complete

Child Agent (CYPACK-251) working in third worktree:

  • Created new config-updater package
  • Extracted configuration handlers from CloudflareTunnelClient
  • Net: 467 additions, 208 deletions
  • Reported back: Complete

Orchestrator: Verified, merged, and triggered CYPACK-252

Hour 2:20 - Tunnel simplification complete

Child Agent (CYPACK-252) working in fourth worktree:

  • Simplified CloudflareTunnelClient to only handle tunnels
  • Removed all HTTP server and handler logic (54 lines net reduction)
  • Reported back: "62 tests passing, TypeScript clean"

Orchestrator: Verified, merged, and triggered CYPACK-253

Hour 2:45 - Naming refactor hits snag

Child Agent (CYPACK-253) working in fifth worktree:

  • Renamed linear-webhook-client to linear-event-transport
  • Refactored with Fastify integration
  • Reported back: "Implementation complete"

Orchestrator verification: 41 tests failing

The child agent renamed the package but forgot to update test imports. Tests still referenced the old LinearWebhookClient class.

Orchestrator action: Posted detailed feedback to CYPACK-253:

  • Verification failed. 41 tests failing across 6 test files
  • Root cause: Test files still import LinearWebhookClient but this class no longer exists
  • Required: Update imports to LinearEventTransport and fix mock implementations
  • Must fix 6 test files: EdgeWorker.dynamic-tools.test.ts, EdgeWorker.feedback-delivery.test.ts, and 4 others
  • Ensure all 148 tests pass before marking complete

Child Agent (CYPACK-253) - 20 minutes later:

  • Fixed all test imports
  • Updated mock implementations
  • Reported back: "All 189 tests passing"

Orchestrator: Verified, merged, and triggered CYPACK-254

Hour 3:30 - Final integration complete

Child Agent (CYPACK-254) working in sixth worktree:

  • Integrated all refactored components in EdgeWorker
  • Replaced multiple LinearEventTransport instances with single shared instance
  • Added Cloudflare tunnel support to SharedApplicationServer
  • Net: 222 lines removed
  • Reported back: "99 tests passing, complete"

Orchestrator final verification:

  • All 189 tests passing across entire codebase
  • TypeScript compilation successful
  • Build succeeds for all packages
  • Linting clean
  • Created pull request with complete changes

The real impact: orchestration vs. single-agent execution

Let's be specific about what orchestration bought us:

Time comparison

Single senior engineer (estimated):

  • Week 1: Understand dependencies, plan approach, start removing deprecated code
  • Week 2: Convert to Fastify, extract modules, hit integration issues
  • Week 3: Fix cascading test failures, debug mysterious breaks, finally get PR ready
  • Total: 15-20 days of focused work

Orchestrated AI agents (actual):

  • Hour 0-1: Decomposition + first deletion phase complete
  • Hour 1-2: Fastify conversion + config extraction complete
  • Hour 2-3: Tunnel simplification + naming refactor complete
  • Hour 3-4: Final integration + PR creation complete
  • Total: 3.5 hours wall-clock time, zero human intervention

The multiplier: 40-60x faster than human execution

Quality comparison

What human refactors typically produce:

  • Forgotten edge cases discovered in production
  • Test coverage gaps ("I'll add tests later")
  • Integration issues caught during code review
  • Documentation updates deferred

What orchestrated refactor produced:

  • Every sub-issue verified before integration (tests, typecheck, build)
  • Orchestrator caught child agent mistakes before merge (CYPACK-253 test failures)
  • Complete test coverage maintained throughout
  • Each phase independently verifiable

The difference: Quality gates at every step, not just at the end

Risk mitigation

Human refactor risks:

  • Knowledge in one person's head
  • Context loss if engineer switches focus
  • No intermediate rollback points
  • "Works on my machine" surprises

Orchestrated refactor safety:

  • Each sub-issue isolated in separate git worktree
  • Independent verification before any integration
  • Clear rollback points (revert to pre-CYPACK-251 state)
  • Reproducible process documented in Linear comment streams

The orchestration patterns: what made this work

Watching the orchestrator coordinate six agents revealed key patterns:

1. Atomic decomposition

The orchestrator didn't create vague sub-tasks like "refactor webhook system." It created precise, verifiable units:

Bad decomposition:

  • "Clean up the webhook code"
  • "Make it more modular"
  • "Improve the architecture"

Actual decomposition:

  • "Delete packages/ndjson-client/ and apps/proxy-worker/. Remove all imports. Verify 99 tests passing."
  • "Convert SharedApplicationServer constructor from http.createServer to Fastify. Verify build succeeds."
  • "Extract config handlers to new config-updater package. Verify TypeScript clean."

Each sub-issue had concrete acceptance criteria. No ambiguity. No interpretation needed.
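In code terms, an atomic sub-issue can be modeled as a small typed record whose acceptance criteria are machine-checkable commands rather than prose. A sketch of that shape (the `SubIssue` interface is illustrative, and the `pnpm` script names are assumptions about the repo, not confirmed details):

```typescript
// Illustrative model of an atomic sub-issue: every acceptance criterion
// is a command that exits 0 on success, so "done" is a machine-checkable
// fact rather than a judgment call.
interface SubIssue {
  id: string;
  title: string;
  dependsOn: string[];        // ids that must merge first
  acceptanceChecks: string[]; // commands that must all exit 0
}

const subIssues: SubIssue[] = [
  {
    id: "CYPACK-249",
    title: "Remove ndjson-client and proxy-worker",
    dependsOn: [],
    acceptanceChecks: ["pnpm test", "pnpm typecheck", "pnpm build"],
  },
  {
    id: "CYPACK-250",
    title: "Convert SharedApplicationServer to Fastify",
    dependsOn: ["CYPACK-249"],
    acceptanceChecks: ["pnpm test", "pnpm build"],
  },
];

// "Clean up the webhook code" fails this bar; every issue above passes it.
function isAtomic(issue: SubIssue): boolean {
  return issue.acceptanceChecks.length > 0;
}
```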

2. Sequential dependency management

The orchestrator recognized that order mattered:

  • Can't convert to Fastify until deprecated code is removed (CYPACK-249 must complete first)
  • Can't simplify CloudflareTunnelClient until handlers are extracted (CYPACK-251 → CYPACK-252)
  • Can't integrate EdgeWorker until all components are refactored (CYPACK-254 must be last)

Human instinct: Try to parallelize everything for speed. Orchestrator insight: Dependencies force sequence. Respect them.
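Mechanically, that ordering constraint is just a topological sort over the sub-issue dependency graph. A sketch using this refactor's issue ids (the chain shape matches the article; the sort function is an illustration, not how Cyrus is implemented):

```typescript
// Each entry maps a sub-issue id to the ids it depends on.
const deps: Record<string, string[]> = {
  "CYPACK-249": [],
  "CYPACK-250": ["CYPACK-249"],
  "CYPACK-251": ["CYPACK-250"],
  "CYPACK-252": ["CYPACK-251"],
  "CYPACK-253": ["CYPACK-252"],
  "CYPACK-254": ["CYPACK-253"],
};

// Depth-first topological sort; throws if the graph has a cycle.
function executionOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  const visiting = new Set<string>();
  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    visiting.add(id);
    for (const dep of graph[id] ?? []) visit(dep);
    visiting.delete(id);
    done.add(id);
    order.push(id);
  };
  for (const id of Object.keys(graph)) visit(id);
  return order;
}
```

For a pure chain like this one, the only valid order the sort can produce is fully sequential, which is exactly the "dependencies force sequence" insight.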

3. Verification before integration

After each child agent completed work, the orchestrator:

  1. Pulled the child's git worktree
  2. Ran independent verification (didn't trust child's self-reported success)
  3. Only merged if verification passed
  4. Provided detailed feedback if verification failed (CYPACK-253 test failure example)

This prevented cascading failures. A bad merge in hour 1 would have broken everything in hours 2-3.
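A sketch of that independent verification gate, with the command runner injected so the policy is testable (the `pnpm` commands are assumptions about the repo's scripts, and the function itself is illustrative):

```typescript
import { execSync } from "node:child_process";

// Independent verification: the orchestrator re-runs the full gate
// itself rather than trusting the child's self-reported success.
const gate = ["pnpm test", "pnpm typecheck", "pnpm build"];

function verifyWorktree(
  worktreePath: string,
  checks: string[],
  run: (cmd: string, cwd: string) => void = (cmd, cwd) =>
    execSync(cmd, { cwd, stdio: "pipe" }),
): boolean {
  for (const cmd of checks) {
    try {
      run(cmd, worktreePath);
    } catch {
      return false; // any failing check blocks the merge
    }
  }
  return true;
}

// Merge only on a passing gate, e.g.:
// if (verifyWorktree("../worktrees/cypack-250", gate)) { /* merge */ }
```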

4. Feedback loops

When CYPACK-253 reported "Implementation complete" but tests were failing, the orchestrator didn't just reject it. It:

  • Identified the specific failure (41 tests, 6 files)
  • Diagnosed root cause (old class name in test imports)
  • Provided actionable fix list
  • Allowed child agent to correct and re-submit

Result: Child agent learned, fixed the issue, and delivered working code. No human escalation needed.
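That reject-diagnose-retry pattern can be sketched as a bounded loop in which verification failures are turned into structured feedback for the next attempt (the `ChildAgent` interface is a hypothetical stand-in, not a Cyrus API):

```typescript
// Feedback loop: on verification failure the orchestrator posts a
// diagnosis and lets the same child retry, up to a retry cap.
// The agent interface here is hypothetical, for illustration only.
interface ChildAgent {
  attempt(feedback?: string): { ok: boolean; failures: string[] };
}

function runWithFeedback(agent: ChildAgent, maxAttempts = 3): boolean {
  let feedback: string | undefined;
  for (let i = 0; i < maxAttempts; i++) {
    const result = agent.attempt(feedback);
    if (result.ok) return true;
    // Turn raw failures into an actionable fix list, the way the
    // orchestrator did for CYPACK-253's 41 failing tests.
    feedback =
      `Verification failed (${result.failures.length} failures). ` +
      `Fix: ${result.failures.join("; ")} and re-run the full suite.`;
  }
  return false; // escalate to a human after repeated failures
}
```

Bounding the retries matters: it is what separates "let the child self-correct" from an unsupervised loop that never escalates.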

5. Isolated workspaces

Each child agent worked in a separate git worktree. Benefits:

  • No merge conflicts between parallel work
  • Failed experiments don't corrupt main branch
  • Orchestrator can pull completed work when ready
  • Clean rollback if a phase fails catastrophically
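The isolation mechanism is plain `git worktree`: each child gets an independent checkout that shares one object store with the main repo. A sketch of the commands involved (the `../worktrees/` layout and `agent/` branch prefix are illustrative conventions, not what Cyrus necessarily uses):

```typescript
// Build the git commands that give each child its own worktree.
// A worktree is an independent checkout sharing one object store,
// so a failed experiment can never corrupt the main branch.
function worktreeAdd(issueId: string): string {
  const slug = issueId.toLowerCase();
  return `git worktree add ../worktrees/${slug} -b agent/${slug}`;
}

function worktreeRemove(issueId: string): string {
  // --force discards a failed experiment along with its checkout
  return `git worktree remove --force ../worktrees/${issueId.toLowerCase()}`;
}

// The orchestrator later pulls finished work with an ordinary merge:
//   git merge agent/cypack-249
```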

When to use orchestration (and when not to)

Orchestration isn't for every task. Here's when it matters:

Use orchestration when:

1. The task has clear sub-problems

  • Refactoring a monolithic service into modules
  • Migrating from one framework to another
  • Extracting shared code into libraries
  • Updating dependencies across multiple packages

Example: CYPACK-248 had six obvious phases. If your task doesn't decompose naturally, orchestration adds overhead.

2. Dependencies between sub-tasks exist

  • Order matters (can't do B until A completes)
  • Integration points need coordination
  • Verification checkpoints prevent cascading failures

Example: Converting to Fastify required removing deprecated code first. Random ordering would have failed.

3. Each sub-task can be independently verified

  • Tests can prove correctness
  • Build/compile/typecheck validates integration
  • Success criteria are objective, not subjective

Example: "99 tests passing" is binary. "Looks good to me" is not.

4. The cost of human coordination exceeds setup cost

  • Task would take multiple days of human focus
  • Requires specialized knowledge in different areas
  • Risk of human error is high
  • You want documentation of the process

Example: This refactor would have taken 2-3 weeks of senior engineer time. Orchestration took 3.5 hours of wall-clock time.

Don't use orchestration when:

The task is simple and linear

  • Single-file changes
  • Obvious implementation path
  • No dependencies to manage
  • Takes less than 1 hour of human time

Example: "Add a new API endpoint" doesn't need orchestration. Just do it.

The requirements are ambiguous

  • Unclear acceptance criteria
  • Design decisions need human judgment
  • Exploratory work ("figure out why X is slow")

Example: "Make the app feel faster" is not orchestratable. "Reduce initial page load from 4s to under 2s" is.

You need creative problem-solving

  • Novel algorithms
  • Complex business logic with edge cases
  • UX design requiring aesthetic judgment

Example: "Design a beautiful checkout flow" needs human creativity. "Implement this checkout flow spec" can be orchestrated.

How to use orchestration with Cyrus

This capability is built into Cyrus. Here's how to trigger it:

Use the "Orchestrator" label

When creating an issue in Linear that needs orchestration:

  1. Write a detailed description of the refactor (like CYPACK-248)
  2. Add the "Orchestrator" label to the issue
  3. Assign to Cyrus
  4. Cyrus will automatically analyze the requirements, decompose the work into sub-issues, delegate each one to a child agent session, verify and merge the results, and open a pull request

What you'll see

Once orchestration begins, watch the Linear comment stream:

Parent issue (orchestrator):

  • Decomposition plan with sub-issues
  • Progress updates as each child completes
  • Verification results (tests, typecheck, build)
  • Merge confirmations
  • Final summary with PR link

Child issues (sub-tasks):

  • Detailed acceptance criteria
  • Agent session activity
  • Implementation summaries
  • Verification results
  • Completion notifications

Best practices

Write clear parent issue descriptions:

  • State the goal clearly
  • List known components that need changes
  • Mention dependencies if you know them
  • Set acceptance criteria for the overall refactor

Let the orchestrator plan:

  • Don't try to create sub-issues yourself
  • The orchestrator will analyze and decompose better than manual planning
  • Trust the dependency analysis

Review the plan before execution:

  • After decomposition, the orchestrator will post the plan
  • Review the sub-issues it created
  • Provide feedback if the decomposition looks wrong
  • Give approval to proceed (or it will proceed automatically)

The future: AI agents coordinating AI agents

What you just read isn't a vision of the future. It's happening now. Cyrus orchestrated this refactor on October 28, 2025. The Linear issues exist. The comment streams are real. The PR merged successfully.

This changes everything.

For engineering leaders:

  • Complex refactors that blocked your roadmap for months can complete in hours
  • Technical debt becomes manageable (pay it down systematically, not heroically)
  • Risk mitigation is built in (verification at every step, not just at the end)
  • Documentation is automatic (every orchestration leaves a complete audit trail)

For senior engineers:

  • Stop being the bottleneck for complex migrations
  • Delegate the tedious parts, focus on architecture decisions
  • No more "I'm the only one who can touch this code"
  • Knowledge transfer happens through documented orchestration patterns

For teams:

  • Velocity becomes predictable (orchestration times are consistent)
  • Onboarding accelerates (new engineers see how complex work gets coordinated)
  • Code quality improves (every phase independently verified)
  • Burnout decreases (AI handles the grunt work of large refactors)

The bottom line

Orchestration isn't about making code prettier. It's about making impossible refactors possible.

The old model: Complex refactor, assign to senior engineer, hope they don't quit, wait 3 weeks, pray it works

The new model: Complex refactor, assign to orchestrator agent, decompose automatically, verify continuously, complete in hours

The difference:

  • 3.5 hours vs 3 weeks (40-60x faster)
  • Zero human hours vs 120 human hours
  • Quality gates at every step vs one big review at the end
  • Complete audit trail vs knowledge in one person's head
  • Reproducible process vs tribal knowledge

The teams that adopt orchestration first will ship faster. The teams that wait will wonder how they're getting lapped by smaller competitors.


Questions about orchestration or want to discuss your specific refactoring challenges? Connect with us @CyrusAgent or book a 20-minute technical discussion with our team.
