Why?

When advancing complex projects with Vibe Coding, relying on a single Agent quickly hits a capability bottleneck. First, the context window fills up with low-value information, forcing task interruptions. Second, mixing different roles like search, planning, and coding in one process lowers efficiency and increases errors. Finally, there are safety and quality risks: without a global plan, work gets redone; without permission isolation, dangerous operations can slip through.

An Agent Team separates roles: planning is planning, execution is execution, and reading is separated from writing. This significantly alleviates the first two problems. A harness adds a layer of control on top: assigning permissions, matching models, managing context, and constraining behavior. However, off-the-shelf solutions rarely fit directly, since every project's tech stack, development standards, task types, and workflows differ; one-size-fits-all solutions struggle to meet real needs.

Fortunately, customizing your own system isn’t complicated. Open a multi-Agent-supporting IDE or CLI tool (like Claude Code or OpenCode) and follow three key steps: design architecture and roles, manage permissions, models, and context, and add a Plan workflow. There’s also a faster option: import a ready-made Team + Harness framework and tailor it to your project’s specific requirements.

Step One: Design Architecture and Roles

Start Simple

You don’t have to build a complete multi-role team from the start. Begin with a single Agent, hereafter called the Builder, roughly equivalent to a default Build Agent in most AI coding tools. As task complexity grows, gradually split roles as needed. Each role is effectively a Sub-agent that takes on specialized tasks delegated by the Builder:

  • Builder handles all work initially; if context or efficiency bottlenecks arise, consider splitting duties.
  • Add read-only roles Explorer / Researcher: responsible for search and information collection, offloading retrieval work from the main Agent.
  • Add Coder: focused on code implementation and modification to improve code quality and consistency.
  • Add Planner: manages planning and task breakdown for multi-file or multi-step complex tasks.
  • Extend with roles like Reviewer or General as needed for review, general assistance, or permission isolation.

“The best agent architecture is the one you don’t need to build. Start simple. Add complexity only when metrics prove it helps.” —Reliable Data Engineering

Core Design Principles

  • Separate Planning and Execution: Planning (deciding what to do and task breakdown) and execution (actual coding and modification) engage different cognitive processes and should be handled by separate roles to avoid frequent context switches and errors.
  • Separate Reading and Writing: The risks and costs of reading/searching information versus writing code differ sharply. Reading errors mainly waste time; writing errors can break the project. Assign retrieval tasks to read-only roles and code modification tasks to roles with permissions and review workflows.
  • Separate Internal and External Searches: Internal codebase search and external internet research require different tools and strategies. Codebase search relies on project structure understanding (like Glob/Grep/Read), while internet search focuses on keywords, source evaluation, and info synthesis. Combining both in one role leads to bulky tools or conflicting strategies. Design and manage them separately.

Builder’s Routing Decisions

The Builder is the team’s decision-maker, using the most capable general-purpose model (e.g., Claude Opus). It rarely codes directly; its core duty is routing tasks to the most suitable roles.

A common routing schema looks like this:

| Target Agent | Main Function |
|---|---|
| Coder | Code implementation/modification |
| Explorer | Codebase search (read-only) |
| Researcher | Web and external data research |
| Reviewer | Code review and quality control |
| General | Handle general subdividable tasks |
| Builder | Minor edits or urgent rollbacks |

Humans must clearly describe each Agent's responsibilities and let the Builder's underlying model decide whom to delegate to. Adjust routing rules and thresholds based on observed performance. Over time, introduce quantitative metrics such as delegation success rate, rollback count, and task completion time to iteratively optimize the division of labor.
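Tracking those quantitative metrics only takes a bit of bookkeeping. Below is a minimal, hypothetical Python sketch; the class, method, and field names are illustrative and not part of any real harness API:

```python
from collections import defaultdict

class RoutingMetrics:
    """Track per-role delegation outcomes to tune routing rules over time.
    Hypothetical sketch; the metric choices follow the article's suggestions."""

    def __init__(self):
        self.stats = defaultdict(
            lambda: {"delegations": 0, "successes": 0, "rollbacks": 0}
        )

    def record(self, role, success, rolled_back=False):
        # One entry per delegated task, noted by outcome.
        s = self.stats[role]
        s["delegations"] += 1
        if success:
            s["successes"] += 1
        if rolled_back:
            s["rollbacks"] += 1

    def success_rate(self, role):
        # None until the role has been delegated to at least once.
        s = self.stats[role]
        return s["successes"] / s["delegations"] if s["delegations"] else None
```

A role whose success rate drifts down, or whose rollback count climbs, is a signal to narrow its responsibilities or reroute that task type.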

Additional Behavioral Rules

These can be initially added to the Builder’s prompt, with specific numbers left for the model to decide and refined later:

  • Prefer Delegation: When uncertain, delegate first to avoid the Builder becoming a jack-of-all-trades swamped by context details.
  • 3-Failure-Stop: Stop and ask the user after 3 consecutive failures on the same subtask; avoid infinite retries.
  • Plan Before Action: For tasks involving more than five files or multiple steps, assign a Planner to create a plan first instead of rushing to finish.
  • Confirm Before Declaring Done: Before declaring completion, verify all parts are handled, code is reviewed, and tests pass.
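The 3-Failure-Stop rule above can be sketched as a small wrapper. This is a hedged illustration, not a real harness feature; `attempt_fn` is a hypothetical callable standing in for one delegated attempt:

```python
def run_with_failure_stop(subtask, attempt_fn, max_failures=3):
    """Retry a subtask, but stop and escalate to the user after
    `max_failures` consecutive failures instead of looping forever.
    `attempt_fn(subtask)` returns a (success, result) pair."""
    failures = 0
    while failures < max_failures:
        success, result = attempt_fn(subtask)
        if success:
            return {"status": "done", "result": result}
        failures += 1
    # Give control back to the human rather than retrying indefinitely.
    return {"status": "needs_user", "subtask": subtask, "failures": failures}
```

The same shape works for any threshold; the point is that the stop condition is explicit rather than left to the model's mood.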

Overview of Roles

Roles are realized as Sub-agents that take delegated tasks from Builder. There’s no fixed number of roles; if a task type occurs frequently and needs distinct capability or permissions, it’s worth creating a dedicated role:

Coder (coding executor): Uses a capable, cost-effective model to implement analyzed tasks. Cannot call Task (no recursive delegation) or use Skill. Hard constraints include: cannot pass tests by deleting existing ones, cannot deliver code with TODO or empty functions, must run tests and report results.

Explorer (internal analyst): Read-only codebase search. Tools: Glob / Grep / Read. Output must include a Not found section listing searched but unhit items explicitly.

Researcher (external researcher): Read-only info gathering on the internet. Tools: web search / URL fetch / GitHub browsing. All URLs and references must appear in the final report. Search language affects quality; English is recommended due to richer, more reliable tech content. Cross-verify to avoid hallucination and misinformation.

Reviewer (review and quality control): Uses code-specialized model with reasoning capabilities. Review priority: Completeness > Correctness > Quality. Output must be one of PASS / WARN / FAIL. Also used to review Plan.

Common design for read-only roles: Explorer, Researcher, and Reviewer have no edit or bash permissions to isolate side effects and prevent accidental file changes during read operations.

General (generalist): Has the highest allowed independent steps, full toolset, and can schedule Sub-agents. Used for tasks that are dividable but hard to categorize or cross-domain complex tasks.

Compaction (context compression): Uses long-context models with reasoning enabled to decide which messages to keep.

Step Two: Manage Permissions, Models, and Context

Permissions: Least Privilege

Each role should retain only the minimal permissions needed to perform its tasks:

| Role | edit | bash | task | skill | web |
|---|---|---|---|---|---|
| Builder | ✓ | ✓ | ✓ | ✓ | ✓ |
| Planner | ✓ (plan dir only) | ✗ | ✗ | ✗ | ✗ |
| Coder | ✓ | ✓ | ✗ | ✗ | ✗ |
| Explorer | ✗ | ✗ | ✗ | ✗ | ✗ |
| Researcher | ✗ | ✗ | ✗ | ✗ | ✓ |
| Reviewer | ✗ | ✗ | ✗ | ✗ | ✗ |
| General | ✓ | ✓ | ✓ | ✓ | ✓ |

Many AI Agents tend to “do as much as possible.” If given edit permission, they may modify files during research; if given task permission, they may further delegate even simple problems. Explicitly restricting permissions ensures each role sticks to its job.

Typical example: If Coder has task permission, it might delegate tough problems to Researcher or Explorer instead of solving them directly. This breaks Builder’s routing logic, wastes tokens, and keeps Builder unaware. Correct practice: Coder handles what it can by coding; if stuck, returns the problem to Builder.
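As a sketch, the least-privilege table can be expressed as a deny-by-default allowlist. The role and tool names mirror this article, not a real Claude Code or OpenCode permission schema:

```python
# Allowlist mirroring the permissions table; anything not granted is denied.
PERMISSIONS = {
    "builder":    {"edit", "bash", "task", "skill", "web"},
    "planner":    {"edit"},          # edit further restricted to the plan dir
    "coder":      {"edit", "bash"},  # no task: stuck problems return to Builder
    "explorer":   set(),             # read-only codebase search
    "researcher": {"web"},           # read-only external research
    "reviewer":   set(),             # read-only review
    "general":    {"edit", "bash", "task", "skill", "web"},
}

def allowed(role: str, tool: str) -> bool:
    """Least privilege: deny anything not explicitly granted."""
    return tool in PERMISSIONS.get(role, set())
```

With this shape, the Coder-delegation failure mode above becomes structurally impossible: `allowed("coder", "task")` is simply false.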

Model Matching

Different tasks require different intelligence levels. The most expensive models go to tasks needing the most judgment.

| Tier | Model Type | Roles | Thinking Enabled | Reason |
|---|---|---|---|---|
| Flagship | Most powerful model | Builder, Planner | Yes, large budget | Routing and planning decision quality directly impacts end-to-end efficiency |
| Execution | Cost-effective model | Coder | Yes, small budget | Tasks arrive already analyzed; the focus is instruction compliance |
| General | Varies by complexity | General | Optional | Tasks have fuzzy boundaries and require cross-domain coordination |
| Search | Lightweight models | Explorer, Researcher | Optional | Search is deterministic, prioritizing speed, cost, and tool call success rate |
| Long context | Long-window models | Compaction | Yes | Must first comprehend a large context before compressing it |

When configuring, also consider multi-provider mixing and fallback strategies to benchmark performance, cut costs, and mitigate single points of failure.
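A fallback strategy can be as simple as trying providers in order. This is a hedged sketch with hypothetical provider callables, not any specific SDK's API:

```python
def call_with_fallback(prompt, providers):
    """Try each (name, call_fn) provider in order, falling back on failure.
    `providers` is a list of hypothetical (name, callable) pairs; a real
    setup would wrap actual SDK clients and distinguish error types."""
    errors = []
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except Exception as exc:  # e.g. rate limit, outage, timeout
            errors.append((name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

Running the same role against two providers this way also gives you a cheap A/B benchmark as a side effect of the fallback path.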

Context Management and Information Isolation

Context is the scarcest resource; a tight context window quickly degrades model performance. Every bit of context should be used efficiently:

  1. Sub-agents communicate with Builder only via final messages; intermediate steps are invisible to Builder. Builder just needs “task done, result is XXX”; detailed process info is provided only on exceptions.

  2. Compaction can use three strategies simultaneously: auto (automatic triggers), prune (active trimming), and reserved (reserved space).
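As an illustration of the prune and reserved ideas, here is a minimal sketch that keeps the newest messages within a token budget while holding back reserved space for the next turn. Real compaction (auto triggers, model-driven summarization) is more sophisticated; all names here are illustrative:

```python
def compact(messages, budget_tokens, reserved_tokens=2000, size=len):
    """Prune-style compaction: keep the newest messages that fit within
    the budget minus a reserved slice. `size` estimates token cost per
    message (character length is a crude stand-in for a real tokenizer)."""
    usable = budget_tokens - reserved_tokens
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest first
        cost = size(msg)
        if used + cost > usable:
            break                       # oldest overflow gets pruned
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

A model-driven strategy (like the Compaction role above) would summarize the pruned prefix instead of dropping it, but the budget arithmetic stays the same.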

Step Three: Add the Plan Workflow to Handle More Complex Tasks

Why Plan Before Execution?

For complex tasks, invoking Builder directly without planning leads to four typical problems:

  1. Insufficient context window (already discussed).
  2. Lack of global view: Agent changes first file, then mid-way discovers conflicts in file three, requiring rework.
  3. No recoverability: interruption or context exhaustion causes progress loss; must start over.
  4. No auditability: Humans don’t know Agent’s intent until after execution, risking costly misdirection.

The two-phase Plan-Execute workflow addresses these issues. We add a dedicated Planner responsible for crafting structured Plan files.

Planner Agent

Planner takes complex tasks and outputs structured Plan files. Plans are persisted as files, bringing three benefits:

  • Global view isn’t lost due to context exhaustion.
  • Progress can resume after interruption.
  • Both machine and human can review and adjust plans before execution (human-in-the-loop).

Planner has read/write access restricted to a specific directory (e.g., ./plans/<name>.md), physically isolating planning from execution.

Structure of the Plan File

The Plan file is a structured execution checklist with three core design points:

  • Batch Thinking: Tasks in the same batch are independent and can run in parallel; batches have dependencies and must run sequentially. Parallel execution drastically speeds things up but requires logic to identify true independence—this judgment is the Planner’s responsibility.

  • Five Elements of a Task:

    1. Assigned role: which Sub-agent executes it.
    2. Description: executable task explanation including files and expectations.
    3. Involved files: list of files to create, modify, or read.
    4. Verification criteria: clear checks.
    5. Status flags: pending / in-progress / completed / blocked
  • Mandatory final verification batch: The last Batch must include integration tests and final review. Passing individual tasks doesn’t guarantee overall correctness—e.g., Task A changes an interface, Task B still uses old interface; each test passes alone but combined fails.
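The five task elements and the batch structure might be modeled like this. This is a hypothetical Python representation for illustration; the article's actual artifact is a Markdown Plan file:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """The five elements of a Plan task (field names are illustrative)."""
    role: str                 # 1. assigned role: which Sub-agent executes it
    description: str          # 2. executable explanation, files and expectations
    files: list               # 3. files to create, modify, or read
    verification: str         # 4. clear checks that define "done"
    status: str = "pending"   # 5. pending / in-progress / completed / blocked

@dataclass
class Batch:
    """Tasks within a batch are independent and may run in parallel."""
    tasks: list = field(default_factory=list)

@dataclass
class Plan:
    """Batches run sequentially; the last one is the verification batch."""
    goal: str
    batches: list = field(default_factory=list)
```

Keeping the schema this explicit is what lets both the executor and a human reviewer check a Plan mechanically before anything runs.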

Executing the Plan

Executing the plan is managing a state machine to drive tasks from pending to completed. A simple Skill can describe this.

Core execution steps:

  1. Load the Plan file and scan all task statuses.
  2. Find the first incomplete Batch and start executing it.
  3. For each task: mark as in-progress and write to file → delegate to Agent → on response update status and write immediately.
  4. Batch gate: only after all tasks in current batch finish can the next batch start.
  5. Execute the final verification Batch.
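The execution steps above can be sketched as a small state-machine driver. `delegate` is a hypothetical function that hands a task to its assigned Sub-agent and returns True on success; for simplicity this sketch stores the Plan as JSON, whereas the article's Markdown plan would need a parser:

```python
import json

def execute_plan(plan_path, delegate):
    """Drive tasks from pending to completed, persisting every status
    change to disk immediately (Persist Progress Eagerly)."""
    with open(plan_path) as f:
        plan = json.load(f)

    def save():
        with open(plan_path, "w") as f:
            json.dump(plan, f, indent=2)

    for batch in plan["batches"]:
        for task in batch["tasks"]:
            if task["status"] == "completed":
                continue                      # resume support: skip done work
            task["status"] = "in-progress"; save()
            ok = delegate(task)
            task["status"] = "completed" if ok else "blocked"; save()
        # Batch gate: do not advance while anything here is unfinished.
        if any(t["status"] != "completed" for t in batch["tasks"]):
            return plan
    return plan
```

Because every transition is written out before the next delegation, a crash at any point leaves a file that the recovery procedure below can reason about.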

Checkpoint recovery: If tasks marked in-progress remain on load, it means last execution was interrupted. Recovery steps:

  1. Locate interrupted tasks.
  2. Assign Explorer to assess damage: Are files complete? Are there TODO placeholders? Does the project compile?
  3. Based on assessment: completed → validated by Reviewer; partial → continued by Coder; damaged → fixed by Coder.

Core principle: Never assume interrupted work is done.

Additional Behavioral Rules

  • Scope Per Delegation: Don't delegate tasks involving more than 5 files to Coder in one go; split larger tasks.
  • Done Means Verified: Completion criterion is meeting verification checks, not just “code written” or “all tasks delegated”.
  • Persist Progress Eagerly: Write status changes to file immediately.
  • Specification Drift Check: After all batches, re-read Goal and Verification Criteria word by word, compare with outputs to detect any silent feature creep, missing requirements, or deviations.

Advanced Concept: Ralph Loop

For even more complex projects needing long AI autonomy with clear goals, you can introduce the Ralph Loop method.

In March 2026, Anthropic accidentally leaked all 512,000 lines of TypeScript source code for Claude Code. A Korean developer, Sigrid Jin, rewrote the entire codebase from TypeScript to Python in two hours using Ralph Loop, without writing a single line manually—fully relying on Agent-driven iterative development until success.

Ralph Loop’s key idea: break requirements into independent items, process one per iteration, proceed only after passing validation; each iteration starts a fresh Agent context. Memory between iterations is passed through files (git history, progress files, requirement status), avoiding context degradation from long runs.

The flow:

  1. List all requirements, mark each as passed or failed.
  2. Launch fresh Agent context and pick the highest priority unmet item.
  3. Implement and validate it; if pass, mark done.
  4. Append experience to progress file for next iteration’s reference.
  5. Repeat until all pass or reach max iterations.
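The loop above might be sketched as follows. `run_iteration` is a hypothetical function that launches a fresh Agent context for one requirement and returns True only when the item passes validation; the in-memory progress list stands in for git history and progress files:

```python
def ralph_loop(requirements, run_iteration, max_iterations=50):
    """Ralph Loop sketch: one requirement per fresh-context iteration,
    validate before marking done, carry memory forward via a progress log."""
    progress = []                              # stand-in for progress files
    status = {req: False for req in requirements}
    for i in range(max_iterations):
        remaining = [r for r, done in status.items() if not done]
        if not remaining:
            break                              # all requirements pass
        req = remaining[0]                     # highest-priority unmet item
        if run_iteration(req, progress):       # fresh context each call
            status[req] = True
        # Append experience for the next iteration, pass or fail.
        progress.append(f"iteration {i}: {req} -> {status[req]}")
    return status, progress
```

The essential properties are all visible here: fresh context per item, validation gating, and file-like memory between iterations, so long runs never degrade a single context window.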

Ralph Loop can be implemented as a Skill, invoked and driven by the Builder to repeatedly execute the existing Agent Team.

Summary

Building an Agent Team is fundamentally architectural design, not just Prompt engineering. It’s like designing microservices or team workflows. Interestingly, once you start splitting roles, defining permissions, and designing state transitions, you realize these insights reflect how human teams work: why clear responsibility boundaries matter, why persistent workflows matter, why review and testing can’t be skipped. Agents hold up a mirror, reflecting your understanding of organizing work.