Why?
When advancing complex projects with Vibe Coding, relying on a single Agent quickly hits a capability bottleneck. First, the context window fills with low-value information, forcing task interruptions. Second, mixing roles like search, planning, and coding in one process lowers efficiency and increases errors. Finally, there are planning and safety gaps: without a global plan, work gets redone; without permission isolation, dangerous operations go unchecked.
An Agent Team separates roles—planning is planning, execution is execution, and reading is separated from writing—alleviating the first two problems significantly. Harness adds a layer of control: assigning permissions, matching models, managing context, and constraining behavior. However, off-the-shelf solutions rarely fit directly since each project’s tech stack, development standards, task types, and workflows differ. One-size-fits-all solutions struggle to meet real needs.
Fortunately, customizing your own system isn’t complicated. Open a multi-Agent-supporting IDE or CLI tool (like Claude Code or OpenCode) and follow three key steps: design architecture and roles, manage permissions, models, and context, and add a Plan workflow. There’s also a faster option: import a ready-made Team + Harness framework and tailor it to your project’s specific requirements.
Step One: Design Architecture and Roles
Start Simple
You don’t have to build a complete multi-role team from the start. Begin with a single Agent, hereafter called the Builder, roughly equivalent to a default Build Agent in most AI coding tools. As task complexity grows, gradually split roles as needed. Each role is effectively a Sub-agent that takes on specialized tasks delegated by the Builder:
- `Builder` handles all work initially; if context or efficiency bottlenecks arise, consider splitting duties.
- Add read-only roles `Explorer`/`Researcher`: responsible for search and information collection, offloading retrieval work from the main Agent.
- Add `Coder`: focused on code implementation and modification to improve code quality and consistency.
- Add `Planner`: manages planning and task breakdown for multi-file or multi-step complex tasks.
- Extend with roles like `Reviewer` or `General` as needed for review, general assistance, or permission isolation.
> “The best agent architecture is the one you don’t need to build. Start simple. Add complexity only when metrics prove it helps.” —Reliable Data Engineering
Core Design Principles
- Separate Planning and Execution: Planning (deciding what to do and task breakdown) and execution (actual coding and modification) engage different cognitive processes and should be handled by separate roles to avoid frequent context switches and errors.
- Separate Reading and Writing: The risks and costs of reading/searching information versus writing code differ sharply. Reading errors mainly waste time; writing errors can break the project. Assign retrieval tasks to read-only roles and code modification tasks to roles with permissions and review workflows.
- Separate Internal and External Searches: Internal codebase search and external internet research require different tools and strategies. Codebase search relies on understanding project structure (tools like `Glob`/`Grep`/`Read`), while internet search focuses on keywords, source evaluation, and information synthesis. Combining both in one role leads to bulky toolsets or conflicting strategies, so design and manage them separately.
Builder’s Routing Decisions
The Builder is the team’s decision-maker, using the most capable general-purpose model (e.g., Claude Opus). It rarely codes directly; its core duty is routing tasks to the most suitable roles.
A common routing schema looks like this:
| Target Agent | Main Function |
|---|---|
| Coder | Code implementation / modification |
| Explorer | Codebase search (read-only) |
| Researcher | Web and external data research |
| Reviewer | Code review and quality control |
| General | General tasks that can be subdivided |
| Builder | Minor edits or urgent rollbacks |
Humans must clearly describe each Agent’s responsibilities and let the Builder’s underlying model decide whom to delegate to. Adjust routing rules and thresholds based on performance. Over time, introduce quantitative metrics such as delegation success rate, rollback count, and task completion time to iteratively optimize the division of labor.
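In practice the routing decision is made by the Builder's model from the role descriptions, not by hand-written rules. Still, the shape of the decision can be sketched in code. The following is a purely illustrative heuristic, with hypothetical keyword triggers, not any real tool's API:

```python
# Hypothetical sketch of Builder-style routing; role names follow the
# table above, keyword heuristics are illustrative assumptions only.
def route(task: str) -> str:
    """Pick a target agent for a task description."""
    t = task.lower()
    if any(k in t for k in ("research", "docs", "library", "compare")):
        return "Researcher"      # external, read-only
    if any(k in t for k in ("find", "where", "locate", "usage")):
        return "Explorer"        # internal codebase search, read-only
    if any(k in t for k in ("review", "audit", "quality")):
        return "Reviewer"
    if any(k in t for k in ("implement", "fix", "refactor", "add test")):
        return "Coder"
    return "General"             # dividable but hard to categorize
```

A real Builder replaces these keyword checks with model judgment, but the contract is the same: every task maps to exactly one target role, with `General` as the fallback.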
Additional Behavioral Rules
These can be initially added to the Builder’s prompt, with specific numbers left for the model to decide and refined later:
- Prefer Delegation: When uncertain, delegate first, to avoid `Builder` becoming a jack-of-all-trades swamped by context details.
- 3-Failure Stop: Stop and ask the user after 3 consecutive failures on the same subtask; avoid infinite retries.
- Plan Before Action: For tasks involving more than five files or multiple steps, assign a `Planner` to create a plan first instead of rushing to finish.
- Confirm Before Declaring Done: Before declaring completion, verify all parts are handled, code is reviewed, and tests pass.
Overview of Roles
Roles are realized as Sub-agents that take delegated tasks from Builder. There’s no fixed number of roles; if a task type occurs frequently and needs distinct capability or permissions, it’s worth creating a dedicated role:
- Coder (coding executor): Uses a capable, cost-effective model to implement tasks that have already been analyzed. Cannot call Task (no recursive delegation) or use Skill. Hard constraints: never pass tests by deleting existing ones, never deliver code with TODOs or empty functions, always run tests and report results.
- Explorer (internal analyst): Read-only codebase search. Tools: Glob / Grep / Read. Output must include a "Not found" section explicitly listing items that were searched for but not hit.
- Researcher (external researcher): Read-only information gathering on the internet. Tools: web search / URL fetch / GitHub browsing. All URLs and references must appear in the final report. Search language affects quality; English is recommended for its richer, more reliable technical content. Cross-verify to avoid hallucination and misinformation.
- Reviewer (review and quality control): Uses a code-specialized model with reasoning capability. Review priority: Completeness > Correctness > Quality. Output must be one of PASS / WARN / FAIL. Also used to review Plans.

A common design for the read-only roles: Explorer, Researcher, and Reviewer have no edit or bash permissions, isolating side effects and preventing accidental file changes during read operations.

- General (generalist): Has the highest allowed number of independent steps, the full toolset, and the ability to schedule Sub-agents. Used for tasks that are dividable but hard to categorize, or for cross-domain complex tasks.
- Compaction (context compression): Uses long-context models with reasoning enabled to decide which messages to keep.
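The role definitions above can be captured as plain data. This is an illustrative registry only, assuming a harness that accepts a per-role model, tool list, and hard constraints; all identifiers here are hypothetical, not any tool's actual configuration schema:

```python
# Illustrative role registry; model names, tool names, and the shape of
# this dict are assumptions, not a real harness's config format.
ROLES = {
    "Coder": {
        "model": "cost-effective-coding-model",
        "tools": ["edit", "bash"],          # no task, no skill
        "constraints": [
            "never delete existing tests to make them pass",
            "no TODOs or empty functions in delivered code",
            "run tests and report results",
        ],
    },
    "Explorer": {
        "model": "lightweight-model",
        "tools": ["glob", "grep", "read"],  # read-only, internal
        "output": "must include a 'Not found' section",
    },
    "Researcher": {
        "model": "lightweight-model",
        "tools": ["web_search", "url_fetch"],  # read-only, external
        "output": "all URLs and references in the final report",
    },
    "Reviewer": {
        "model": "code-specialized-reasoning-model",
        "tools": [],                        # reads via harness, no edits
        "verdicts": ["PASS", "WARN", "FAIL"],
    },
}
```

Keeping the team as data like this makes the later steps (permission gating, model matching) mechanical lookups instead of scattered prompt text.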
Step Two: Manage Permissions, Models, and Context
Permissions: Least Privilege
Each role should retain only the minimal permissions needed to perform its tasks:
| Role | edit | bash | task | skill | web |
|---|---|---|---|---|---|
| Builder | ✓ | ✓ | ✓ | ✓ | ✓ |
| Planner | ✓ (isolated .plan dir only) | ✓ | ✗ | ✓ | ✓ |
| Coder | ✓ | ✓ | ✗ | ✗ | ✗ |
| Explorer | ✗ | ✗ | ✗ | ✗ | ✗ |
| Researcher | ✗ | ✗ | ✗ | ✗ | ✓ |
| Reviewer | ✗ | ✗ | ✗ | ✗ | ✗ |
| General | ✓ | ✓ | ✓ | ✓ | ✓ |
Many AI Agents tend to “do as much as possible.” If given edit permission, they may modify files during research; if given task permission, they may further delegate even simple problems. Explicitly restricting permissions ensures each role sticks to its job.
Typical example: If Coder has task permission, it might delegate tough problems to Researcher or Explorer instead of solving them directly. This breaks Builder’s routing logic, wastes tokens, and keeps Builder unaware. Correct practice: Coder handles what it can by coding; if stuck, returns the problem to Builder.
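The permission matrix can be enforced with a small gate. This is a minimal sketch under assumed names: the `PERMS` table mirrors the matrix above, and the scoped-edit convention for `Planner` is illustrative:

```python
# Minimal least-privilege gate; the permission strings and the
# ".plan/" scoping convention are hypothetical illustrations.
PERMS = {
    "Builder":    {"edit", "bash", "task", "skill", "web"},
    "Planner":    {"edit:.plan", "bash", "skill", "web"},
    "Coder":      {"edit", "bash"},
    "Explorer":   set(),
    "Researcher": {"web"},
    "Reviewer":   set(),
    "General":    {"edit", "bash", "task", "skill", "web"},
}

def allowed(role: str, action: str, path: str = "") -> bool:
    perms = PERMS[role]
    if action == "edit" and "edit:.plan" in perms:
        return path.startswith(".plan/")   # Planner: scoped writes only
    return action in perms
```

With this gate, a `Coder` that tries to re-delegate via `task` is simply refused, so the problem bounces back to `Builder` as intended.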
Model Matching
Different tasks require different intelligence levels. The most expensive models go to tasks needing the most judgment.
| Tier | Model Type | Roles | Thinking Enabled | Reason |
|---|---|---|---|---|
| Flagship | Most powerful model | Builder, Planner | Yes, large budget | Routing and planning decision quality directly impacts end-to-end efficiency |
| Execution | Cost-effective model | Coder | Yes, small budget | Tasks arrive already analyzed; the focus is instruction compliance |
| General | Varies by complexity | General | Optional | Tasks have fuzzy boundaries and require cross-domain coordination |
| Search | Lightweight models | Explorer, Researcher | Optional | Search is deterministic, prioritizing speed, cost, and tool call success rate |
| Long context | Long window models | Compaction | — | Needs to first comprehend large context before compressing |
When configuring, also consider multi-provider mixing and fallback strategies to benchmark performance, cut costs, and mitigate single points of failure.
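A fallback chain per tier is enough to get started. The sketch below assumes a generic `call_model(model, prompt)` client; the provider and model identifiers are placeholders, and real code would distinguish retryable errors (rate limits, outages) from permanent ones:

```python
# Multi-provider fallback sketch; TIERS entries and call_model() are
# assumed placeholders, not real provider APIs.
TIERS = {
    "Builder":  ["provider-a/flagship", "provider-b/flagship"],
    "Coder":    ["provider-a/coder", "provider-b/mid"],
    "Explorer": ["provider-b/small", "provider-a/small"],
}

def complete(role: str, prompt: str, call_model) -> str:
    last_err = None
    for model in TIERS[role]:          # preferred model first
        try:
            return call_model(model, prompt)
        except RuntimeError as err:    # provider down or rate-limited
            last_err = err
    raise RuntimeError(f"all providers failed for {role}") from last_err
```

Because the tier table is data, benchmarking an alternative provider is a one-line change rather than a prompt rewrite.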
Context Management and Information Isolation
Context is the scarcest resource; a tight context window quickly degrades model performance. Every bit of context should be used efficiently:
- Sub-agents communicate with `Builder` only via final messages; intermediate steps are invisible to `Builder`.
- `Builder` just needs "task done, result is XXX"; detailed process information is provided only on exceptions.
- `Compaction` can use three strategies simultaneously: `auto` (automatic triggers), `prune` (active trimming), and `reserved` (reserved space).
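To make the `prune` strategy concrete, here is a toy version: keep the system message and the most recent turns within a token budget. Real harnesses use model-driven selection (as the `Compaction` role does); this only shows the shape of the idea, and the 4-characters-per-token estimate is a rough assumption:

```python
# Toy "prune" compaction: keep the system message plus the newest
# messages that fit the budget. count_tokens is a crude stand-in.
def prune(messages, budget, count_tokens=lambda m: len(m["content"]) // 4):
    system, rest = messages[0], messages[1:]   # assume messages[0] is system
    kept, used = [], count_tokens(system)
    for msg in reversed(rest):                 # newest first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                              # older messages are dropped
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))     # restore chronological order
```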
Step Three: Add the Plan Workflow to Handle More Complex Tasks
Why Plan Before Execution?
For complex tasks, invoking Builder directly without planning leads to four typical problems:
- Insufficient context window (already discussed).
- Lack of global view: Agent changes first file, then mid-way discovers conflicts in file three, requiring rework.
- No recoverability: interruption or context exhaustion causes progress loss; must start over.
- No auditability: Humans don’t know Agent’s intent until after execution, risking costly misdirection.
The two-phase Plan-Execute workflow addresses these issues. We add a dedicated Planner responsible for crafting structured Plan files.
Planner Agent
`Planner` takes complex tasks and outputs structured Plan files. Persisting plans as files brings three benefits:
- Global view isn’t lost due to context exhaustion.
- Progress can resume after interruption.
- Both machine and human can review and adjust plans before execution (human-in-the-loop).
`Planner` has read/write access restricted to a specific directory (e.g., `./plans/<name>.md`), physically isolating planning from execution.
Structure of the Plan File
The Plan file is a structured execution checklist with three core design points:
- Batch Thinking: Tasks in the same batch are independent and can run in parallel; batches have dependencies and must run sequentially. Parallel execution drastically speeds things up, but identifying true independence requires judgment, and that judgment is the `Planner`'s responsibility.
- Five Elements of a Task:
  - Assigned role: which Sub-agent executes it.
  - Description: an executable task explanation including files and expectations.
  - Involved files: the list of files to create, modify, or read.
  - Verification criteria: clear, checkable conditions.
  - Status flag: `pending` / `in-progress` / `completed` / `blocked`.
- Mandatory Final Verification Batch: The last `Batch` must include integration tests and a final review. Passing individual tasks doesn't guarantee overall correctness; for example, Task A changes an interface while Task B still uses the old one: each test passes alone, but the combination fails.
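Putting the three points together, a hypothetical Plan file might look like this (the task names, files, and layout are illustrative, not a prescribed format):

```markdown
# Plan: add-rate-limiter
Goal: add request rate limiting to the API client.

## Batch 1 (tasks independent, run in parallel)
- Task 1.1 · Coder · implement TokenBucket in src/ratelimit.py
  Files: src/ratelimit.py · Verify: unit tests pass · Status: pending
- Task 1.2 · Explorer · list all call sites of ApiClient.request
  Files: src/ (read) · Verify: report includes a Not found section · Status: pending

## Batch 2 (depends on Batch 1)
- Task 2.1 · Coder · wire TokenBucket into ApiClient.request
  Files: src/client.py · Verify: integration tests pass · Status: pending

## Batch 3 (mandatory final verification)
- Task 3.1 · Coder · run the full test suite
  Verify: all tests pass · Status: pending
- Task 3.2 · Reviewer · review the combined diff
  Verify: verdict is PASS · Status: pending
```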
Executing the Plan
Executing the plan means managing a state machine that drives tasks from `pending` to `completed`. A simple Skill can describe this.
Core execution steps:
- Load the `Plan` file and scan all task statuses.
- Find the first incomplete `Batch` and start executing it.
- For each task: mark it `in-progress` and write to file → delegate to the Agent → on response, update status and write immediately.
- Batch gate: only after all tasks in the current batch finish can the next batch start.
- Execute the final verification `Batch`.
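The steps above reduce to a small loop. In this sketch, `delegate()` and `save()` are placeholders for the real agent call and the file write; the batch gate and eager persistence are the points to notice:

```python
# Minimal plan-executor sketch: drive tasks pending -> completed.
# delegate() and save() are hypothetical stand-ins for agent calls
# and Plan-file writes.
def run_plan(batches, delegate, save):
    for batch in batches:                       # batches run sequentially
        for task in batch:
            if task["status"] == "completed":
                continue                        # resume skips finished work
            task["status"] = "in-progress"
            save(batches)                       # persist before acting
            task["status"] = delegate(task)     # "completed" or "blocked"
            save(batches)                       # persist immediately after
        if any(t["status"] != "completed" for t in batch):
            return "halted"                     # batch gate: do not advance
    return "done"
```

Because status is written before and after every delegation, a crash at any point leaves a Plan file that the recovery procedure below the checkpoint section can reason about.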
Checkpoint recovery: If tasks marked `in-progress` remain on load, the last execution was interrupted. Recovery steps:
- Locate the interrupted tasks.
- Assign `Explorer` to assess the damage: Are files complete? Are there `TODO` placeholders? Does the project compile?
- Based on the assessment: completed → validated by `Reviewer`; partial → continued by `Coder`; damaged → fixed by `Coder`.
Core principle: Never assume interrupted work is done.
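The recovery procedure can be sketched the same way. Here `assess()` stands in for an `Explorer` damage report; the verdict strings are illustrative:

```python
# Checkpoint-recovery sketch: anything left "in-progress" at load time
# is suspect and gets triaged, never assumed done. assess() is a
# hypothetical stand-in for an Explorer damage report.
def recover(batches, assess):
    actions = []
    for batch in batches:
        for task in batch:
            if task["status"] != "in-progress":
                continue
            verdict = assess(task)  # "completed" | "partial" | "damaged"
            if verdict == "completed":
                actions.append(("Reviewer", task))   # validate, don't trust
            else:
                actions.append(("Coder", task))      # continue or repair
            task["status"] = "pending"   # re-enter the state machine
    return actions
```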
Additional Behavioral Rules
- Scope Per Delegation: Never delegate to `Coder` tasks involving more than 5 files at once; split bigger tasks.
- Done Means Verified: The completion criterion is meeting the verification checks, not just "code written" or "all tasks delegated".
- Persist Progress Eagerly: Write status changes to the file immediately.
- Specification Drift Check: After all batches, re-read the Goal and Verification Criteria word by word and compare them with the outputs to detect silent feature creep, missing requirements, or deviations.
Advanced Concept: Ralph Loop
For even more complex projects needing long AI autonomy with clear goals, you can introduce the Ralph Loop method.
In March 2026, Anthropic accidentally leaked all 512,000 lines of TypeScript source code for Claude Code. A Korean developer, Sigrid Jin, rewrote the entire codebase from TypeScript to Python in two hours using Ralph Loop, without writing a single line manually—fully relying on Agent-driven iterative development until success.
Ralph Loop’s key idea: break requirements into independent items, process one per iteration, proceed only after passing validation; each iteration starts a fresh Agent context. Memory between iterations is passed through files (git history, progress files, requirement status), avoiding context degradation from long runs.
The flow:
- List all requirements, mark each as passed or failed.
- Launch fresh Agent context and pick the highest priority unmet item.
- Implement and validate it; if pass, mark done.
- Append experience to progress file for next iteration’s reference.
- Repeat until all pass or reach max iterations.
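The flow above fits in a few lines. In this sketch, `run_fresh_agent()` and `validate()` are placeholders for launching a new-context Agent and for the per-item check; the in-memory `notes` list stands in for the progress file that carries memory between iterations:

```python
# Ralph Loop sketch: one requirement per fresh context, validated
# before being marked done. run_fresh_agent() and validate() are
# hypothetical stand-ins; notes plays the role of the progress file.
def ralph_loop(requirements, run_fresh_agent, validate, max_iters=100):
    notes = []
    for _ in range(max_iters):
        todo = [r for r in requirements if not r["passed"]]
        if not todo:
            return "all passed"
        item = todo[0]                           # highest-priority unmet item
        result = run_fresh_agent(item, notes)    # brand-new context each time
        if validate(item, result):
            item["passed"] = True                # proceed only after passing
        notes.append(item["name"] + ": " + str(result))  # memory via files
    return "max iterations reached"
```

Note that a failed validation simply leaves the item unmet, so the next iteration retries it with the accumulated notes, which is exactly how the loop avoids context degradation over long runs.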
Ralph Loop can be implemented as a Skill, invoked and driven by the Builder to repeatedly execute the existing Agent Team.
Summary
Building an Agent Team is fundamentally architectural design, not just prompt engineering. It’s like designing microservices or team workflows. Interestingly, once you start splitting roles, defining permissions, and designing state transitions, you realize these insights reflect how human teams work: why clear responsibility boundaries matter, why persistent workflows matter, why review and testing can’t be skipped. Agents hold up a mirror, reflecting your understanding of how to organize work.