This article documents the combination of AI tools and models I used across analysis, development, and management from late 2025 to early 2026.

Given the rapid iteration of tools and models on the market, any specific product recommendations tend to become outdated quickly. What matters more is the decision-making framework behind tool selection. The key insight is this: in the AI era, choosing development tools isn’t about finding a single best model or solution. Instead, you build a classification-based judgment framework and apply a combination strategy to achieve a global optimum. This mindset will remain relevant for years to come.

Conversation Platforms: Daily Interaction

| Platform | Primary Use | Notes |
| --- | --- | --- |
| Gemini | Routine tech Q&A | Globally stable access, top-tier search capability, strong hallucination control; $15/month offers solid value |
| Manus | Deep research (technical reports, investment analysis) | Max model is pricey but delivers high-quality output; $1 can buy a logically rigorous, well-cited deep analysis |
| Google NotebookLM | Deep research (knowledge organization and summarization) | Uses document libraries as knowledge sources to support multi-turn, context-aware Q&A and content refinement |

Development Tools (IDE/CLI): Driving Work

| Tool | Positioning and Notes |
| --- | --- |
| OpenCode (CLI/Web UI) | Open-source alternative to Claude Code: fast iteration, strong customization, stable access. Its value lies not in any single capability but in being an easy-access, simple-interaction multi-agent collaboration platform |
| GitHub Copilot (Web) | Typical workflow: select a repo, describe the need, start a VM, auto-fix the issue, open a Pull Request for review, collaborate on changes, merge. Suited for medium-complexity tasks where you want full end-to-end automation |
| Zed (Editor) | Lightweight IDE; supports OpenCode integration in the sidebar |
| Trae, VS Code + Extensions | Deprecated |

OpenCode’s key advantage is the oh-my-opencode plugin — a multi-agent system with specialized role assignments. The trade-off: frequent agent interactions, longer task chains, and heavy token consumption. But it delivers on quality in complex technical scenarios. Once you learn the interaction patterns and match the right state-of-the-art models to each role, it can sustain deep, autonomous technical discussions for hours.

GitHub Copilot’s web version takes a different approach: highly integrated, scenario-specific, targeting end-to-end automation. It intentionally limits multi-turn deep interaction in favor of streamlining from problem description to code merge. For medium-complexity tasks with clear boundaries, it delivers extreme efficiency and integrates seamlessly with GitHub’s PR review process — the experience is close to making a full-cycle open source contribution.

Each has its emphasis: OpenCode prioritizes control and depth, while GitHub Copilot pursues end-to-end automation. In practice, they complement each other — OpenCode for complex architectural discussions, Copilot for targeted bug fixes.

Model Selection: Match Capability by Use Case

Selecting models starts with clearly defining the task scenario. Among all tasks, domain and business understanding requires special handling.

Domain/business understanding doesn’t rely on pretrained general knowledge; it’s built entirely on user-provided context (business documents, historical code, decision logs). The model’s job is to extract information, cross-reference data, and synthesize logical conclusions. What matters is how faithfully a model handles long contexts and reproduces the information in them, and any model with a sufficiently long context window can do this. Pretrained knowledge and in-model memory add little here and can actively hurt, inducing hallucinations that degrade output accuracy.
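The grounding described above can be sketched as a context-only prompt builder. The function name `build_grounded_prompt` and the instruction wording are hypothetical illustrations, not any specific tool's API:

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt that grounds the model strictly in supplied context.

    The instruction explicitly tells the model to answer only from the
    provided documents, which is what makes pretrained knowledge (and the
    hallucinations it can induce) largely irrelevant to this task type.
    """
    # Number each document so answers can cite their source.
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    return (
        "Answer strictly from the documents below. If they do not contain "
        "the answer, say so; do not use outside knowledge.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The same pattern applies whether the context comes from pasted documents, a retrieval step, or a tool like NotebookLM's document library.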

Beyond domain understanding, different models and strategies align with typical use cases:

| Use Case | Preferred Model | Backup / Alternative | Why |
| --- | --- | --- | --- |
| Task decomposition & workflow planning | Claude Opus 4.6 | Kimi K2.5 | Requires exceptional logical decomposition and step planning; Opus is nearly perfect here |
| Proposal & code review | GPT 5.3-Codex | Claude Opus 4.6 | Demands sharp code quality standards, security flaw detection, and architectural judgment |
| Comprehensive development (architecture + implementation) | Claude Opus 4.6 | GLM-5 | Must handle deep architecture trade-offs and produce actionable code |
| Independent closed-problem solving | GPT 5.4 Pro | - | Single-model single-agent, best for one-off problems like implementing an algorithm without extra intervention |
| Simple code implementation | Kimi K2.5 | MiniMax M2.5 | Extremely cost-effective for function filling, simple scripts, data transformations |
| Search & project understanding | Any low-cost model | - | Focus is on integrating search MCP and language server calls for real-time info and code semantics; model capability is secondary |
| Multilingual technical writing | Gemini / GPT / Claude | - | Styles differ: Gemini is rigorous, GPT fluent, Claude structured — choose as needed |
| Chinese technical writing | Claude Opus 4.6 | GLM-5 | Logical and expressive accuracy matters most |
| Development documentation | Claude Opus 4.6 | - | Must clearly and structurally convey complex technical decisions; Opus leads in logic |

How I Think About Each Model

After working with these models for months, here’s how I think of each one:

  • Claude Opus 4.6: An experienced, meticulous senior engineer. Delivers high-quality output and often proposes unexpectedly insightful technical choices, engineering trade-offs, and designs in discussions. Still requires end-to-end validation.
  • GLM-5: Competent but inconsistent. Mostly reliable and handles composite tasks, but occasionally glitches out.
  • GLM-4.7: A junior GLM-5, less polished.
  • Kimi K2.5 / MiniMax M2.5: Extremely cheap “interns.” Reliable only for very well-defined, narrow-scope, deterministic tasks. MiniMax tends to work faster but rougher.
  • Claude Sonnet / Haiku: Comparable to average North American programmers. In the Chinese market, many cost-effective alternatives outperform them — generally not worth using.

Cloud Service Procurement

Balance cost, stability, and performance: match purchases to actual task needs, and keep enough redundancy across providers that a single outage doesn’t disrupt work.

| Service | Monthly Fee | Notes |
| --- | --- | --- |
| Zhipu Coding Plan Pro (a Chinese AI model provider) | ¥499 | Provides GLM-5 and others, but SLA guarantees are weak; occasional failures |
| Volcano Ark Coding Plan (ByteDance’s model platform) | ¥200 | Model quality average, slow updates; strengths lie in large quotas and low latency — great for bulk low-cost tasks |
| GitHub Copilot Pro+ | $39 | High quota, but token consumption triples when paired with Claude Opus 4.6; costs need strict accounting |
| OpenCode Zen | Pay-as-you-go | Multi-model aggregation service with deep OpenCode ecosystem integration, unified management and scheduling |
| PPIO (a Chinese GPU cloud provider) | Pay-as-you-go | Offers GLM-5 service, good latency and stability; backup when Zhipu is unstable |
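The note on Copilot's tripled token consumption is why strict accounting matters. A minimal cost estimator, where the volumes and per-million-token rate are placeholder assumptions for illustration, not published prices:

```python
def monthly_token_cost(
    tokens_per_day: float,
    price_per_million: float,
    multiplier: float = 1.0,
    days: int = 30,
) -> float:
    """Estimate a monthly token bill.

    `multiplier` models effects like the roughly 3x consumption seen when
    pairing Copilot with Claude Opus 4.6. `price_per_million` is a
    placeholder rate, not an actual published price.
    """
    return tokens_per_day * days * multiplier * price_per_million / 1_000_000

# Hypothetical: 2M tokens/day at $5 per million tokens, without and with the 3x.
baseline = monthly_token_cost(2_000_000, 5.0)        # -> 300.0
with_opus = monthly_token_cost(2_000_000, 5.0, 3.0)  # -> 900.0
```

Running this kind of estimate per provider makes it obvious when a flat-fee plan beats pay-as-you-go for your workload, and vice versa.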

Conclusion

Deep development work requires combination strategies, not single tools. The key is breaking workflows into clear use cases, then matching suitable models and services. Efficiency gains come not from chasing the strongest model but from continuously decomposing and reorganizing problem structure.

You don’t have to get everything perfect at once. Start simple: pick a familiar conversation platform, pair it with code tools that deliver clear outcomes, and separate your usage scenarios. Once accustomed to this triage-like process, you’ll sense each tool’s strengths and limits in context. From there, based on real pain points and budget, gradually build a rational division of labor. Keep in mind that switching across multiple models and tools carries learning curves and transition costs — pace yourself.