Agents that orchestrate agents.
There's a moment in every AI-native project where you hit the wall. The single-agent, single-prompt approach that got you 80% of the way there can't get you the last 20%. The context window fills up. The model loses track of the plan. The output quality degrades because you're asking one thing to do everything.
That wall is where the real engineering begins. And the answer isn't a better prompt. It's a better system.
The one-shot prompt is a dead end#
Let me be blunt: if your entire AI workflow is "paste a big prompt into a chat and hope for the best," you're leaving most of the value on the table.
One-shot prompts work for isolated tasks. "Write a function that does X." "Summarize this document." "Convert this data to JSON." These are stateless, context-free, and simple enough that a single call handles them.
But real work isn't like that. Real work has dependencies. Step three depends on the output of step two, which depends on a decision made in step one. Real work requires different capabilities at different stages. Real work needs error handling, retry logic, and fallback paths.
You can try to encode all of that into a single prompt. It's like writing an entire application in one function — technically possible, practically insane.
What a system looks like#
I want to be honest about this: most of my "multi-agent systems" are not automated pipelines. They're me in a terminal, passing output between steps, using different models for different parts of the work.
Here's how adding a new artifact to datagobes.dev actually works:
- I describe what I want and let Claude Code build it. I might use Sonnet for a first draft because it's fast and creative.
- I open the result in a browser and test it myself. Does it work on mobile? Is it accessible? Does it feel right?
- If something's off, I paste the issue back and iterate. Sometimes I switch to Opus for a tricky bug.
- Once it looks good, I ask Claude to add the meta tags and verify the build passes.
That's four steps with two or three different models, but it's not some automated orchestration engine. It's a workflow — a consistent sequence of steps where I know what each stage needs and who (or what) should do it. The "system" is the discipline of doing it the same way each time, not a piece of software that manages it for me.
The important shift is going from "throw everything at one long conversation" to "break the work into stages with clear handoffs." Whether those handoffs are automated or manual matters less than having them at all.
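That staged shape can be sketched as a plain script, with files as the interface between stages. Everything here is a stand-in: the stage bodies are stubs where a real version would shell out to a model CLI, and the file names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical staged workflow: each stage reads the previous stage's
# file and writes its own, so every handoff is explicit and inspectable.
set -eu

WORK=$(mktemp -d)

generate() {   # stage 1: draft the artifact (stub for a model call)
  echo "draft of the artifact" > "$WORK/draft.txt"
}

review() {     # stage 2: review the draft, produce notes
  echo "review notes for: $(cat "$WORK/draft.txt")" > "$WORK/review.txt"
}

integrate() {  # stage 3: combine draft and notes into the final artifact
  cat "$WORK/draft.txt" "$WORK/review.txt" > "$WORK/final.txt"
}

generate
review
integrate

cat "$WORK/final.txt"
```

Because each stage's output lives in a file, you can rerun any single stage without redoing the others, which is the "clear handoffs" property in code form.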
Composable > monolithic#
The power of systems over scripts is composability. Each agent is a building block. You can:
- Swap models without changing the workflow. When Anthropic shipped Sonnet 4.6 in February, I upgraded one agent — the rest of the pipeline didn't notice.
- Add stages without rewriting everything. Need a security audit step? Add an agent between review and integration. The rest of the pipeline doesn't change.
- Reuse agents across different workflows. The same review agent that checks artifacts can check landing pages. The same integration agent works for both.
- Debug in isolation. When something goes wrong, you know which agent produced the bad output. You can inspect its inputs, its reasoning, and its outputs independently.
This is the same principle that made Unix pipes, microservices, and React components successful: small, focused units composed into complex behavior.
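As a sketch of that composability: if each stage takes its model as a parameter, swapping models is a one-line change and adding a stage is one new line between two existing ones. The model names and the `run_stage` body are placeholders, not a real API.

```shell
#!/bin/sh
# Composable pipeline sketch: stages are lines, models are parameters.
set -eu

DRAFT_MODEL="sonnet"    # swap to "opus" here; the pipeline doesn't change
REVIEW_MODEL="opus"

run_stage() {  # stub for invoking a model; a real version would call a CLI
  model=$1
  task=$2
  echo "[$model] $task"
}

run_stage "$DRAFT_MODEL"  "generate artifact"
run_stage "$REVIEW_MODEL" "review artifact"
# Need a security audit? Add one line between review and integrate:
# run_stage "$REVIEW_MODEL" "security audit"
run_stage "$DRAFT_MODEL"  "integrate artifact"
```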
When to actually automate#
Most people don't need LangGraph or CrewAI yet. Those frameworks exist for a reason — complex multi-step workflows with branching logic, error recovery, and parallel execution. But if your "multi-agent system" is three steps that you run once a day, a shell script or a CLAUDE.md with clear instructions is plenty.
The frameworks start earning their keep when:
- You're running the same workflow dozens of times and manual handoffs become the bottleneck
- The workflow has branching logic — different paths depending on intermediate results
- Multiple steps can run in parallel and you're wasting time waiting
- You need reliability — retry logic, error handling, checkpointing
For everything else, "I copy the output from step one and paste it into step two" is a perfectly valid multi-agent architecture. Seriously. The value is in the decomposition, not the automation. Automate later, when the manual version is proven and the repetition justifies it.
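When a workflow does graduate to needing reliability, the first pieces are small enough to write by hand before reaching for a framework. A sketch of retry and checkpointing, with arbitrary stage names and attempt counts:

```shell
#!/bin/sh
# Minimal reliability: retry a flaky step, and skip stages whose
# output already exists so a rerun resumes where it left off.
set -eu

retry() {  # run a command up to 3 times, pausing between attempts
  n=1
  until "$@"; do
    if [ "$n" -ge 3 ]; then return 1; fi
    n=$((n + 1))
    sleep 1
  done
}

checkpointed() {  # run a stage only if its output file is missing
  out=$1
  shift
  [ -f "$out" ] || "$@" > "$out"
}

WORK=$(mktemp -d)
checkpointed "$WORK/draft.txt"  retry echo "draft"
checkpointed "$WORK/review.txt" retry echo "review of draft"
```

Rerunning the script skips any stage that already produced output, which is the poor man's version of the checkpointing the frameworks sell.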
The real orchestration layer#
Here's something I've found: the best "orchestration layer" for most solo developers is a good CLAUDE.md file and consistent habits.
My CLAUDE.md encodes how I work — which models to use for what, what patterns to follow, what to avoid. It's not a state machine. It's a document that keeps the agent aligned with my preferences across sessions. Combined with custom skills (reusable prompt-and-tool bundles for specific tasks), it gives me most of the benefits of formal orchestration without any of the infrastructure.
The formal orchestration tools — LangGraph, CrewAI, Google ADK — become relevant when you're building products where agents run without you watching. Customer-facing agents, automated pipelines, scheduled workflows. That's real orchestration. What most of us do day-to-day is closer to "structured collaboration with AI tools."
Custom skills: the practical middle ground#
One pattern that's delivered a lot of value without any framework overhead is custom skills. Claude Code supports these natively, and MCP servers extend what they can access.
For example, I have a skill for running privacy audits on websites. It includes the scanning methodology, the output format, the visual theme for the report, and access to the tools it needs. When I type /privacy-scan https://example.com, all of that is bundled up and executed. It's not a conversation — it's a repeatable capability.
The difference between a skill and a prompt is the difference between a recipe and "make something with chicken." Both work, but one produces consistent results.
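One way to define a command like that in Claude Code is a markdown file in `.claude/commands/`, where the file name becomes the slash command and `$ARGUMENTS` stands in for whatever follows it (Claude Code also supports a richer skills format). The content below is invented to mirror the privacy-scan example, not the actual skill:

```markdown
<!-- .claude/commands/privacy-scan.md (hypothetical) -->
Run a privacy audit of the site at $ARGUMENTS.

1. Enumerate third-party requests, cookies, and trackers.
2. Check for a privacy policy and a consent mechanism.
3. Write the findings as a report in our standard theme,
   with one section per issue and a severity rating.
```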
Why repeatable workflows beat long conversations#
Conversations are great for exploration. But they accumulate context that dilutes focus, they're impossible to reproduce exactly, and if the session crashes you lose everything.
Even a simple checklist — "step 1: generate, step 2: review, step 3: integrate, step 4: test" — is a better workflow than a single long conversation trying to do all four at once. The context stays focused at each step, you can use different models for different steps, and you can restart any step independently if it fails.
You don't need a framework to think in systems. You need the habit of breaking work into stages with clear handoffs.
Getting started#
No framework required. Start with:
- Identify the stages in your current workflow. What do you do first, second, third?
- Separate the concerns. Which stages need different capabilities?
- Define the interfaces. What does each stage produce? What does the next stage consume?
- Automate one handoff. Pick the most mechanical transition and let an agent handle it.
- Iterate. Add agents, refine prompts, adjust the workflow as you learn what works.
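"Automate one handoff" can be as small as this: take the transition you currently do by copy-paste and make it a pipe, leaving everything else manual. Both functions are stubs standing in for steps you run by hand today; in practice one of them would invoke your model CLI of choice.

```shell
#!/bin/sh
# Automating a single handoff: step one's output becomes step two's
# input through a pipe instead of copy-paste.
set -eu

draft_step() {   # stub: you still decide when to run this
  echo "draft: landing page copy"
}

review_step() {  # stub: reads the draft from stdin, emits review notes
  while IFS= read -r line; do
    echo "reviewed: $line"
  done
}

draft_step | review_step
```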
The goal isn't to build a complex orchestration system on day one. The goal is to start thinking in systems — recognizing that the structure of your workflow is as important as the quality of any individual prompt.
Multi-agent architectures, custom skills, structured workflows. Composable pieces over one-shot prompts.