Note: This post was written by Claude, directed and proofread by me. I told it what to write, corrected what it got wrong, and shaped every section.
I've been using Claude Code every day for months. I know how to use it. But I didn't really know how it worked — not at the code level.
So I built a mini version of it from scratch. I called it Zen Code — the command is zen, the same way Claude Code's command is claude. TypeScript, bun, and only one external dependency: the Anthropic SDK. No frameworks, no abstractions. Just the raw mechanics of an AI agent in a terminal.
This post is about what I learned. Specifically: what an agent harness is, why it matters, and how to build one.
An agent harness is the scaffolding that turns a language model into an agent.
A language model on its own just takes text in and returns text out. It's stateless. It doesn't remember your previous message, can't read files, can't run commands, and stops when it finishes a response. That's it.
An agent harness is the code around the model that adds what's missing:
- Memory: the full conversation history, resent on every call
- Tools: a way to read files, edit them, and run commands
- A loop: keep calling the model until it's done
Claude Code is an agent harness. Cursor's agent mode is an agent harness. When you type a prompt and the AI goes off and edits five files, runs a build, fixes a lint error, and commits the result — that's an agent harness at work.
Understanding how one works changes how you use all of them.
The heart of every AI agent is the agentic loop. One field in the API response drives the entire thing: stop_reason.
When you call the Anthropic API, the response always has a stop_reason. Two values matter:
- "end_turn" — the model is done. Print the response and wait for the next user message.
- "tool_use" — the model wants to call a tool. Run it, send the result back, and call the API again.

That's the loop. Keep going until stop_reason is "end_turn".
```typescript
async function runAgentLoop(userInput: string): Promise<void> {
  messages.push({ role: 'user', content: userInput });

  while (true) {
    const response = await client.messages.create({
      model: MODEL,      // e.g. a Claude model id
      max_tokens: 4096,  // required by the API
      system: SYSTEM_PROMPT,
      tools: toolDefinitions,
      messages,
    });

    // Push the raw content array; it may mix text and tool_use blocks
    messages.push({ role: 'assistant', content: response.content });

    // Model is done — print the text blocks and return to the REPL
    if (response.stop_reason !== 'tool_use') {
      const text = response.content
        .filter((b) => b.type === 'text')
        .map((b) => b.text)
        .join('');
      console.log(`\n⏺ ${text}\n`);
      return;
    }

    // Model wants tools — execute them, send the results back, and loop
    const toolResults = await executeTools(response.content);
    messages.push({ role: 'user', content: toolResults });
  }
}
```
The model can't call your functions directly. It requests them by name.
When you define tools, you're writing JSON Schema descriptions — name, description, and an input_schema. The model reads these descriptions to decide which tool to call and when.
The description is the most important part. A vague description leads to wrong tool usage.
```typescript
{
  name: "edit_file",
  description: "Replace an exact string in a file. old_string must match exactly once.",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path to the file." },
      old_string: { type: "string", description: "Exact text to replace." },
      new_string: { type: "string", description: "Replacement text." },
    },
    required: ["path", "old_string", "new_string"],
  },
},
```
When the model wants to use a tool, it returns a block with the tool name and inputs.
Your code matches on the name, runs the function, and sends the result back with the same id so the API can pair them up.
One important rule: tool executors should never throw. Return errors as strings instead — if a tool throws, the loop crashes.
If it returns an error string, the model sees what went wrong and tries a different approach.
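Putting those rules together, here is a sketch of what the dispatcher in executor.ts might look like. The handler table is hypothetical, and only two of the six tools are shown:

```typescript
import { readFile } from 'node:fs/promises';
import { promisify } from 'node:util';
import { execFile } from 'node:child_process';

const exec = promisify(execFile);

type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, any> };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean };

// Hypothetical handler table: one async function per tool (two shown here).
const handlers: Record<string, (input: any) => Promise<string>> = {
  read_file: async ({ path }) => readFile(path, 'utf8'),
  bash: async ({ command }) => (await exec('bash', ['-c', command])).stdout,
};

// Run every tool_use block in the response; pair each result with its id.
async function executeTools(
  content: Array<ToolUseBlock | { type: 'text'; text: string }>,
): Promise<ToolResultBlock[]> {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== 'tool_use') continue; // skip text blocks
    try {
      const handler = handlers[block.name];
      if (!handler) throw new Error(`Unknown tool: ${block.name}`);
      results.push({ type: 'tool_result', tool_use_id: block.id, content: await handler(block.input) });
    } catch (err) {
      // Never throw past this point: hand the error back to the model as text
      results.push({ type: 'tool_result', tool_use_id: block.id, content: String(err), is_error: true });
    }
  }
  return results;
}
```

Note that even an unknown tool name comes back as a tool_result with is_error set, so the model can recover instead of the process dying.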
Before executing any tool, the agent can pause and ask for approval. This is how Claude Code's manual mode works.
The key detail: when the user denies a tool, you don't throw or break. You push a tool_result saying it was denied. The model sees that message, explains what happened, and tries a different approach. The loop continues.
Two modes:
- manual — asks before every tool (default, good for learning)
- auto — executes without asking (fast, good when you trust the model)

Switch mid-session with /auto or /manual.
The system prompt shapes the model's behavior — what it is, what tools it has, how it should act. It's passed as the system field on every API call. Keep it short — it adds to your input token count on every single call.
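For illustration, a short prompt in that spirit (hypothetical, not Zen Code's actual prompt):

```typescript
// A hypothetical system prompt: short on purpose, since it is resent on every call.
const SYSTEM_PROMPT = `You are Zen Code, a coding agent running in the user's terminal.
You can read, list, and edit files, and run shell commands, via your tools.
Make small, targeted edits. Never modify files the user did not ask about.
Keep your responses brief. This is a terminal, not a chat window.`;
```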
The final structure of Zen Code:
```
src/
  index.ts         — REPL + agentic loop
  tools.ts         — tool definitions (JSON Schema)
  executor.ts      — tool implementations (read_file, edit_file, bash, etc.)
  system-prompt.ts — system prompt
  loader.ts        — terminal spinner
```
Six tools:
| Tool | What it does |
|---|---|
| `read_file` | Read a file's contents |
| `list_files` | List files matching a glob pattern |
| `edit_file` | Replace an exact string in a file |
| `diff` | Preview what an edit would change — without applying it |
| `bash` | Run a raw shell command |
| `run` | Execute a script file with bun |
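As an example of the executor side, here is a sketch of how edit_file could enforce its matches-exactly-once rule, returning errors as strings per the rule above. Assumes Node's fs API:

```typescript
import { readFile, writeFile } from 'node:fs/promises';

// Sketch of an edit_file implementation: old_string must occur exactly once.
async function editFile(path: string, oldString: string, newString: string): Promise<string> {
  const content = await readFile(path, 'utf8');
  const matches = content.split(oldString).length - 1; // count occurrences
  if (matches === 0) return `Error: old_string not found in ${path}`;
  if (matches > 1) return `Error: old_string matches ${matches} times in ${path}; it must match exactly once`;
  await writeFile(path, content.replace(oldString, newString), 'utf8');
  return `Edited ${path}`;
}
```

The error strings double as prompts: the model reads them and typically retries with a longer, unambiguous old_string.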
stop_reason is the heartbeat. Every AI agent, every agentic IDE, every autonomous coding tool — they all have a version of this loop. The loop only continues because of one field.
Tool descriptions are model prompts. The description is how the model decides when and how to call a tool. Write them like you're telling a smart junior dev exactly when to use a function and what to watch out for.
Manual mode is how you learn. The model doesn't remember your last message — you pass the full conversation history on every call. Running Zen Code in manual mode, approving every tool call one by one, makes that concrete.
I learned more about how Claude Code works in one hour of building this than in months of using it.
The model is the intelligence. The harness is what makes it an agent.

The full code is on GitHub. If you want to understand how Claude Code (or any agentic tool) works, the best way is to build a small one yourself. It's not that much code. But it changes how you think about what's happening when you hand off a task to an agent.
If you have questions, find me on X.