
How to Build AI Agents with Claude: A Practical Guide

July 2026 · 12 min read · Code examples in TypeScript

"AI agent" is the most inflated term in software right now — it gets applied to everything from a cron job that calls an LLM to systems that genuinely plan, act, and correct themselves. This guide uses the working definition Anthropic uses: an agent is a model using tools in a loop, where the model itself decides what to do next based on what happened last.

That definition is worth internalizing because each part carries a design decision: which model, which tools, and what controls the loop. Get those three right and agents are surprisingly simple. Get them wrong and you have an expensive random walk. Let's build one properly.

First: do you actually need an agent?

The most useful advice in Anthropic's agent-building guidance is negative: don't build an agent where a workflow will do. If your task has a known sequence of steps — summarize, then classify, then route — hard-code the sequence and make one LLM call per step. That's a workflow: cheaper, faster, testable, and predictable.

Agents earn their complexity when the path can't be known in advance: debugging a failing test (each finding changes the next step), researching an open-ended question, operating a browser, fixing a customer issue that could be one of forty things. The signature is branching: if step 3 depends on what step 2 discovered, you want an agent. If not, you want a pipeline.
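For contrast, here is a minimal sketch of the workflow side of that line, using the summarize, classify, route example. The `LlmCall` type, the prompts, and the queue names are illustrative assumptions, not part of any SDK; in practice `llm` would wrap a single model call.

```typescript
// A workflow: the sequence is fixed in code and the model only fills in each step.
// `LlmCall` is a hypothetical stand-in for one model call (prompt in, text out).
type LlmCall = (prompt: string) => Promise<string>;

type Routed = { summary: string; category: string; queue: string };

// Hypothetical routing table; unknown categories fall back to a human queue.
const QUEUES = new Map<string, string>([
  ["billing", "finance-team"],
  ["bug", "engineering"],
]);

async function triageTicket(ticket: string, llm: LlmCall): Promise<Routed> {
  // Step 1: summarize. Always runs, always first.
  const summary = await llm(`Summarize this support ticket in one sentence:\n${ticket}`);

  // Step 2: classify. It consumes step 1's output, but the step itself never changes.
  const raw = await llm(
    `Classify as "billing", "bug", or "other". Reply with one word:\n${summary}`,
  );
  const category = raw.trim().toLowerCase();

  // Step 3: route. Plain code, no model call needed.
  const queue = QUEUES.get(category) ?? "human-review";
  return { summary, category, queue };
}
```

Nothing the model returns can change which step runs next, which is what makes this testable with a fake `llm` and why it is not an agent.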

The agent loop

Every agent — Claude Code included — is architecturally this:

the loop, in pseudocode

messages = [system_prompt, user_task]

while true:
    response = claude(messages, tools)

    if response has tool_calls:
        results = execute(tool_calls)   # your code runs here
        messages += [response, results] # feed back what happened
    else:
        return response                 # done — Claude answered

Claude sees the task, decides whether it needs a tool, your code executes the tool and returns the result, and Claude decides what that result means for the next step. All the intelligence lives in the model; all the capability lives in the tools you hand it. The loop is dumb glue — and it should stay that way.

Tools: the part you actually design

A tool is a function you describe to Claude with a name, a description, and a JSON Schema for its parameters. Claude never executes anything itself: it emits a structured request, your code runs the function, and your code returns the result. Here's the minimal real thing with the Anthropic SDK:

agent.ts — a minimal working agent

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools = [
  {
    name: "get_order",
    description:
      "Look up an order by ID. Returns status, items, and shipping info.",
    input_schema: {
      type: "object" as const,
      properties: { order_id: { type: "string" } },
      required: ["order_id"],
    },
  },
];

async function runAgent(task: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: task }];

  while (true) {
    const res = await client.messages.create({
      model: "claude-sonnet-5",
      max_tokens: 4096,
      system: "You are a support agent. Use tools to check facts; never guess order state.",
      tools,
      messages,
    });

    if (res.stop_reason !== "tool_use") {
      return res.content; // final answer
    }

    messages.push({ role: "assistant", content: res.content });
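    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of res.content) {
      if (block.type === "tool_use") {
        // block.input is typed `unknown`, so narrow it to the tool's schema.
        // getOrder is your real function; it is not defined in this file.
        const output = await getOrder(block.input as { order_id: string });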
        results.push({
          type: "tool_result" as const,
          tool_use_id: block.id,
          content: JSON.stringify(output),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}

That's a complete agent. Everything else — memory, planning, multi-agent orchestration — is elaboration on this loop. Three design rules that matter more than any framework:

Tool descriptions are prompts. Claude chooses tools based on the description. "Look up an order by ID" with the failure modes documented beats a bare function signature every time.
Return errors as information, not exceptions. When a tool fails, send the error text back into the loop — a good agent reads "404: order not found" and changes strategy. That self-correction is the whole point of the architecture.
Fewer, better tools. Ten crisp tools outperform forty overlapping ones. If two tools confuse you, they confuse the model.
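The second rule can be sketched as a small dispatcher that sits between the loop and your functions. The block shapes below mirror the `tool_use` and `tool_result` blocks used in the loop above but are declared locally so the sketch runs without the SDK; the handler registry and the fake `get_order` lookup are assumptions for illustration.

```typescript
type ToolUse = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};
type Handler = (input: unknown) => Promise<unknown>;

// Hypothetical registry: one handler per tool name.
const handlers = new Map<string, Handler>([
  [
    "get_order",
    async (input) => {
      const { order_id } = input as { order_id: string };
      // Stand-in for a real lookup: only one order exists.
      if (order_id !== "A100") throw new Error(`404: order ${order_id} not found`);
      return { status: "shipped" };
    },
  ],
]);

async function executeTool(call: ToolUse): Promise<ToolResult> {
  try {
    const handler = handlers.get(call.name);
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    const output = await handler(call.input);
    return { type: "tool_result", tool_use_id: call.id, content: JSON.stringify(output) };
  } catch (err) {
    // The failure goes back into the loop as text the model can read and act on.
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: call.id, content: message, is_error: true };
  }
}
```

Because `executeTool` never throws, a bad order ID or a misspelled tool name becomes one more message in the conversation rather than a crashed loop.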

MCP: tools you don't have to build

The Model Context Protocol (MCP) is an open standard for exactly the interface above — a server exposes tools, any MCP-capable client (Claude Code, the Claude apps, your own agent) can use them. Before writing a custom tool for GitHub, Postgres, Slack, or a headless browser, check whether an MCP server already exists; hundreds do. We keep a curated list in our guide to Claude MCP servers.
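If you run your own loop, MCP tools need one small translation step: an MCP server lists each tool with a `name`, an optional `description`, and an `inputSchema`, while the tool definition shown earlier uses `input_schema`. The sketch below covers only that mapping. The types are local copies, fetching the list from a server is omitted, and the server-name prefix is a convention assumed here, not something MCP requires.

```typescript
// Local copies of the two shapes so the sketch has no dependencies.
type JsonSchema = {
  type: "object";
  properties?: Record<string, unknown>;
  required?: string[];
};
type McpTool = { name: string; description?: string; inputSchema: JsonSchema };
type ClaudeTool = { name: string; description: string; input_schema: JsonSchema };

// Prefix each tool with its server name so tools from different servers
// cannot collide (an assumption of this sketch).
function toClaudeTools(server: string, mcpTools: McpTool[]): ClaudeTool[] {
  return mcpTools.map((t) => ({
    name: `${server}__${t.name}`,
    // Descriptions are prompts, so never pass an empty one to the model.
    description: t.description ?? `Tool "${t.name}" from the ${server} MCP server.`,
    input_schema: t.inputSchema,
  }));
}
```

When a `tool_use` block comes back, strip the prefix to find the server and the original tool name, then forward the call to that server.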

The fastest path: don't build the loop at all

If your agent's job touches code, files, or shell commands, the pragmatic move in 2026 is to build on top of Claude Code rather than from scratch. It already ships the hardened loop: permissions, sandboxing, context management, subagents, MCP support. You customize it with three primitives:

CLAUDE.md — persistent instructions the agent reads every session (your domain knowledge, conventions, guardrails).
Skills — folders teaching repeatable capabilities, loaded on demand.
Subagents — scoped workers with their own context windows and tool allowlists, for fan-out work like "review these 12 files in parallel."

The same machinery is available programmatically as the Claude Agent SDK — the engine of Claude Code as a library, loop and permissions included, for agents that need to run headless or inside your product.

Reliability: the unglamorous 80%

Getting an agent to work once is a weekend. Getting it dependable is the actual job:

Stop conditions. Cap iterations, tokens, and wall-clock time. An agent that can loop can loop forever.
Permissions by blast radius. Reads can be automatic; writes should be gated; deletes and payments should require a human. Claude Code's permission model is a good template to copy.
Evals over vibes. Collect 20–50 real tasks with known-good outcomes and run them on every change to your prompt or tools. Agent behavior shifts in non-obvious ways; regression tests catch what demos hide.
Log the full trajectory. When an agent fails, the failure is usually three steps before the visible error. Persist every message and tool result; you cannot debug what you didn't record.
Prompt injection is real. Anything your agent reads — web pages, emails, file contents — can contain instructions aimed at it. Treat retrieved content as data, never as commands, and gate side effects accordingly.
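The first two items are small enough to sketch directly. The budget numbers, the tool names, and the risk table below are assumptions for illustration; the point is that both checks are ordinary code living in the loop, outside the model's control.

```typescript
type Budget = { maxIterations: number; maxTokens: number; maxMs: number };
type Usage = { iterations: number; tokens: number; startedAt: number };

// Returns a reason to stop, or null if the loop may continue.
function stopReason(b: Budget, u: Usage, now: number = Date.now()): string | null {
  if (u.iterations >= b.maxIterations) return "iteration cap reached";
  if (u.tokens >= b.maxTokens) return "token budget spent";
  if (now - u.startedAt >= b.maxMs) return "time limit reached";
  return null;
}

type Risk = "read" | "write" | "destructive";
type Decision = "allow" | "ask_human";

// Hypothetical tool-to-risk table.
const TOOL_RISK = new Map<string, Risk>([
  ["get_order", "read"],
  ["update_address", "write"],
  ["refund_payment", "destructive"],
]);

function gate(toolName: string, autoApproveWrites = false): Decision {
  // Anything not listed is treated as destructive, so new tools start locked down.
  const risk = TOOL_RISK.get(toolName) ?? "destructive";
  if (risk === "read") return "allow";
  if (risk === "write" && autoApproveWrites) return "allow";
  return "ask_human";
}
```

Call `stopReason` at the top of every iteration and `gate` before executing each `tool_use` block. Since injected text can only influence what the model asks for, keeping the gate in code means it cannot be talked out of a decision.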

Want the pre-built version? The ClaudeThings kits are exactly this philosophy productized: 89 specialized agents, 103 skills, and 181 commands for engineering and marketing work, installed into Claude Code with one command.

FAQ

Which Claude model should power an agent?

Start with the most capable model and make it work; optimize cost later. Agent errors compound across loop iterations, so model quality pays off non-linearly. A common production pattern: flagship model for planning and recovery, a faster model for mechanical substeps.

Do I need LangChain or a framework?

No. The loop is ~40 lines, and owning it means you understand every failure. Frameworks earn their place when you need their specific infrastructure (tracing, deployment, team conventions) — adopt one for those reasons, not because agents seem hard.

When do I need multiple agents?

Later than you think. Multi-agent systems help when contexts genuinely must be separate (a researcher fanning out to parallel readers) or when tool allowlists should differ by role. Most tasks that look multi-agent are one good agent with better tools.

How do agents remember things between sessions?

The model doesn't — your system does. Give the agent a memory tool (a file or database it reads and writes), or persist notes like Claude Code's CLAUDE.md. Design what gets remembered deliberately; append-only memory dumps degrade fast.
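A file-backed memory tool can be as small as the sketch below. The file layout, the three actions, and the function names are assumptions for illustration. Notes are keyed, so writing to an existing key overwrites it, which is one simple way to avoid the append-only dump.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type Notes = Record<string, { value: string; updatedAt: string }>;

// Hypothetical input shape for a "memory" tool exposed to the agent.
type MemoryInput =
  | { action: "remember"; key: string; value: string }
  | { action: "recall"; key: string }
  | { action: "forget"; key: string };

function loadNotes(path: string): Notes {
  return existsSync(path) ? (JSON.parse(readFileSync(path, "utf8")) as Notes) : {};
}

// Returns a string, ready to send back to the model as a tool result.
function memoryTool(path: string, input: MemoryInput): string {
  const notes = loadNotes(path);
  switch (input.action) {
    case "remember":
      // Overwrites any existing note under the same key.
      notes[input.key] = { value: input.value, updatedAt: new Date().toISOString() };
      writeFileSync(path, JSON.stringify(notes, null, 2));
      return `Saved "${input.key}".`;
    case "recall":
      return notes[input.key]?.value ?? `No note stored under "${input.key}".`;
    case "forget":
      delete notes[input.key];
      writeFileSync(path, JSON.stringify(notes, null, 2));
      return `Forgot "${input.key}".`;
  }
}
```

Wire it into the loop like any other tool: describe the three actions in the tool's schema and pass the parsed `tool_use` input to `memoryTool`.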
