LLM FUNDAMENTALS · PART 9 · 8 min read

LLM Fundamentals: Part 9 -- The Agentic Loop

ai llm-fundamentals

This is Part 9 of the LLM Fundamentals series.

In Post 8, I covered how Claude calls tools: you define schemas, the model emits structured tool_use blocks, your code executes the operation, and you send the result back. A single tool call is a request-response pair. But what happens when the task requires five tool calls in sequence, each informed by the results of the last, with the model deciding at every step whether to continue or stop?

You build a loop. And that loop is the bridge between “calling an API” and “building an agent.”

Ten Lines That Change Everything

Anthropic’s documentation describes the agentic loop as a while loop keyed on stop_reason. While the model wants to use tools, you execute them and keep going. When it stops asking for tools, you are done.

import anthropic

client = anthropic.Anthropic()

messages = [{"role": "user", "content": user_query}]
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

while response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )

# response now contains the final answer
print(response.content[0].text)

That is the entire pattern. Send a request. If Claude wants to call tools, execute them, append the results, and send another request. Repeat until the model produces a final answer. No framework required.

[Figure: five-node horizontal flow showing the agentic loop. User query feeds into messages.create with tools, which produces a stop_reason branch. The tool_use branch (amber) flows down to execute_tool, then back through an append-and-recall step that loops to messages.create. The end_turn branch (emerald) exits to the final answer. Caption: amber is the loop edge, emerald is the exit, zinc is your code.]

Where Posts 0-8 Converge

Reading this loop in isolation, it looks trivial. Writing while stop_reason == "tool_use" is not the hard part. Making each iteration reliable, affordable, and correct is where everything from the first eight posts becomes essential.

Tokens are cost per iteration. Every pass through the loop is a full API call. From Post 1: input tokens include the entire conversation history, tool definitions, and all prior tool results. Output tokens include the model’s reasoning and its next tool call. A 10-step agent loop costs far more than 10x a single call, because context accumulates: step 10 re-processes everything from steps 1 through 9. Token awareness stops being academic and starts being financial.
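A back-of-the-envelope sketch of how input costs compound (the base context size and per-step growth are made-up numbers):

```python
def cumulative_input_tokens(step_output_sizes, base_context=1000):
    """Total input tokens billed across a loop where each step
    re-sends the base context plus every prior step's output."""
    total = 0
    context = base_context
    for added in step_output_sizes:
        total += context   # this step's input is the whole history so far
        context += added   # this step's output joins the history
    return total

# 10 steps, each appending 500 tokens of output and tool results:
naive = 10 * 1000                               # "10x one call" estimate
actual = cumulative_input_tokens([500] * 10)    # 32,500 input tokens
```

The gap widens quadratically with step count, which is why long loops get expensive fast.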

Context windows limit loop duration. From Post 3: every tool result appended to the conversation eats into the context budget. I have watched agent loops cross 200,000 tokens in under 20 steps when tool results return verbose JSON. Context rot degrades quality as the window fills, so an agent that runs for 30 steps may perform worse on step 30 than step 5 even though it has more information available. Managing what stays in context, pruning old tool results, summarizing intermediate state, is the difference between an agent that works and one that drifts.

Structured output makes tool calls parseable. From Post 7: when Claude emits a tool_use block, the input field conforms to the JSON schema you defined for that tool. Without schema enforcement, you would be parsing free-form text to figure out what function the model wants to call and with what arguments. Structured output turns that into a dictionary lookup.

Prompt engineering determines tool selection quality. From Post 5: tool descriptions are prompts. When Claude decides whether to call search_database or query_api, it is choosing based on those descriptions and the system prompt guiding its behavior. Vague tool descriptions produce wrong tool selections. Clear, specific descriptions with concrete examples produce reliable routing. Every prompting principle from Post 5 applies directly to how you write tool schemas.
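As a sketch of what a specific, example-bearing description looks like (the tool, its fields, and the sibling query_api tool are hypothetical; the overall shape follows the Anthropic tools format):

```python
# Hypothetical tool schema: the description tells the model exactly when
# to pick this tool, gives a concrete example, and names the alternative.
search_database = {
    "name": "search_database",
    "description": (
        "Search the internal orders database by keyword. Use this for "
        "questions about past orders, e.g. 'orders from ACME in March 2024'. "
        "Do NOT use this for live inventory levels; use query_api for those."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword query, e.g. 'ACME March 2024'",
            },
        },
        "required": ["query"],
    },
}
```

Compare that with a description like "Searches the database": the model has no basis for choosing between tools, and routing quality collapses.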

Text generation is the engine under everything. From Post 2: each iteration through the loop is autoregressive token generation. Claude does not plan the entire tool sequence upfront. It generates one response at a time, decides to call a tool or stop, and only sees the result of that decision on the next iteration. Understanding this removes the temptation to expect global planning from what is fundamentally a local, one-step-at-a-time process.

Stop Reasons as Control Flow

Stop reasons are how you know what happened and what to do next. In a simple chatbot, you mostly see end_turn and ignore the rest. In an agentic loop, stop reasons become your control flow.

| Stop Reason | What It Means | Your Action |
| --- | --- | --- |
| end_turn | Claude finished naturally | Return the response to the user |
| tool_use | Claude wants to call a tool | Execute it, send the result back, loop |
| max_tokens | Response was truncated | Raise max_tokens or handle continuation |
| pause_turn | Server-side tool loop hit its iteration limit | Re-send the conversation to continue |
| stop_sequence | A custom stop sequence was hit | Inspect stop_sequence and decide whether to extend |
| refusal | Claude declined to respond on safety grounds | Surface to the user; do not retry the same prompt |
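The table can be sketched as an explicit dispatch step. The handler names here are hypothetical placeholders; only the stop_reason values come from the API:

```python
def dispatch(stop_reason):
    """Map a stop_reason to the next action in the agent loop."""
    actions = {
        "end_turn": "return_answer",    # final answer: hand it to the user
        "tool_use": "execute_tools",    # run tools, append results, loop
        "max_tokens": "handle_truncation",  # truncated: retry with a larger budget
        "pause_turn": "resend",         # server-side pause: re-send to continue
        "stop_sequence": "inspect",     # custom stop: decide whether to extend
        "refusal": "surface_refusal",   # safety refusal: show the user, don't retry
    }
    if stop_reason not in actions:
        # fail loudly on anything unexpected rather than guessing
        raise ValueError(f"Unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]
```

Making the unknown case raise, rather than silently falling through, is what turns stop reasons into real control flow.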

Check stop_reason on every single API response. A response that looks complete but carries stop_reason: "max_tokens" is silently truncated, which in an agent loop means a tool call might have been cut off mid-JSON. Catching that early prevents cascading failures downstream.

Error Handling: Return, Don’t Throw

Here is a mistake I made early: when a tool execution failed, I threw an exception and broke out of the loop. Claude never learned what went wrong, so it could not recover or try a different approach.

Better pattern: return the error as a tool result.

def execute_tool(name, tool_input):
    try:
        return tool_functions[name](**tool_input)
    except Exception as e:
        return f"Error: {e}"

When Claude receives an error message as a tool_result, it can reason about what went wrong and decide how to proceed. Maybe it retries with different arguments. Maybe it tries a different tool. Maybe it tells the user what happened. Returning errors as data keeps the model in the loop instead of silently killing the conversation.

This maps back to the stateless architecture from Post 4. Each API call is independent. If you break the loop on an error, you lose all accumulated context. If you return the error as a tool result, the model sees the full history and can make an informed decision.

What Makes This Hard

I said the loop is 10 lines of code. That is true. But production agentic loops need guardrails that the basic pattern does not include.

Iteration limits. Without a cap, a confused model can loop indefinitely, burning tokens and hitting rate limits. I add a max_iterations counter and break with a graceful message when it triggers.
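One way to sketch the cap, factored so it is independent of the API client (step is a stand-in for one full pass: call the model, execute any tools, report whether a final answer arrived):

```python
def run_with_limit(step, max_iterations=25):
    """Drive the agent loop with a hard iteration cap.

    `step` performs one pass and returns (done, result). The cap
    guarantees termination even if the model keeps asking for tools.
    """
    for _ in range(max_iterations):
        done, result = step()
        if done:
            return result
    return "Stopped: hit the iteration limit before reaching a final answer."
```

The graceful message matters: callers get an explanation instead of an exception, and you can log the run for inspection.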

Cost tracking. Each iteration reports usage in the response. Summing input_tokens and output_tokens across iterations gives you the real cost of an agent run, which is always higher than you expect.
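A minimal accumulator in that spirit. The per-million-token prices are placeholders, not real rates; the usage argument mirrors the usage object carried by each API response:

```python
class CostTracker:
    """Sum token usage across loop iterations and estimate dollar cost."""

    def __init__(self, input_price_per_mtok=3.0, output_price_per_mtok=15.0):
        # placeholder prices: check your actual model's pricing
        self.input_tokens = 0
        self.output_tokens = 0
        self._in_price = input_price_per_mtok
        self._out_price = output_price_per_mtok

    def record(self, usage):
        """Call once per response with its `usage` object."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    def dollars(self):
        return (self.input_tokens * self._in_price
                + self.output_tokens * self._out_price) / 1_000_000
```

Calling tracker.record(response.usage) inside the loop and logging tracker.dollars() at the end makes the real cost of a run visible instead of surprising.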

Context pruning. After a certain number of iterations, I summarize or drop old tool results to stay within the context budget. Anthropic’s context editing feature helps here by letting you selectively trim the conversation history between requests.
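A minimal client-side sketch of the same idea, assuming the messages list shape from the loop above (tool results arrive as content-block lists on user turns):

```python
def prune_old_tool_results(messages, keep_last=2, placeholder="[pruned]"):
    """Blank out tool_result contents in all but the most recent
    `keep_last` tool-result turns, shrinking context while keeping
    the conversation's shape intact for the model."""
    turns = [i for i, m in enumerate(messages)
             if m["role"] == "user" and isinstance(m["content"], list)]
    for i in turns[:len(turns) - keep_last]:
        for block in messages[i]["content"]:
            if block.get("type") == "tool_result":
                block["content"] = placeholder
    return messages
```

Keeping the pruned blocks in place (rather than deleting the turns) preserves the alternating structure the API expects and still tells the model that a tool was called there.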

Timeout handling. Some tools take seconds, others take minutes. An agent loop needs per-tool timeouts and a strategy for what to tell the model when a tool times out.

None of these are part of the core loop pattern, but all of them are necessary before you point this at real users. And every one of them leans on concepts from earlier in this series: token counting for cost tracking, context window management for pruning, structured output for reliable tool result parsing.

Where Every Concept Meets

Posts 0 through 8 each covered a concept in isolation: tokens, generation, context, the API, prompting, thinking, schemas, tool calling. Each one is useful on its own. The agentic loop is where they all do work simultaneously.

An agent that searches a database, processes results, calls an API, interprets the response, and writes a summary is exercising every concept simultaneously. Tokens determine what it costs. Context determines how far it can go. Structured output determines whether tool calls parse. Prompting determines whether it picks the right tool. Extended thinking from Post 6 can improve decision quality at each step, at the cost of more output tokens per iteration.

Understanding the agentic loop as a mechanical pattern, not a magic capability, is the entire point of this series. It is a while loop. Everything else is engineering.

Next up in Post 10: scaling from loop to agent. That post covers prompt caching for cost reduction across iterations, model routing to match capability to task complexity, and how the Claude Agent SDK wraps this loop with production-ready defaults.