LLM FUNDAMENTALS · PART 9 · 8 min read

LLM Fundamentals: Part 9 -- The Agentic Loop

ai llm-fundamentals

This is Part 9 of the LLM Fundamentals series.

In Post 8, I covered how Claude calls tools: you define schemas, the model emits structured tool_use blocks, your code executes the operation, and you send the result back. A single tool call is a request-response pair. But what happens when the task requires five tool calls in sequence, each informed by the results of the last, with the model deciding at every step whether to continue or stop?

You build a loop. And that loop is the bridge between “calling an API” and “building an agent.”

Ten Lines That Change Everything

Anthropic’s documentation describes the agentic loop as a while loop keyed on stop_reason. While the model wants to use tools, you execute them and keep going. When it stops asking for tools, you are done.

import anthropic

client = anthropic.Anthropic()

messages = [{"role": "user", "content": user_query}]
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

while response.stop_reason == "tool_use":
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )

# response now contains the final answer
print(response.content[0].text)

That is the entire pattern. Send a request. If Claude wants to call tools, execute them, append the results, and send another request. Repeat until the model produces a final answer. No framework required.

[Figure: five-node horizontal flow showing the agentic loop. User query feeds into messages.create with tools, which produces a stop_reason branch. The tool_use branch (amber) flows down to execute_tool, then back through an append-and-recall step that loops to messages.create. The end_turn branch (emerald) exits to the final answer. Caption: amber is the loop edge, emerald is the exit, zinc is your code.]

Where Posts 0-8 Converge

Reading this loop in isolation, it looks trivial. Writing while stop_reason == "tool_use" is not the hard part. Making each iteration reliable, affordable, and correct is where everything from the first eight posts becomes essential.

Tokens are cost per iteration. Every pass through the loop is a full API call. From Post 1: input tokens include the entire conversation history, tool definitions, and all prior tool results. Output tokens include the model’s reasoning and its next tool call. A 10-step agent loop costs far more than 10x a single call, because context accumulates: step 10 re-processes everything from steps 1 through 9. Token awareness stops being academic and starts being financial.
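A back-of-the-envelope sketch of how input costs compound (the base context size and per-step growth are made-up numbers):

```python
def cumulative_input_tokens(step_output_sizes, base_context=1000):
    """Total input tokens billed across a loop where each step
    re-sends the base context plus every prior step's output."""
    total = 0
    context = base_context
    for added in step_output_sizes:
        total += context   # this step's input is the whole history so far
        context += added   # this step's output joins the history
    return total

# 10 steps, each appending 500 tokens of output and tool results:
naive = 10 * 1000                               # "10x one call" estimate
actual = cumulative_input_tokens([500] * 10)    # 32,500 input tokens
```

The gap widens quadratically with step count, which is why long loops get expensive fast.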

Context windows limit loop duration. From Post 3: every tool result appended to the conversation eats into the context budget. I have watched agent loops cross 200,000 tokens in under 20 steps when tool results return verbose JSON. Context rot degrades quality as the window fills, so an agent that runs for 30 steps may perform worse on step 30 than step 5 even though it has more information available. Managing what stays in context, pruning old tool results, summarizing intermediate state, is the difference between an agent that works and one that drifts.

Structured output makes tool calls parseable. From Post 7: when Claude emits a tool_use block, the input field conforms to the JSON schema you defined for that tool. Without schema enforcement, you would be parsing free-form text to figure out what function the model wants to call and with what arguments. Structured output turns that into a dictionary lookup.

Prompt engineering determines tool selection quality. From Post 5: tool descriptions are prompts. When Claude decides whether to call search_database or query_api, it is choosing based on those descriptions and the system prompt guiding its behavior. Vague tool descriptions produce wrong tool selections. Clear, specific descriptions with concrete examples produce reliable routing. Every prompting principle from Post 5 applies directly to how you write tool schemas.
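As a sketch of what a specific, example-bearing description looks like (the tool, its fields, and the sibling query_api tool are hypothetical; the overall shape follows the Anthropic tools format):

```python
# Hypothetical tool schema: the description tells the model exactly when
# to pick this tool, gives a concrete example, and names the alternative.
search_database = {
    "name": "search_database",
    "description": (
        "Search the internal orders database by keyword. Use this for "
        "questions about past orders, e.g. 'orders from ACME in March 2024'. "
        "Do NOT use this for live inventory levels; use query_api for those."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword query, e.g. 'ACME March 2024'",
            },
        },
        "required": ["query"],
    },
}
```

Compare that with a description like "Searches the database": the model has no basis for choosing between tools, and routing quality collapses.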

Text generation is the engine under everything. From Post 2: each iteration through the loop is autoregressive token generation. Claude does not plan the entire tool sequence upfront. It generates one response at a time, decides to call a tool or stop, and only sees the result of that decision on the next iteration. Understanding this removes the temptation to expect global planning from what is fundamentally a local, one-step-at-a-time process.

Stop Reasons as Control Flow

Stop reasons are how you know what happened and what to do next. In a simple chatbot, you mostly see end_turn and ignore the rest. In an agentic loop, stop reasons become your control flow.

| Stop Reason | What It Means | Your Action |
| --- | --- | --- |
| end_turn | Claude finished naturally | Return the response to the user |
| tool_use | Claude wants to call a tool | Execute it, send the result back, loop |
| max_tokens | Response was truncated | Raise max_tokens or handle continuation |
| pause_turn | Server-side tool loop hit its iteration limit | Re-send the conversation to continue |
| stop_sequence | A custom stop sequence was hit | Inspect stop_sequence and decide whether to extend |
| refusal | Claude declined to respond on safety grounds | Surface to the user; do not retry the same prompt |
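The table can be sketched as an explicit dispatch step. The handler names here are hypothetical placeholders; only the stop_reason values come from the API:

```python
def dispatch(stop_reason):
    """Map a stop_reason to the next action in the agent loop."""
    actions = {
        "end_turn": "return_answer",    # final answer: hand it to the user
        "tool_use": "execute_tools",    # run tools, append results, loop
        "max_tokens": "handle_truncation",  # truncated: retry with a larger budget
        "pause_turn": "resend",         # server-side pause: re-send to continue
        "stop_sequence": "inspect",     # custom stop: decide whether to extend
        "refusal": "surface_refusal",   # safety refusal: show the user, don't retry
    }
    if stop_reason not in actions:
        # fail loudly on anything unexpected rather than guessing
        raise ValueError(f"Unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]
```

Making the unknown case raise, rather than silently falling through, is what turns stop reasons into real control flow.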

Check stop_reason on every single API response. A response that looks complete but carries stop_reason: "max_tokens" is silently truncated, which in an agent loop means a tool call might have been cut off mid-JSON. Catching that early prevents cascading failures downstream.

Error Handling: Return, Don’t Throw

Here is a mistake I made early: when a tool execution failed, I threw an exception and broke out of the loop. Claude never learned what went wrong, so it could not recover or try a different approach.

Better pattern: return the error as a tool result.

def execute_tool(name, tool_input):
    try:
        return tool_functions[name](**tool_input)
    except Exception as e:
        return f"Error: {e}"

When Claude receives an error message as a tool_result, it can reason about what went wrong and decide how to proceed. Maybe it retries with different arguments. Maybe it tries a different tool. Maybe it tells the user what happened. Returning errors as data keeps the model in the loop instead of silently killing the conversation.

This maps back to the stateless architecture from Post 4. Each API call is independent. If you break the loop on an error, you lose all accumulated context. If you return the error as a tool result, the model sees the full history and can make an informed decision.

What Makes This Hard

I said the loop is 10 lines of code. That is true. But production agentic loops need guardrails that the basic pattern does not include.

Iteration limits. Without a cap, a confused model can loop indefinitely, burning tokens and hitting rate limits. I add a max_iterations counter and break with a graceful message when it triggers.
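One way to sketch the cap, factored so it is independent of the API client (step is a stand-in for one full pass: call the model, execute any tools, report whether a final answer arrived):

```python
def run_with_limit(step, max_iterations=25):
    """Drive the agent loop with a hard iteration cap.

    `step` performs one pass and returns (done, result). The cap
    guarantees termination even if the model keeps asking for tools.
    """
    for _ in range(max_iterations):
        done, result = step()
        if done:
            return result
    return "Stopped: hit the iteration limit before reaching a final answer."
```

The graceful message matters: callers get an explanation instead of an exception, and you can log the run for inspection.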

Cost tracking. Each iteration reports usage in the response. Summing input_tokens and output_tokens across iterations gives you the real cost of an agent run, which is always higher than you expect.
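A minimal accumulator in that spirit. The per-million-token prices are placeholders, not real rates; the usage argument mirrors the usage object carried by each API response:

```python
class CostTracker:
    """Sum token usage across loop iterations and estimate dollar cost."""

    def __init__(self, input_price_per_mtok=3.0, output_price_per_mtok=15.0):
        # placeholder prices: check your actual model's pricing
        self.input_tokens = 0
        self.output_tokens = 0
        self._in_price = input_price_per_mtok
        self._out_price = output_price_per_mtok

    def record(self, usage):
        """Call once per response with its `usage` object."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    def dollars(self):
        return (self.input_tokens * self._in_price
                + self.output_tokens * self._out_price) / 1_000_000
```

Calling tracker.record(response.usage) inside the loop and logging tracker.dollars() at the end makes the real cost of a run visible instead of surprising.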

Context pruning. After a certain number of iterations, I summarize or drop old tool results to stay within the context budget. Anthropic’s context editing feature helps here by letting you selectively trim the conversation history between requests.
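A minimal client-side sketch of the same idea, assuming the messages list shape from the loop above (tool results arrive as content-block lists on user turns):

```python
def prune_old_tool_results(messages, keep_last=2, placeholder="[pruned]"):
    """Blank out tool_result contents in all but the most recent
    `keep_last` tool-result turns, shrinking context while keeping
    the conversation's shape intact for the model."""
    turns = [i for i, m in enumerate(messages)
             if m["role"] == "user" and isinstance(m["content"], list)]
    for i in turns[:len(turns) - keep_last]:
        for block in messages[i]["content"]:
            if block.get("type") == "tool_result":
                block["content"] = placeholder
    return messages
```

Keeping the pruned blocks in place (rather than deleting the turns) preserves the alternating structure the API expects and still tells the model that a tool was called there.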

Timeout handling. Some tools take seconds, others take minutes. An agent loop needs per-tool timeouts and a strategy for what to tell the model when a tool times out.

None of these are part of the core loop pattern, but all of them are necessary before you point this at real users. And every one of them leans on concepts from earlier in this series: token counting for cost tracking, context window management for pruning, structured output for reliable tool result parsing.

Where Every Concept Meets

Posts 0 through 8 each covered a concept in isolation: tokens, generation, context, the API, prompting, thinking, schemas, tool calling. Each one is useful on its own. The agentic loop is where they all do work simultaneously.

An agent that searches a database, processes results, calls an API, interprets the response, and writes a summary is exercising every concept simultaneously. Tokens determine what it costs. Context determines how far it can go. Structured output determines whether tool calls parse. Prompting determines whether it picks the right tool. Extended thinking from Post 6 can improve decision quality at each step, at the cost of more output tokens per iteration.

Understanding the agentic loop as a mechanical pattern, not a magic capability, is the entire point of this series. It is a while loop. Everything else is engineering.

Next up in Post 10: scaling from loop to agent. That post covers prompt caching for cost reduction across iterations, model routing to match capability to task complexity, and how the Claude Agent SDK wraps this loop with production-ready defaults.