
LLM Fundamentals: Part 0 -- What Are LLMs?

Large language models are prediction machines built on transformers. This article outlines what they are, how developers use them, and where agent workflows fit in.


About This Series

LLM Fundamentals is a series that walks through the core concepts behind large language models, from tokens to agent loops. Each post builds on the last:

  0. What Are LLMs? (this post) — the big picture before the deep dive
  1. Tokens — what LLMs actually process
  2. Text Generation — how models pick the next token, and what temperature really controls
  3. Context Windows — working memory, context rot, and why more isn’t always better
  4. Messages API — roles, statelessness, and conversation structure
  5. Prompt Engineering vs Harness Engineering — where the prompt ends and the system begins
  6. Extended Thinking — chain of thought as an API feature, not a prompting hack
  7. Structured Output — guaranteed schemas for machine-readable responses
  8. Tool Use — giving models the ability to act
  9. Agentic Loop — where everything from posts 0-8 converges
  10. From Loop to Agent — scaling to production with caching, model routing, and SDKs

Posts 0 through 3 are provider-agnostic, sourced from academic research and open documentation. Posts 4 through 10 use Anthropic’s API for examples because that is what I build with. Concepts map to other providers, but implementation details differ.

A Prediction Machine

Large language models are neural networks trained on massive text datasets to do one thing: predict the next word. More precisely, they predict the next token, a concept I will break down in Part 1. But the high-level idea is straightforward. Given a sequence of text, an LLM produces a probability distribution over what comes next, picks a token, appends it, and repeats.
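That predict-append-repeat loop is simple enough to sketch. Here is a toy stand-in where a hand-written probability table plays the role of the model; a real LLM computes the distribution with a transformer over tokens, but the generation loop itself has this shape.

```python
# Toy stand-in for an LLM: a hand-written table mapping the current
# "token" to a probability distribution over what comes next. A real
# model computes this distribution with a transformer.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(prompt_token: str, max_tokens: int = 10) -> list[str]:
    """Predict a token, append it, repeat: the core generation loop."""
    sequence = [prompt_token]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[sequence[-1]]
        # Greedy decoding: always pick the most likely next token.
        # (Sampling from probs instead is what temperature controls.)
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        sequence.append(next_token)
    return sequence

print(generate("the"))  # ['the', 'cat', 'sat']
```

Greedy decoding is used here for determinism; Part 2 covers why real systems usually sample instead.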

Vaswani et al. introduced the transformer architecture in 2017, and that paper changed everything. Before transformers, neural networks for language processed text sequentially, one word at a time, losing track of earlier context as sequences grew longer. Transformers introduced a mechanism called self-attention that lets the model consider all positions in a sequence simultaneously, weighing which parts of the input matter most for predicting each output. Every major LLM today, from GPT-4 to Claude to Llama, builds on this architecture.
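The “consider all positions simultaneously” idea is concrete enough to show. Below is a single attention head sketched in NumPy: scaled dot-product attention with no masking, no multi-head projection, and none of the other machinery a real transformer layer includes.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head (no masking).

    X: (seq_len, d_model), one embedding vector per position.
    Every position attends to every other position at once, which is
    what lets the model weigh distant context when predicting.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # scores[i, j]: how much position i should weigh position j.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax each row so the weights for a position sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each output row is a weighted mix of every position’s value vector, which is the whole trick: no position is processed in isolation.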

“Large” refers to parameter count. Parameters are the learned numerical values inside the network that determine how it processes input. GPT-2 had 1.5 billion. GPT-3 scaled to 175 billion. Current frontier models have not disclosed exact counts, but estimates put them well beyond that. More parameters generally mean more capacity to encode patterns from training data, though the relationship between size and capability is not linear.
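A back-of-envelope calculation makes “large” tangible. Assuming 2 bytes per parameter (16-bit floats), the weights alone occupy hundreds of gigabytes at GPT-3 scale; real deployments vary with precision, quantization, and runtime overhead.

```python
# Rough memory footprint of just the weights, at 2 bytes per
# parameter (16-bit floats). Ignores activations, KV caches, and
# everything else a running model needs.
param_counts = {"GPT-2": 1.5e9, "GPT-3": 175e9}

for name, n in param_counts.items():
    gigabytes = n * 2 / 1e9
    print(f"{name}: {n:.1e} parameters, about {gigabytes:,.0f} GB of weights")
```

GPT-2 fits on a consumer GPU; GPT-3-scale weights do not fit on any single accelerator, which is part of why these models live behind APIs.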

From Research to Developer Tool

For years after the transformer paper, LLMs remained research artifacts. You could read the papers, maybe download a smaller model, but integrating one into a product required serious ML infrastructure. Most developers never touched them directly.

APIs changed that. When OpenAI released the GPT-3 API in 2020, any developer with an API key could send text in and get completions back. No GPU clusters, no model hosting, no training pipeline. Just an HTTP request and a JSON response. Anthropic’s Claude API and other providers followed, each offering models with different strengths and trade-offs.
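“An HTTP request and a JSON response” is meant literally. Here is the shape of a call using only the standard library, with field names and headers following Anthropic’s Messages API docs; the model id and API key are placeholders you would fill in from the provider’s documentation and dashboard. The request is constructed but deliberately not sent.

```python
import json
import urllib.request

# The whole integration surface: one HTTPS POST carrying JSON in,
# JSON back out. Other providers differ in detail, not in kind.
payload = {
    "model": "<model-id>",    # placeholder: use a current model id
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize this support ticket: ..."}
    ],
}

request = urllib.request.Request(
    "https://api.anthropic.com/v1/messages",
    data=json.dumps(payload).encode(),
    headers={
        "x-api-key": "<your-api-key>",   # placeholder credential
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
)

# urllib.request.urlopen(request) would send it; the JSON response
# carries the model's reply in a "content" field.
print(request.get_full_url())
```

No GPU clusters, no model hosting: the heavy infrastructure lives behind that endpoint.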

I started building with these APIs about two years ago, and what struck me was how quickly the mental model shifts. You stop thinking about LLMs as exotic AI and start thinking about them as a text-processing layer in your stack, something you call between your application logic and your user interface, or between one data format and another.

What Developers Actually Build

Most production LLM usage falls into a handful of patterns:

Classification and routing. Given an input, assign it to a category. Support ticket triage, content moderation, intent detection in conversational interfaces. LLMs handle this well because they can classify based on semantic meaning rather than keyword matching, catching edge cases that rule-based systems miss.

Extraction and transformation. Pull structured data from unstructured text. Extract names, dates, and dollar amounts from contracts. Convert natural language queries into SQL. Transform data between formats. Structured output features make this reliable enough for production pipelines where you need guaranteed JSON schemas, not best-effort prose.

Summarization and synthesis. Condense long documents, generate meeting notes, produce briefings from multiple sources. This was one of the earliest practical applications and it remains one of the most common.

Code generation and assistance. From autocomplete suggestions to full function implementations to debugging help. I use Claude for code daily, and the workflow feels less like “AI writing code for me” and more like pair programming with someone who has read a lot of documentation.

Conversational interfaces. Customer support bots, internal knowledge assistants, interactive documentation. Natural language as an interface layer on top of existing systems and data.

Content creation. Drafting, editing, rephrasing, translating. Not “press a button and get a blog post” but rather using the model as a writing tool that can generate first drafts, suggest alternatives, or adapt tone for different audiences.

Each of these patterns works because LLMs can process natural language with context awareness that previous approaches could not match. A rule-based system can extract a date from a string if the format is predictable. An LLM can extract a date from “let’s circle back after the holiday weekend” because it understands what that phrase means in context.
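The extraction contrast is easy to demonstrate. A date regex handles the predictable format fine and comes back empty-handed on the natural-language phrase, which is exactly the gap semantic understanding fills.

```python
import re

# A rule-based extractor: reliable when the format is predictable.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

print(ISO_DATE.findall("Invoice due 2024-11-29."))
# ['2024-11-29']

# The same rule has nothing to match in natural language, even though
# a human (or an LLM) reads a clear date reference here.
print(ISO_DATE.findall("Let's circle back after the holiday weekend."))
# []
```

You can keep piling on patterns for “next Friday”, “end of Q3”, and so on, but every rule you add is another rule that misses the next phrasing.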

Agent Workflows

Here is where things get interesting, and where this series is ultimately headed.

All of the patterns above treat the LLM as a single-turn tool: send input, get output, done. Agent workflows go further. An LLM receives a goal, breaks it into steps, decides which tools to call (search APIs, databases, code interpreters, file systems), interprets the results, and loops until the task is complete.

Anthropic’s documentation on building agents describes the core loop: the model reasons about what to do next, selects a tool, processes the result, and decides whether to continue or stop. Each iteration through this loop is a full LLM call, with the accumulated context of every previous step informing the next decision.

Tokens drive the cost of each of those calls, the context window limits how far the conversation can stretch, prompt engineering shapes which tools get selected, and structured output keeps the results parseable. Agent workflows surface every mechanical detail that single-turn applications let you ignore.
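The loop itself can be sketched in a few lines. Everything below is a stand-in: `call_model` fakes the LLM with canned replies, and the single `search` tool is hypothetical. The point is the control flow, reason, act, observe, repeat.

```python
# A minimal agent loop with a stubbed model. In a real system,
# call_model is an API request and the tools are real search APIs,
# databases, code interpreters, and so on.

def search(query: str) -> str:
    """Hypothetical tool: pretend to hit a search API."""
    return f"results for {query!r}"

TOOLS = {"search": search}

def call_model(messages: list[dict]) -> dict:
    """Fake model: asks for one tool call, then finishes."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search", "input": "transformer paper 2017"}
    return {"answer": "Vaswani et al., 'Attention Is All You Need'."}

def run_agent(goal: str) -> str:
    messages = [{"role": "user", "content": goal}]
    while True:                      # each pass is a full LLM call
        reply = call_model(messages)
        if "answer" in reply:        # the model decided to stop
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["input"])   # act
        # Feed the observation back in; the accumulated context
        # informs the next decision.
        messages.append({"role": "tool", "content": result})

print(run_agent("Who introduced transformers?"))
```

Real agent frameworks add budgets, error handling, and structured tool schemas around this skeleton, but the while-loop core is recognizably the same.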

What Comes Next

Part 1 covers tokens: the actual unit that LLMs process. Not words, not characters, but subword chunks produced by an algorithm called Byte Pair Encoding. Tokens determine your costs, your context limits, and how the model “sees” your input. Everything in this series is measured in tokens, so that is where we start.