This written version of the video tutorial was generated by an LLM from the video transcript, and supervised by me, Alejandro.
Pi is a minimalist coding agent, but the architecture behind it is a very useful blueprint for understanding how modern terminal-based agents work.
In this tutorial, we’ll walk through Pi from the inside out: the core agent loop, how sessions and context are assembled, how tools are exposed, how extensions and skills plug in, and how the interactive terminal UI adds the day-to-day user experience on top of the core agent runtime.
The Two Main Layers of Pi
A good way to understand Pi is to split it into two layers.
The first layer is the agent core. This is the part that runs the actual agentic loop: initialize context, call the model, parse the response, execute tools when needed, and keep iterating until the task is complete.
The second layer is Pi Interactive. This is the terminal user interface and user-facing workflow around that core: chat input, session management, compaction, skills, commands, and the interface you use when you run Pi from your terminal.
That separation is important because the core does not have to be tied to the TUI. The same agent core can also be called programmatically or through an SDK-style integration.
The Core Agent Loop
At the center of Pi is the agent loop. Every interaction follows the same general pattern:
- Initialize the context.
- Send the current state to the model.
- Receive either a final answer or tool calls.
- Execute tool calls.
- Append the results back into the conversation.
- Repeat until the model finishes.
This loop is simple, but it is the foundation of almost every coding agent.
The interesting part is not that there is a loop. The interesting part is what gets placed into the context before the model is called, and how the tool results are fed back into the conversation.
Context Initialization
When a Pi session starts, the agent has to assemble the context that the model will see.
That context typically includes:
- The base system prompt
- Project-specific instructions
- Available tools
- Session history
- User messages
- Any relevant skill or extension instructions
Pi keeps the base system prompt intentionally small. You can customize behavior with project-level instructions, but the default design stays minimal. That makes it easier to understand what the agent is doing and where behavior is coming from.
This is one of the reasons Pi is a great educational project: the architecture is compact enough to reason about, but still contains the important pieces you need in a real coding agent.
Sessions and Conversation State
Sessions are how Pi keeps continuity across turns.
A session stores the conversation history, tool results, and the state needed to continue the task. When the agent receives a new message, it does not start from scratch. It reloads the relevant session state, rebuilds the context, and continues the loop.
This matters because coding tasks often require multiple steps:
- Read files
- Understand project structure
- Edit code
- Run commands
- Inspect errors
- Iterate on the fix
Without session state, every turn would lose important information. With session state, the agent can behave more like a persistent collaborator.
Tools: How the Agent Acts on the World
Tools are the bridge between the model and the environment.
The model itself can reason and produce text, but it cannot directly read files, run commands, or edit code. Pi exposes those capabilities as tools. The model can request a tool call, Pi executes it, and the result is added back to the conversation.
Typical coding-agent tools include:
- Reading files
- Running shell commands
- Editing files
- Writing new files
- Searching the repository
The important design detail is that tools are not just random functions. They are part of the model’s prompt and schema. The model needs to know what each tool does, what inputs it accepts, and what kind of result to expect.
Once you understand tools, the agent loop becomes much clearer: the model plans the next step, Pi executes the requested action, and the model uses the result to decide what to do next.
Extensions
Extensions let Pi add capabilities without hardcoding everything into the core.
This is an important architectural choice. If every feature lived directly in the core agent, the system would become harder to maintain and harder to customize. With extensions, Pi can keep the core small while allowing extra behavior to be plugged in when needed.
An extension can contribute things like:
- Additional tools
- Custom commands
- Modified prompts
- Project-specific integrations
- Extra workflow behavior
This pattern is useful if you are designing your own agent. Keep the core loop small, then create extension points for everything that should be optional or customizable.
System Prompts and Project Instructions
The system prompt defines the agent’s baseline behavior: how it should communicate, how it should use tools, and what rules it should follow.
Pi’s default prompt is intentionally minimal. More specific behavior can come from local project instructions. This lets the same agent adapt to different repositories without changing the global implementation.
For example, one project might tell the agent to use uv for Python commands, while another might define a specific release workflow or documentation style. Those project instructions become part of the context for that session.
The result is a layered prompt system:
- Base agent behavior
- Global user or environment instructions
- Project-specific instructions
- Skill or workflow-specific instructions
- The current user request
That layering is what lets Pi stay general while still behaving appropriately inside a specific codebase.
Pi Interactive: The Terminal UI Layer
On top of the core agent runtime, Pi provides an interactive terminal interface.
This is the layer most users experience directly. It handles the chat UI, input box, streaming output, commands, session selection, and other interface details.
Architecturally, the important point is that the TUI is not the whole agent. It is a user interface around the agent core. This separation makes the design cleaner because the core can remain focused on the agent loop while the TUI focuses on user experience.
Compaction
Long-running agent sessions eventually run into context limits. Compaction is the mechanism that keeps sessions useful without sending the entire raw history forever.
Instead of keeping every token from every previous turn, Pi can summarize or compress older conversation state into a more compact representation. The model still gets the important information, but the prompt remains within the available context window.
This is especially important for coding agents because a single task can involve many file reads, command outputs, and edits. Without compaction, useful sessions would become too large very quickly.
Good compaction keeps:
- The user’s goal
- Important decisions
- Files that were changed
- Current blockers
- Relevant commands and results
And it removes or compresses noisy details that are no longer needed.
Skills
Skills are reusable workflow instructions.
Instead of forcing the model to rediscover the same process every time, a skill can define a repeatable procedure: what files to read, what commands to run, what checks to perform, and where the approval gates are.
This is very useful for complex workflows. For example, a video release workflow might include producing a base edit, generating captions, adding overlays, rendering, uploading, writing a blog post, and creating social posts. A skill can encode that process so the agent follows it consistently.
The key idea is that skills are not just documentation for humans. They are operational instructions for the agent.
Why This Architecture Works
Pi works well because it is built around a small set of clear abstractions:
- A core loop that talks to the model
- A context builder that assembles instructions and history
- A tool system that lets the model act
- Sessions that preserve state
- Extensions that add optional capabilities
- A TUI that wraps the agent in a usable interface
- Compaction and skills for longer workflows
None of these pieces are overly complicated on their own. But together, they create a capable coding agent that is still easy to understand.
If you want to build your own agent, this is the architecture I would study first. Start with the loop, add tool execution, persist session state, then layer in customization through prompts, extensions, and skills.
