Hermes Agent Architecture Explained

This written version of the video tutorial was generated by an LLM from the video transcript, and supervised by me, Alejandro.

Hermes is an always-on AI agent. In this tutorial, we’ll look at its architecture from the top down: the core agent loop, how context is assembled, how memory works, how gateways connect Hermes to messaging platforms, and how cron jobs let it run scheduled tasks.

The goal is not to inspect every implementation detail. The goal is to understand the shape of the system so you can use Hermes more effectively and borrow the same patterns when you build your own agents.

High-Level Architecture

At the center of Hermes is the AI agent core. This is the part that receives messages, builds context, calls the language model, executes tools, and returns a response.

There are several ways to connect to that core:

CLI: the direct command-line interface you use when you run Hermes from a terminal.
Gateway: a long-running process that connects Hermes to messaging services like Telegram, email, Slack, Discord, SMS, or WhatsApp.
API: another integration point for connecting external systems to the agent.

Around the agent core, Hermes also has tools, skills, memory, prompt files, and session storage. These are the pieces that make it more than a simple chat wrapper.

The Agent Loop

The Hermes loop is straightforward:

The user sends a message.
Hermes builds the context for the request.
The context and message history are sent to the LLM.
The LLM may decide to call tools.
Tool results are sent back to the LLM.
The LLM produces a final response.
Hermes updates memory when there is something useful to remember.

That last step is important. Hermes is designed to improve as you use it. After an interaction, it can analyze what happened and decide whether new information should be written into memory for future turns.

Context Construction

Context is the material Hermes sends to the language model on each turn. In Hermes, the context is intentionally minimal and file-based.

The most important files are:

soul.md: the personality and behavior instructions for the agent.
user.md: information Hermes has learned about the user.
memory.md: arbitrary durable notes, workflows, tool usage details, and facts learned during conversations.

The soul.md file is similar to a system prompt. It describes what the agent is, what tone it should use, what goals it should optimize for, and how it should behave. When Hermes is first installed, this may be empty or fall back to a default prompt.

The user.md file is different. Hermes can update it as it learns things about you. For example, if you tell Hermes that you are a software engineer working on a specific project, that can become part of future context.

The memory.md file is broader. It is not only about you. It can contain workflows, facts, useful tool instructions, or other long-lived information that the agent should remember.

In addition to those files, Hermes appends recent message history, tool descriptions, skill descriptions, and, when configured, relevant external memory.

Context Compression

Long-running agents eventually hit context limits. Hermes handles that with context compression.

By default, Hermes checks whether the conversation has reached a configured threshold of the model context window. The default threshold is around 50%, although it can be adjusted when using smaller models or models with smaller context windows.

When the threshold is reached, Hermes summarizes the earlier conversation and replaces older messages with that summary. The summary keeps the useful state:

the goal of the conversation
completed actions
active state
blockers
key decisions
resolved questions
relevant files
previous summaries
turns that still need to be incorporated

This is one of the most important patterns in practical agent architecture. The agent needs enough history to stay coherent, but it cannot keep every raw message forever.

The Gateway

The gateway is what lets Hermes talk through external messaging platforms.

For example, a gateway can connect Hermes to:

Telegram
Discord
email
SMS
WhatsApp
Slack

Each provider has its own integration model. Some use webhooks. Others are polled. Hermes does not have one universal gateway that magically works with everything by default; each gateway needs to be configured.

For example, with Telegram you can run:

hermes setup gateway

Then you configure the Telegram integration and the identity that should be treated as the “home” user for that gateway.

The gateway does more than receive messages. It also has to map external messages into the format Hermes expects, find the right session history, build the context, and send the message into the agent loop.

Session identifiers matter here. A Telegram session, for example, needs to be distinguishable from a Slack session or email thread. Hermes stores these sessions locally so it can continue the right conversation later.

Memory

Hermes memory has three main parts.

First, there are the Markdown memory files:

soul.md
user.md
memory.md

These are always part of the context after the system prompt.

Second, Hermes stores session history in SQLite. Every interaction can be associated with a session identifier. This is especially useful for gateways, because a Telegram conversation and an email thread need separate histories.

Third, Hermes can use external memory providers. Examples mentioned in the video include mem0 and SuperMemory. These systems specialize in storing and retrieving memories for agents.

External memory is optional. Most people do not need it at first. But when enabled, it lets Hermes retrieve relevant memories beyond the local Markdown files and SQLite session history.

Cron Jobs

Hermes also supports cron jobs, but its cron system is not the same as the operating system cron process.

Hermes has its own loop that checks scheduled jobs. This lets you ask the agent to do things like:

send an email every morning with AI news
post a daily update to Slack
send a weekly message to your boss
run recurring agent tasks

One implementation detail is worth noting: although documentation may describe cron jobs as being stored in SQLite, the analyzed implementation stores them as plain JSON under the cron directory.

That means Hermes checks a jobs.json file on an interval and executes the scheduled jobs it finds there.

Cron jobs also interact with gateways. A cron job does not automatically know where to send a message. If you want a scheduled task to notify you on Telegram, Discord, Slack, or another platform, the relevant gateway needs to be configured.

Why This Architecture Works

Hermes is interesting because the individual pieces are simple:

a loop
a context builder
memory files
SQLite sessions
optional external memory
gateways
scheduled jobs

The power comes from how those pieces fit together. The agent can be used from the terminal, reached through messaging apps, remember useful information, compress long conversations, and run background tasks.

That is the practical architecture of an always-on agent. You do not need a huge framework to understand it. You need a clean loop, a clear context model, reliable memory, and integration points that match how people actually communicate.

References

Hermes video
Hermes project link: coming soon
OpenClaw link: coming soon
mem0 link: coming soon
SuperMemory link: coming soon