Understanding AI Agent Architecture Through Nanobot: A Deep Dive into ContextBuilder
Overview
OpenClaw reportedly contains around 400,000 lines of code, making direct reading and comprehension quite challenging. Therefore, this series uses Nanobot to learn OpenClaw's distinctive features.
Nanobot is an ultra-lightweight personal AI assistant framework open-sourced by the HKU Data Science Laboratory (HKUDS), positioned as an "Ultra-Lightweight OpenClaw." It's perfectly suited for learning Agent architecture.
Rich contextual information forms the foundation for effective Agent planning and action. An Agent requires access to various types of "context" during operation:
| Context Type | Examples | Storage Method |
|---|---|---|
| Conversation History | What the user just said | JSON / Database |
| Long-term Memory | User preferences, past summaries | Vector DB / Knowledge Graph / Text |
| External Knowledge | RAG-retrieved documents | Vector DB / API / Text |
| Tool Definitions | Callable function descriptions | Code / MCP Protocol / Text |
| Human Input | Annotations, corrections, reviews | Text / Forms |
| Temporary Drafts | Intermediate reasoning results | Memory / Temporary Files |
These elements differ in format, storage, and access methods. Without a unified abstraction, integrating each new resource requires writing extensive glue code. Deciding how to store, select, compress, and fit all of this into a limited token window is what truly determines an agent's effectiveness.
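To make the glue-code problem concrete, here is a minimal sketch of the kind of unified abstraction the rest of this article describes. This is not Nanobot code and all names (`ContextSource`, `assemble`, the character budget) are hypothetical: each context source renders itself to text, and an assembler splices the non-empty pieces under a budget.

```python
from abc import ABC, abstractmethod


class ContextSource(ABC):
    """Hypothetical unified interface: each context type renders itself as text."""

    @abstractmethod
    def render(self) -> str:
        """Return this source's contribution to the prompt ('' if empty)."""


class StaticFileSource(ContextSource):
    """Wraps already-loaded file content (memory, bootstrap files, etc.)."""

    def __init__(self, content: str):
        self.content = content

    def render(self) -> str:
        return self.content


def assemble(sources: list[ContextSource], budget_chars: int) -> str:
    """Splice non-empty sources with separators, then apply a crude character budget."""
    parts = [s.render() for s in sources]
    joined = "\n\n---\n\n".join(p for p in parts if p)
    return joined[:budget_chars]
```

With such an interface, adding a new resource means writing one `render()` method instead of bespoke integration code; this is essentially the role ContextBuilder plays in Nanobot.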
The ContextBuilder class serves as Nanobot Agent's "contextual brain," integrating dispersed identity, memory, skills, and runtime information into standardized LLM-recognizable dialogue context. Its core value lies in shielding the complexity of context construction, providing the Agent with "out-of-the-box" complete dialogue context, serving as the central hub connecting Agent modules with the LLM.
Prompt System Architecture
OpenClaw's Approach
OpenClaw's prompt system consists of a set of Markdown files placed in the workspace directory, each serving a specific role. These injected Markdown files come from a set of .md files in the Workspace, each with unique functions and easy readability:
- AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to do things.
- SOUL.md: Personality and soul. Tone, boundaries, priorities. Want the Agent to be concise without extra suggestions? Write it here. Want a friendly assistant? Also write it here.
- USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
- MEMORY.md: Long-term memory. Facts that must never be lost.
- YYYY-MM-DD.md: Daily logs. What happened today, which tasks are in progress, what you discussed. Tomorrow, the Agent opens yesterday's log and continues the context.
- BOOTSTRAP.md: First-run ritual (one-time, only for fresh workspaces), such as guided conversations.
- IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
- HEARTBEAT.md: Recurring checklist. "Check emails", "see if monitoring is running".
- TOOLS.md: Local tool hints. Where scripts are located, which commands are available. This way the Agent doesn't need to guess but knows exactly.
Nanobot's Implementation
Nanobot employs a similar Markdown file system:
```python
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
```

SOUL.md content example:

```markdown
# Soul

I am nanobot 🐈, a personal AI assistant.

## Personality
- Helpful and friendly
- Concise and to the point
- Curious and eager to learn

## Values
- Accuracy over speed
- User privacy and safety
- Transparency in actions

## Communication Style
- Be clear and direct
- Explain reasoning when helpful
- Ask clarifying questions when needed
```

AGENTS.md content example:
```markdown
# Agent Instructions

You are a helpful AI assistant. Be concise, accurate, and friendly.

## Scheduled Reminders

When user asks for a reminder at a specific time, use `exec` to run:

    nanobot cron add --name "reminder" --message "Your message" --at "YYYY-MM-DDTHH:MM:SS" --deliver --to "USER_ID" --channel "CHANNEL"

Get USER_ID and CHANNEL from the current session.
**Do NOT just write reminders to MEMORY.md** — that won't trigger actual notifications.

## Heartbeat Tasks

`HEARTBEAT.md` is checked every 30 minutes. Use file tools to manage periodic tasks:

- **Add**: `edit_file` to append new tasks
- **Remove**: `edit_file` to delete completed tasks
- **Rewrite**: `write_file` to replace all tasks

When the user asks for a recurring/periodic task, update `HEARTBEAT.md` instead of creating a one-time cron reminder.
```

Claw0 Comparison
We use Claw0 for comparative analysis. Claw0 observes that the system prompt is assembled from files on disk: change the files, change the personality.
Its architecture follows this pattern:
```
Startup                      Per-Turn
=======                      ========
BootstrapLoader              User Input
  load SOUL.md,                 |
  IDENTITY.md, ...              |
  truncate per                  v
  file (20k)                 _auto_recall(user_input)
  cap total (150k)             search memory by TF-IDF
                                |
SkillsManager                   v
  scan directories           build_system_prompt()
  for SKILL.md                 assemble 8 layers:
  parse frontmatter            1. Identity
  deduplicate                  2. Soul (personality)
  by name                      3. Tools guidance
                               4. Skills
bootstrap_data +               5. Memory (evergreen + recalled)
skills_block                   6. Bootstrap (remaining files)
  (cached for all              7. Runtime context
  turns)                       8. Channel hints
                                |
                                v
                             LLM API call
```

Earlier layers = stronger influence on behavior. SOUL.md is at layer 2 for exactly this reason.

Key points:
- BootstrapLoader: Loads up to 8 markdown files from workspace, with per-file and total size limits
- SkillsManager: Scans multiple directories for SKILL.md files with YAML frontmatter
- MemoryStore: Dual-layer storage (resident MEMORY.md + daily JSONL), TF-IDF search
- _auto_recall(): Searches memory using user messages, injects results into prompts
- build_system_prompt(): Assembles 8 layers into one string, rebuilt each turn
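The layer ordering in the diagram can be condensed into a few lines. The sketch below illustrates the ordering principle only; it is not Claw0's actual code, and `assemble_layers` with its layer names is an assumption drawn from the diagram:

```python
# Priority order from the Claw0 diagram: earlier layers dominate behavior.
LAYER_ORDER = [
    "identity", "soul", "tools", "skills",
    "memory", "bootstrap", "runtime", "channel_hints",
]


def assemble_layers(layers: dict[str, str]) -> str:
    """Join non-empty layers in priority order into one system prompt string."""
    return "\n\n".join(layers[k] for k in LAYER_ORDER if layers.get(k))
```

Because the join preserves `LAYER_ORDER`, placing SOUL.md at layer 2 is a one-line decision in the list, which is exactly the kind of tunability the design aims for.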
ContextBuilder Core Functionality
ContextBuilder is the core builder for intelligent agent dialogue context in the Nanobot framework, responsible for integrating multi-dimensional information such as "identity definition, bootstrap files, long-term memory, skill information, runtime metadata, user messages" into standardized LLM dialogue context (system prompt + message list). It serves as a critical bridge connecting Agent modules (MemoryStore/SkillsLoader) with the LLM.
Definition
```python
class ContextBuilder:
    """Builds the context (system prompt + messages) for the agent."""

    BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
    _RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"

    def __init__(self, workspace: Path):
        self.workspace = workspace
        self.memory = MemoryStore(workspace)
        self.skills = SkillsLoader(workspace)
```

Data Dependency Hierarchy:
```
ContextBuilder (Top Level)
├─ workspace (Input Parameter)
├─ MemoryStore (Dependency Instance)
│   ├─ workspace (Input Parameter)
│   ├─ MEMORY.md (File Path)
│   └─ HISTORY.md (File Path)
│
└─ SkillsLoader (Dependency Instance)
    ├─ workspace (Input Parameter)
    └─ workspace/skills/ (Workspace Skills Directory)
```

Process Closed Loop:
Initialization → Context Construction → LLM Call / Tool Execution → Memory Integration → Context Update → Cycle, forming a complete Agent execution closed loop.
Core Features
- Modular Context Construction: Splits the system prompt into modules (identity core, bootstrap files, memory, resident skills, skill summary) that are spliced on demand, keeping the structure clear and extensible
- Multi-Source Information Fusion: Integrates static bootstrap files (AGENTS.md/SOUL.md, etc.), dynamic memory (MemoryStore), the skill system (SkillsLoader), and runtime metadata (time/channel/environment) into a complete Agent context
- Multimedia Compatibility: Supports Base64-encoded images embedded in user messages, matching multi-modal LLM input formats
- Standardized Message Management: Provides uniform methods for appending tool-call results and assistant replies, strictly following the LLM dialogue message format
- Runtime Metadata Isolation: Marks runtime metadata such as channel and time as "metadata only, not instructions," so it does not interfere with the LLM's core decision logic
- Flexible Skill Loading: Distinguishes "resident skills" (always=true), which are embedded directly into the context, from other skills that contribute only a summary (their bodies are read on demand via the read_file tool), balancing context length against functional completeness
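The resident-versus-summary split can be sketched as follows. This is a simplified illustration, not Nanobot's actual SkillsLoader; the `Skill` fields and `split_skills` helper are assumptions based on the description above:

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    always: bool   # always=true → resident: full body embedded in the context
    body: str


def split_skills(skills: list[Skill]) -> tuple[str, str]:
    """Resident skills contribute full bodies; others only a one-line XML summary."""
    resident = "\n\n".join(s.body for s in skills if s.always)
    summary = "\n".join(
        f'<skill name="{s.name}">{s.description}</skill>'
        for s in skills
        if not s.always
    )
    return resident, summary
```

Only summaries of non-resident skills reach the prompt, so even a large skill library costs a few tokens per skill until one is actually opened with read_file.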
How to Call
_process_message
_process_message is the core entry point for single message processing, supporting three scenarios: system messages, slash commands, and normal conversations, completing the full process of "context construction → agent loop → result saving → response return."
```python
async def _process_message(
    self,
    msg: InboundMessage,
    session_key: str | None = None,
    on_progress: Callable[[str], Awaitable[None]] | None = None,
) -> OutboundMessage | None:
    """Process a single inbound message and return the response."""
    # System messages: parse origin from chat_id ("channel:chat_id")
    if msg.channel == "system":
        # (session, history, channel, and chat_id setup elided in this excerpt)
        messages = self.context.build_messages(
            history=history,
            current_message=msg.content,
            channel=channel,
            chat_id=chat_id,
        )
        final_content, _, all_msgs = await self._run_agent_loop(messages)
        self._save_turn(session, all_msgs, 1 + len(history))
        self.sessions.save(session)
        return OutboundMessage(
            channel=channel,
            chat_id=chat_id,
            content=final_content or "Background task completed.",
        )
```

_run_agent_loop
The _run_agent_loop function is the agent's core execution loop: it repeatedly calls the LLM and decides, based on each response, whether to invoke tools, until the model returns a final answer or the maximum iteration count is reached.
_run_agent_loop calls ContextBuilder to construct messages:
```python
async def _run_agent_loop(
    self,
    initial_messages: list[dict],
    on_progress: Callable[..., Awaitable[None]] | None = None,
) -> tuple[str | None, list[str], list[dict]]:
    messages = initial_messages
    # (iteration counter, tool execution, and content cleanup elided in this excerpt)
    while iteration < self.max_iterations:
        response = await self.provider.chat(messages=messages)
        if response.has_tool_calls:
            messages = self.context.add_assistant_message(
                messages,
                response.content,
                tool_call_dicts,
                reasoning_content=response.reasoning_content,
            )
            for tool_call in response.tool_calls:
                messages = self.context.add_tool_result(
                    messages, tool_call.id, tool_call.name, result
                )
        else:
            messages = self.context.add_assistant_message(
                messages,
                clean,
                reasoning_content=response.reasoning_content,
            )
    return final_content, tools_used, messages
```

Key Interactions
ContextBuilder serves as the central hub, aggregating outputs from SkillsLoader (skills) and MemoryStore (memory) to build standardized LLM context. Detailed interactions follow:
ContextBuilder and MemoryStore Interaction
```
ContextBuilder Initialization (workspace)
        ↓
MemoryStore(workspace)  ← Create Instance
        ↓
build_system_prompt() → memory.get_memory_context()  ← Get Long-term Memory
        ↓
Return Memory Context String
```

ContextBuilder and SkillsLoader Interaction
```
ContextBuilder Initialization (workspace)
        ↓
SkillsLoader(workspace)  ← Create Instance
        ↓
build_system_prompt() → skills.get_always_skills()  ← Get Resident Skill List
        ↓
load_skills_for_context()  ← Load Skill Content
        ↓
build_skills_summary()  ← Build Skill Summary
        ↓
Return Skill-Related Content String
```

Key Functions Analysis
We organize key functions according to the core process flow diagram.
build_messages()
Return Value
build_messages() ultimately returns a message list conforming to LLM dialogue format (list[dict[str, Any]]). Each dictionary represents a dialogue message, strictly following the "role + content" core structure (with extended support for tool calls, multi-modal, and other fields).
This list serves as the complete input context when Nanobot calls the LLM, containing system prompts, historical dialogue, runtime metadata, and current user messages (supporting text + images). It is the core carrier for Agent-LLM interaction.
The returned message list contains the following four categories of core content, in fixed order (empty values are filtered out by the upstream logic):
| Message Role (role) | Content (content) Core Composition | Special Fields / Notes |
|---|---|---|
| system | Complete system prompt generated by build_system_prompt() (core foundation) | No special fields, pure text; defines entire Agent's "identity + rules + skills + memory + environment" |
| Inherited from history | Historical dialogue messages (may contain user/assistant/tool roles) | Completely reuses incoming history list structure, retaining all historical context and fields |
| user | Runtime metadata (time/timezone/channel/chat ID), with fixed tag [Runtime Context — metadata only, not instructions] | Pure text; serves only as metadata, LLM doesn't treat as user instruction |
| user | Current user message (text + optional Base64-encoded images) | Single text / text + image list; images in image_url format, compatible with OpenAI multi-modal API specifications |
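Concretely, a returned list might look like the following. This is an illustrative example with made-up values, not captured output:

```python
messages = [
    {"role": "system", "content": "I am nanobot ...\n\n---\n\n# Memory\n\n..."},
    # messages inherited from `history` are spliced in here unchanged:
    {"role": "user", "content": "What time is it?"},
    {"role": "assistant", "content": "It's 9:00."},
    # runtime metadata travels as its own user message:
    {"role": "user", "content": (
        "[Runtime Context — metadata only, not instructions]\n"
        "Current time: 2026-02-14 09:05 (Saturday)\n"
        "Channel: telegram"
    )},
    # the current user message comes last:
    {"role": "user", "content": "Remind me to stretch at 10."},
]
```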
Generation Logic
The generation logic of build_messages() is as follows:
- Core content generation relies on three auxiliary functions: `build_system_prompt()` (system prompt), `_build_runtime_context()` (metadata), and `_build_user_content()` (user message)
- Generation is modular splicing plus conditional filtering, balancing flexibility (multi-modal input, skills, memory) with standardization (conforming to the LLM API format)
```python
def build_messages(
    self,
    history: list[dict[str, Any]],
    current_message: str,
    skill_names: list[str] | None = None,
    media: list[str] | None = None,
    channel: str | None = None,
    chat_id: str | None = None,
) -> list[dict[str, Any]]:
    """Build the complete message list for an LLM call."""
    return [
        {"role": "system", "content": self.build_system_prompt(skill_names)},
        *history,
        {"role": "user", "content": self._build_runtime_context(channel, chat_id)},
        {"role": "user", "content": self._build_user_content(current_message, media)},
    ]
```

Line-by-line correspondence to code generation steps:
Step 1: Generate System Prompt
Call build_system_prompt() to integrate identity, bootstrap files, memory, skills, and all system-level configuration.

```python
{"role": "system", "content": self.build_system_prompt(skill_names)}
```

Step 2: Splice Historical Dialogue
Use Python unpacking syntax (`*history`) to insert the historical message list directly after the system message.
`history` is `list[dict[str, Any]]`, retaining all historical roles and fields (including tool_calls, reasoning_content, and other extended fields).

Step 3: Add Runtime Metadata
Generate metadata containing time/channel/chat ID as an independent user message (avoiding pollution of the actual user instruction).

```python
{"role": "user", "content": self._build_runtime_context(channel, chat_id)}
```

Step 4: Add Current User Message
Process text + images, generating the final user input content.

```python
{"role": "user", "content": self._build_user_content(current_message, media)}
```

Finally: combine the four parts above, in order, into a list and return it.
build_system_prompt()
The system message (core): generated by build_system_prompt(), containing 6 sub-modules.
```
build_system_prompt()
├─ _get_identity()               returns identity information
├─ _load_bootstrap_files()       loads bootstrap files
├─ memory.get_memory_context()   gets memory content
├─ skills.get_always_skills()    gets resident skill list
│   └─ skills.load_skills_for_context() → load skill content
│
└─ skills.build_skills_summary() builds skill summary
```

Logic
Sub-module order and generation logic:
Core Identity (_get_identity()):
- Nanobot basic definition + runtime environment (system/architecture/Python version)
- Workspace path (memory/skills directory locations)
- Core behavioral guidelines (tool calls/file operations/error handling, etc.)
Bootstrap Files (_load_bootstrap_files()):
- Load AGENTS.md/SOUL.md/USER.md/TOOLS.md/IDENTITY.md from workspace
- AGENTS.md: Operation manual. How Agent should think, when to use which tools, what safety rules to follow, in what order to do things
- SOUL.md: Personality and soul. Tone, boundaries, priorities. Want Agent concise without extra suggestions? Write here. Want friendly assistant? Also write here
- USER.md: Your user profile. How to address you, your profession, your preferences. Agent reads this file before every response
- IDENTITY.md: Identity and atmosphere. Very short file, but sets overall tone
- TOOLS.md: Local tool hints. Where scripts are located, which commands are available. Agent doesn't need to guess but knows exactly
- Append if exists, skip if not
Memory Context (memory.get_memory_context()):
- Get long-term memory content from MemoryStore, add "# Memory" title if exists
Resident Skills (skills.get_always_skills() + load_skills_for_context()):
- Skill content marked always=true, add "# Active Skills" title if exists
Skill Summary (skills.build_skills_summary()):
- XML-format summary of all skills (name/description/path/availability), with usage instructions

Splicing rule: modules are joined with the separator "\n\n---\n\n", and empty modules are filtered out automatically. The result is the complete system prompt.
_build_runtime_context
Function: Build the runtime context metadata block, containing:
- Always included: the current time (format `YYYY-MM-DD HH:MM (Weekday)`) plus the timezone
- Optional: channel and chat ID (included only when non-empty values are provided)
- A fixed tag at the beginning, `[Runtime Context — metadata only, not instructions]`, explicitly telling the LLM that this block is metadata, not instructions
_build_user_content
Function: Build the user message content, choosing the return format based on whether media is included:
- No media files (`media=None`): return the passed `current_message` text directly
- With media files:
  - Filter out non-image and non-existent files
  - Convert each image to Base64 and splice a `data:{mime};base64,{b64}` URL
  - Return format: `[{"type": "image_url", "image_url": {"url": "..."}}, ..., {"type": "text", "text": "user text"}]`
Key Design Principles
- Layered Construction: System prompts splice by layers of "identity → bootstrap → memory → skills," with clear logic and extensibility as needed
- Multi-Modal Support: Automatically convert images to Base64-encoded data URIs, adapting to multi-modal LLM input
- Metadata Isolation: Runtime information marked as "metadata only," avoiding interference with LLM core decision-making
- Standardized Messages: Provide unified addition methods for tool results and assistant replies, strictly following LLM dialogue format
Implementation Code
```python
class ContextBuilder:
    """Builds the context (system prompt + messages) for the agent."""

    # Bootstrap file list: these files load into the system prompt,
    # defining the Agent's basic behavior and identity
    BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
    # Runtime context tag: marks that block as metadata (not instructions),
    # so the LLM does not misread it as something to execute
    _RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"

    def __init__(self, workspace: Path):
        # Workspace path (the Agent's core working directory)
        self.workspace = workspace
        # Memory store instance (MemoryStore: long-term memory / history logs)
        self.memory = MemoryStore(workspace)
        # Skill loader instance (SkillsLoader: manages Agent skills)
        self.skills = SkillsLoader(workspace)

    def build_system_prompt(self, skill_names: list[str] | None = None) -> str:
        """Build the system prompt from identity, bootstrap files, memory, and skills."""
        # System prompt fragments, spliced in priority order
        parts = [self._get_identity()]  # Step 1: core identity (highest priority)

        # Step 2: bootstrap file content (AGENTS.md/SOUL.md, etc.)
        bootstrap = self._load_bootstrap_files()
        if bootstrap:
            parts.append(bootstrap)

        # Step 3: long-term memory content, wrapped with a "# Memory" title
        memory = self.memory.get_memory_context()
        if memory:
            parts.append(f"# Memory\n\n{memory}")

        # Step 4: resident skills (always=true), embedded directly in the context
        always_skills = self.skills.get_always_skills()
        if always_skills:
            always_content = self.skills.load_skills_for_context(always_skills)
            if always_content:  # wrapped with an "# Active Skills" title
                parts.append(f"# Active Skills\n\n{always_content}")

        # Step 5: summary of all skills (XML format, read on demand by the Agent)
        skills_summary = self.skills.build_skills_summary()
        if skills_summary:
            parts.append(f"""# Skills

The following skills extend your capabilities. To use a skill, read its SKILL.md file using the read_file tool.
Skills with available="false" need dependencies installed first - you can try installing them with apt/brew.

{skills_summary}""")

        # Join all fragments with separator lines (---) into the complete system prompt
        return "\n\n---\n\n".join(parts)
```

Conclusion
The ContextBuilder represents a sophisticated solution to the challenge of managing diverse contextual information for AI agents. By providing a unified abstraction layer over memory systems, skill repositories, and runtime metadata, it enables agents to maintain coherent, contextually-aware conversations without requiring developers to write extensive integration code for each new data source.
Key architectural insights include:
- Separation of Concerns: ContextBuilder focuses solely on context assembly, delegating storage and retrieval to specialized components (MemoryStore, SkillsLoader)
- Modular Design: Each context component (identity, bootstrap, memory, skills) can be developed, tested, and updated independently
- Flexibility: The system supports various configurations through bootstrap files and skill definitions, enabling customization without code changes
- Performance: Context sources are lightweight (local files and simple lookups), so rebuilding the full context each turn adds little overhead
- Extensibility: New context sources can be added by implementing compatible interfaces and integrating them into the build process
This architecture demonstrates how careful abstraction and modular design can simplify the development of complex AI systems while maintaining flexibility and performance.
This article provides a comprehensive exploration of the ContextBuilder component in Nanobot, demonstrating how effective context management enables powerful AI agent capabilities through clean architectural design.