ContextBuilder Architecture: The Central Hub for AI Agent Context Management in Nanobot Framework
Executive Summary
OpenClaw comprises approximately 400,000 lines of code, making comprehensive reading and comprehension exceedingly challenging. Therefore, this series explores OpenClaw's distinctive features through Nanobot, an ultra-lightweight personal AI assistant framework open-sourced by Hong Kong University Data Science Laboratory (HKUDS), positioned as "Ultra-Lightweight OpenClaw"—ideal for learning Agent architecture.
Rich contextual information forms the foundation for effective Agent planning and action. An Agent requires access to various "contexts" during operation:
| Context Type | Examples | Storage Method |
|---|---|---|
| Conversation History | What the user just said | JSON / Database |
| Long-term Memory | User preferences, past summaries | Vector Database / Knowledge Graph / Text |
| External Knowledge | RAG-retrieved documents | Vector Database / API / Text |
| Tool Definitions | Callable function descriptions | Code / MCP Protocol / Text |
| Human Input | Annotations, corrections, reviews | Text / Forms |
| Temporary Drafts | Reasoning intermediate results | Memory / Temporary Files |
These elements differ in format, storage, and access methods. Without a unified abstraction, integrating each new resource requires writing extensive glue code. How these elements are stored, selected, compressed, and fitted into a limited token window is what truly determines an AI's effectiveness.
The ContextBuilder class serves as the Nanobot Agent's "contextual brain," integrating dispersed identity, memory, skills, and runtime information into a standardized, LLM-recognizable dialogue context. Its core value lies in shielding the Agent from the complexity of context construction, providing an "out-of-the-box" complete dialogue context and serving as the central hub connecting Agent modules with the LLM.
Prompt System Architecture
OpenClaw's Markdown-Based Prompt System
OpenClaw's prompt system comprises a set of Markdown files placed in the workspace directory, each taking on a specific responsibility. These injected files are the workspace's .md files, each with a distinct function and easy readability:
AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to perform tasks.
SOUL.md: Personality and soul. Tone, boundaries, priorities. Want the Agent concise without excessive suggestions? Write it here. Want a friendly assistant? Also write it here.
USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
MEMORY.md: Long-term memory. Facts that must never be lost.
YYYY-MM-DD.md: Daily logs. What happened today, which tasks are in progress, what you discussed. Tomorrow, the Agent opens yesterday's log and continues the context.
BOOTSTRAP.md: First-run ceremony (one-time, injected only for brand-new workspaces), such as guided dialogues.
IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
HEARTBEAT.md: Regular checklists. "Check email," "See if monitoring is running."
TOOLS.md: Local tool hints. Where scripts reside, which commands are available. This way, the Agent doesn't need to guess but knows exactly.
Nanobot's Similar Markdown File System
Nanobot employs a similar Markdown file system:
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
SOUL.md content example:
# Soul
I am nanobot 🐈, a personal AI assistant.
## Personality
- Helpful and friendly
- Concise and to the point
- Curious and eager to learn
## Values
- Accuracy over speed
- User privacy and safety
- Transparency in actions
## Communication Style
- Be clear and direct
- Explain reasoning when helpful
- Ask clarifying questions when needed
AGENTS.md content example:
# Agent Instructions
You are a helpful AI assistant. Be concise, accurate, and friendly.
## Scheduled Reminders
When the user asks for a reminder at a specific time, use `exec` to run:
nanobot cron add --name "reminder" --message "Your message" --at "YYYY-MM-DDTHH:MM:SS" --deliver --to "USER_ID" --channel "CHANNEL"
Get USER_ID and CHANNEL from the current session.
**Do NOT just write reminders to MEMORY.md** — that won't trigger actual notifications.
## Heartbeat Tasks
`HEARTBEAT.md` is checked every 30 minutes. Use file tools to manage periodic tasks:
- **Add**: `edit_file` to append new tasks
- **Remove**: `edit_file` to delete completed tasks
- **Rewrite**: `write_file` to replace all tasks
When the user asks for a recurring/periodic task, update `HEARTBEAT.md` instead of creating a one-time cron reminder.
Claw0 Comparative Analysis
For comparison, Claw0 makes the same point: the system prompt is assembled from files on disk. Change the files, change the personality.
Its architecture follows:
Startup Per-Turn
======= ========
BootstrapLoader User Input
load SOUL.md, |
IDENTITY.md, ... |
truncate per v
file (20k) _auto_recall(user_input)
cap total search memory by TF-IDF
(150k) |
| v
SkillsManager build_system_prompt()
scan directories assemble 8 layers:
for SKILL.md 1. Identity
parse frontmatter 2. Soul (personality)
deduplicate by 3. Tools guidance
name 4. Skills
| 5. Memory (evergreen + recalled)
v 6. Bootstrap (remaining files)
bootstrap_data + 7. Runtime context
skills_block 8. Channel hints
(cached for all turns)
|
v
LLM API call
Earlier layers = stronger influence on behavior. SOUL.md sits at layer 2 for exactly this reason.
Key points:
- BootstrapLoader: Loads up to 8 markdown files from workspace, with per-file and total limits.
- SkillsManager: Scans multiple directories for SKILL.md files with YAML frontmatter.
- MemoryStore: Dual-layer storage (resident MEMORY.md + daily JSONL), TF-IDF search.
- _auto_recall(): Searches memory using user messages, injecting results into prompts.
- build_system_prompt(): Assembles 8 layers into one string, rebuilding each turn.
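The layered assembly described above can be sketched in a few lines. The helper name and layer keys here are illustrative, not Claw0's actual API:

```python
def build_system_prompt(layers: dict[str, str]) -> str:
    """Assemble ordered prompt layers into one string; empty layers are skipped.

    Layer keys follow the 8-layer diagram above (illustrative names only).
    """
    order = [
        "identity", "soul", "tools", "skills",
        "memory", "bootstrap", "runtime", "channel_hints",
    ]
    # Earlier entries land earlier in the prompt, hence carry more weight
    return "\n\n".join(layers[k] for k in order if layers.get(k))

prompt = build_system_prompt({
    "identity": "# Identity",
    "tools": "",          # empty layer: dropped
    "memory": "# Memory",
})
assert prompt == "# Identity\n\n# Memory"
```

Because the string is rebuilt each turn, editing SOUL.md (layer 2) takes effect on the very next message.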
ContextBuilder Fundamental Functionality
ContextBuilder serves as the core builder of agent dialogue contexts in the Nanobot framework, responsible for integrating multi-dimensional information (identity definitions, bootstrap files, long-term memory, skill information, runtime metadata, and user messages) into a standardized LLM dialogue context (system prompt + message list). It acts as the critical bridge connecting Agent modules (MemoryStore/SkillsLoader) with the LLM.
Definition and Dependencies
class ContextBuilder:
"""Builds the context (system prompt + messages) for the agent."""
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
_RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"
def __init__(self, workspace: Path):
self.workspace = workspace
self.memory = MemoryStore(workspace)
        self.skills = SkillsLoader(workspace)
Data Dependency Hierarchy:
ContextBuilder (Top Level)
├─ workspace (Input Parameter)
├─ MemoryStore (Dependency Instance)
│ ├─ workspace (Input Parameter)
│ ├─ MEMORY.md (File Path)
│ └─ HISTORY.md (File Path)
└─ SkillsLoader (Dependency Instance)
├─ workspace (Input Parameter)
    └─ workspace/skills/ (Workspace Skills Directory)
Process Loop: Initialization → Context Construction → LLM Call / Tool Execution → Memory Integration → Context Update → repeat, forming a closed Agent execution loop.
Core Features
Modular Context Construction: Divides the system prompt into modules (identity core, bootstrap files, memory, resident skills, skill summary), spliced together on demand for a clear, extensible structure.
Multi-Source Information Fusion: Integrates static bootstrap files (AGENTS.md/SOUL.md, etc.), dynamic memory (MemoryStore), skill systems (SkillsLoader), and runtime metadata (time/channel/environment), forming complete Agent context.
Multimedia Compatibility: Supports Base64-encoded images embedded in user messages, adapting to multi-modal LLM input formats.
Standardized Message Management: Provides standardized addition methods for tool call results and assistant replies, strictly following LLM dialogue message format specifications.
Runtime Metadata Isolation: Marks channel, time, and other runtime metadata as "metadata only, not instructions," avoiding interference with LLM's core decision logic.
Flexible Skill Loading: Distinguishes "resident skills (always=true)" from "skill summaries." Resident skills embed directly into context, while others provide only summaries (requiring read_file tool to read), balancing context length with functional completeness.
Invocation Methods
_process_message: Single Message Processing Entry
_process_message serves as the core entry point for single-message processing, supporting three scenarios (system messages, slash commands, and ordinary conversation) and completing the full workflow of context construction → agent loop → result saving → response return.
async def _process_message(
self,
msg: InboundMessage,
session_key: str | None = None,
on_progress: Callable[[str], Awaitable[None]] | None = None,
) -> OutboundMessage | None:
"""Process a single inbound message and return the response."""
    # System messages: parse origin from chat_id ("channel:chat_id")
    if msg.channel == "system":
        # (channel, chat_id, session, and history are resolved earlier — elided here)
        messages = self.context.build_messages(
            history=history,
            current_message=msg.content,
            channel=channel,
            chat_id=chat_id,
        )
        final_content, _, all_msgs = await self._run_agent_loop(messages)
        self._save_turn(session, all_msgs, 1 + len(history))
        self.sessions.save(session)
        return OutboundMessage(
            channel=channel,
            chat_id=chat_id,
            content=final_content or "Background task completed.",
        )
_run_agent_loop: Core Execution Cycle
The _run_agent_loop function represents the agent's core execution cycle, continuously calling the large model and determining whether to call tools based on responses until the model returns a final answer or reaches maximum iterations.
_run_agent_loop calls ContextBuilder to construct messages:
async def _run_agent_loop(
self,
initial_messages: list[dict],
on_progress: Callable[..., Awaitable[None]] | None = None,
) -> tuple[str | None, list[str], list[dict]]:
    messages = initial_messages
    iteration = 0
    while iteration < self.max_iterations:
        iteration += 1
        response = await self.provider.chat(messages=messages)
        if response.has_tool_calls:
            # (tool_call_dicts is built from response.tool_calls — elided)
            messages = self.context.add_assistant_message(
                messages,
                response.content,
                tool_call_dicts,
                reasoning_content=response.reasoning_content,
            )
            for tool_call in response.tool_calls:
                # (tool execution producing `result` — elided)
                messages = self.context.add_tool_result(
                    messages, tool_call.id, tool_call.name, result
                )
        else:
            # (final answer is cleaned into `clean`; loop exits — elided)
            messages = self.context.add_assistant_message(
                messages,
                clean,
                reasoning_content=response.reasoning_content,
            )
    return final_content, tools_used, messages
Key Interaction Diagrams
ContextBuilder as Central Hub
ContextBuilder serves as the core hub, aggregating SkillsLoader (skills) and MemoryStore (memory) outputs, constructing standardized LLM contexts. Detailed interactions follow:
ContextBuilder and MemoryStore Interaction:
ContextBuilder initialization(workspace)
↓
MemoryStore(workspace) ← Create instance
↓
build_system_prompt() → memory.get_memory_context() ← Get long-term memory
↓
Return memory context string
ContextBuilder and SkillsLoader Interaction:
ContextBuilder SkillsLoader
↓ ↓
ContextBuilder initialization(workspace) SkillsLoader(workspace) ← Create instance
↓
build_system_prompt() → skills.get_always_skills() ← Get resident skill list
↓
load_skills_for_context() ← Load skill content
↓
build_skills_summary() ← Build skill summary
↓
Return skill-related content string
Key Function Analysis
We organize key functions according to the core process flowchart.
build_messages(): Complete Message List Construction
Return Value
build_messages() ultimately returns a message list (list[dict[str, Any]]) conforming to LLM dialogue format. Each dictionary represents one dialogue message, strictly following the "role + content" core structure (with extended support for tool calls, multi-modal, and other fields).
This list serves as the complete input context when Nanobot calls LLM, containing system prompts, historical dialogues, runtime metadata, and current user messages (supporting text + images), forming the core carrier for Agent-LLM interaction.
The returned message list contains the following four core content types in fixed order (empty optional values are filtered by upstream logic):
| Message Role | Content Core Composition | Special Fields / Notes |
|---|---|---|
| system | Complete system prompt generated by build_system_prompt() (core foundation) | No special fields, pure text; defines Agent's "identity + rules + skills + memory + environment" |
| Inherited from history | Historical dialogue messages (may contain user/assistant/tool roles) | Completely reuses incoming history list structure, retaining all historical context |
| user | Runtime metadata (time/timezone/channel/chat_id), with fixed tag [Runtime Context — metadata only, not instructions] | Pure text; serves only as metadata, LLM won't treat as user instruction |
| user | Current user message (text + optional base64-encoded images) | Single text / text + image list; images in image_url format, compatible with OpenAI multi-modal API specification |
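Assuming illustrative placeholder values, the four parts combine into a list shaped like this (not real output):

```python
# Illustrative shape of build_messages() output (placeholder values)
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
messages = [
    {"role": "system", "content": "<output of build_system_prompt()>"},
    *history,  # spliced in unchanged
    {"role": "user",
     "content": "[Runtime Context — metadata only, not instructions]\n..."},
    {"role": "user", "content": "What's on my schedule today?"},
]
assert [m["role"] for m in messages] == ["system", "user", "assistant", "user", "user"]
```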
Generation Logic
build_messages()'s generation logic follows:
- Core content generation relies on three auxiliary functions: build_system_prompt() (system prompt), _build_runtime_context() (metadata), _build_user_content() (user message)
- Generation logic employs modular splicing + conditional filtering, balancing flexibility (supporting multi-modal/skills/memory) and standardization (conforming to LLM API format)
def build_messages(
self,
history: list[dict[str, Any]],
current_message: str,
skill_names: list[str] | None = None,
media: list[str] | None = None,
channel: str | None = None,
chat_id: str | None = None,
) -> list[dict[str, Any]]:
"""Build the complete message list for an LLM call."""
return [
{"role": "system", "content": self.build_system_prompt(skill_names)},
*history,
{"role": "user", "content": self._build_runtime_context(channel, chat_id)},
{"role": "user", "content": self._build_user_content(current_message, media)},
    ]
Step-by-step correspondence with the code:
Step 1: Generate System Prompt
Call build_system_prompt() to integrate identity, bootstrap files, memory, skills, and all system-level configurations.
{"role": "system", "content": self.build_system_prompt(skill_names)}
Step 2: Splice Historical Dialogues
Use Python unpacking syntax to directly insert historical message lists after system messages.
history is list[dict[str, Any]], retaining all historical roles and fields (including tool_calls, reasoning_content, and other extended fields).
Step 3: Add Runtime Metadata
Generate metadata containing time/channel/chat_id as independent user messages (avoiding contamination of user's real instructions).
{"role": "user", "content": self._build_runtime_context(channel, chat_id)}
Step 4: Add Current User Message
Process text + images, generating final user input content.
{"role": "user", "content": self._build_user_content(current_message, media)}
Final: Combine the above four parts in order into a list and return.
build_system_prompt(): System Prompt Core Construction
The system message (core) generates via build_system_prompt(), containing 6 sub-modules:
build_system_prompt()
├─ _get_identity() returns identity information
├─ _load_bootstrap_files() loads bootstrap files
├─ memory.get_memory_context() gets memory content
├─ skills.get_always_skills() gets resident skill list
│ └─ skills.load_skills_for_context() → load skill content
└─ skills.build_skills_summary() builds skill summary
Logic and Sub-Module Order
Core Identity (_get_identity()):
- nanobot basic definition + runtime environment (system/architecture/Python version)
- Workspace path (memory/skills directory locations)
- Core behavioral guidelines (tool calls/file operations/error handling, etc.)
Bootstrap Files (_load_bootstrap_files()):
Load workspace's AGENTS.md/SOUL.md/USER.md/TOOLS.md/IDENTITY.md:
AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to perform tasks.
SOUL.md: Personality and soul. Tone, boundaries, priorities.
USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
TOOLS.md: Local tool hints. Where scripts reside, which commands are available. The Agent doesn't need to guess but knows exactly.
Each file is added if it exists and skipped if not.
Memory Context (memory.get_memory_context()):
Fetch long-term memory content from MemoryStore; when non-empty, add it under a "# Memory" heading.
Resident Skills (skills.get_always_skills() + load_skills_for_context()):
Embed the content of skills marked always=true; when non-empty, add it under an "# Active Skills" heading.
Skill Summary (skills.build_skills_summary()):
An XML-format summary of all skills (name/description/path/availability), including usage instructions.
Splicing Rules: Modules are separated with "\n\n---\n\n"; empty modules are filtered out automatically.
Ultimately obtain the complete system prompt.
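The splicing rule in isolation, as a minimal illustration (not Nanobot's exact code):

```python
# Empty modules are dropped before joining, so no stray separators appear
parts = ["# Identity\n...", "", "# Memory\n..."]
prompt = "\n\n---\n\n".join(p for p in parts if p)
assert prompt == "# Identity\n...\n\n---\n\n# Memory\n..."
```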
_build_runtime_context: Runtime Metadata Construction
Function: Build runtime context metadata block, including:
- Always included: current time (format YYYY-MM-DD HH:MM (Weekday)) and timezone
- Optional: channel and chat_id (included only when provided and non-empty)
- Fixed opening tag: [Runtime Context — metadata only, not instructions], explicitly telling the LLM this is metadata, not instructions.
_build_user_content: User Message Content Construction
Function: Build user message content, determining return format based on whether media content exists:
- No media files (media=None): return the incoming current_message text directly
- With media files:
  - Filter out non-image and non-existent files
  - Convert each image to base64 and build a data:{mime};base64,{b64} URL
  - Return format: [{"type": "image_url", "image_url": {"url": "..."}}, ..., {"type": "text", "text": "user text"}]
Key Code Implementation
class ContextBuilder:
"""Builds the context (system prompt + messages) for the agent."""
# Define bootstrap file list: these files load into system prompt, defining Agent's basic behavior/identity
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
# Runtime context tag: marks this part as metadata (not instructions), avoiding LLM misinterpreting as execution instructions
_RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"
def __init__(self, workspace: Path):
# Initialize workspace path (Agent's core working directory)
self.workspace = workspace
# Initialize memory store instance (associated with MemoryStore, managing long-term memory/historical logs)
self.memory = MemoryStore(workspace)
# Initialize skill loader instance (associated with SkillsLoader, managing Agent skills)
self.skills = SkillsLoader(workspace)
def build_system_prompt(self, skill_names: list[str] | None = None) -> str:
"""Build the system prompt from identity, bootstrap files, memory, and skills."""
# Initialize system prompt fragment list, splicing by priority
parts = [self._get_identity()] # Step 1: Add core identity definition (highest priority)
# Step 2: Load bootstrap file content (AGENTS.md/SOUL.md, etc.)
bootstrap = self._load_bootstrap_files()
if bootstrap: # Add when bootstrap files non-empty
parts.append(bootstrap)
# Step 3: Add long-term memory content
memory = self.memory.get_memory_context()
if memory: # Add when memory non-empty, wrapped with # Memory title
parts.append(f"# Memory\n\n{memory}")
# Step 4: Add resident skills (always=true skills, directly embedded in context)
always_skills = self.skills.get_always_skills()
if always_skills: # When resident skills exist
# Load resident skill core content
always_content = self.skills.load_skills_for_context(always_skills)
if always_content: # Add when skill content non-empty, wrapped with # Active Skills title
parts.append(f"# Active Skills\n\n{always_content}")
# Step 5: Add all skill summaries (XML format, for Agent on-demand reading)
skills_summary = self.skills.build_skills_summary()
if skills_summary: # Add when skill summary non-empty
parts.append(f"""# Skills
The following skills extend your capabilities. To use a skill, read its SKILL.md file using the read_file tool.
Skills with available="false" need dependencies installed first - you can try installing them with apt/brew.
{skills_summary}""")
# Splice all fragments with separator lines (---) into complete system prompt
        return "\n\n---\n\n".join(parts)
Design Principles and Best Practices
Layered Construction
System prompts build layer by layer following "identity → bootstrap → memory → skills," providing clear logic and on-demand extensibility. Earlier layers exert stronger influence on Agent behavior, which is why SOUL.md sits at layer 2.
Multi-Modal Support
Automatically convert images to Base64-encoded data URIs, adapting to multi-modal LLM inputs. This enables seamless integration of visual information into Agent reasoning processes.
Metadata Isolation
Runtime information marked as "metadata only" avoids interfering with LLM core decision-making. This separation ensures temporal and contextual awareness without contaminating instruction interpretation.
Standardized Messages
Provides unified addition methods for tool results and assistant replies, strictly following LLM dialogue format. This standardization enables consistent behavior across different LLM providers.
Practical Implementation Considerations
Token Budget Management
ContextBuilder must balance comprehensive context with token limitations. Strategies include:
- Truncating historical conversations when approaching limits
- Prioritizing recent interactions over older ones
- Summarizing extended memory content
- Conditional skill loading based on task relevance
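One possible truncation strategy, sketched with a crude character budget as a token proxy (not Nanobot's actual code; a real implementation would use the model's tokenizer):

```python
def truncate_history(history: list[dict], max_chars: int = 12_000) -> list[dict]:
    """Keep the most recent messages that fit a rough character budget."""
    kept: list[dict] = []
    total = 0
    for msg in reversed(history):  # walk newest → oldest
        size = len(str(msg.get("content") or ""))
        if total + size > max_chars:
            break  # older messages no longer fit
        kept.append(msg)
        total += size
    return list(reversed(kept))  # restore chronological order
```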
Performance Optimization
Context construction occurs every turn, making efficiency critical:
- Cache bootstrap file contents after initial load
- Implement lazy loading for skill definitions
- Use incremental memory updates rather than full reloads
- Profile and optimize hot paths in message construction
Error Handling
Robust error handling ensures graceful degradation:
- Missing bootstrap files shouldn't crash the Agent
- Memory read failures should log warnings, not halt execution
- Skill loading errors should report clearly for debugging
- Invalid media files should filter silently
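A sketch of such a tolerant loader (a hypothetical helper, not Nanobot's actual code):

```python
import logging
from pathlib import Path

log = logging.getLogger("contextbuilder")

def load_bootstrap_files(workspace: Path, names: list[str]) -> str:
    """Concatenate whichever bootstrap files exist; never raise on I/O errors."""
    sections: list[str] = []
    for name in names:
        path = workspace / name
        try:
            if path.is_file():
                sections.append(path.read_text(encoding="utf-8"))
            # Missing files are silently skipped: the Agent still starts
        except OSError as exc:
            # Read failures degrade gracefully: warn and continue
            log.warning("Could not read %s: %s", path, exc)
    return "\n\n".join(sections)
```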
Conclusion and Future Directions
ContextBuilder represents the architectural centerpiece of Nanobot's Agent framework, elegantly solving the complex challenge of multi-source context integration. By providing a clean abstraction layer between diverse information sources and LLM consumption, it enables developers to focus on Agent logic rather than context management plumbing.
The modular design facilitates extension as new context types emerge. Future enhancements might include:
- Vector-based memory retrieval for semantic search
- Dynamic skill discovery based on conversation context
- Multi-agent context sharing for collaborative scenarios
- Compression techniques for extended conversation histories
Understanding ContextBuilder's architecture provides valuable insights applicable to broader Agent system design, demonstrating how thoughtful abstraction can tame complexity while maintaining flexibility and performance.
For developers building AI Agents, the patterns established in ContextBuilder offer a proven foundation for context management, balancing the competing demands of completeness, efficiency, and maintainability.