Understanding AI Agent Architecture Through Nanobot: A Deep Dive into ContextBuilder
Overview
OpenClaw reportedly contains around 400,000 lines of code, making direct reading and comprehension quite challenging. Therefore, this series uses Nanobot to learn OpenClaw's distinctive features.
Nanobot is an ultra-lightweight personal AI assistant framework open-sourced by the HKU Data Science Laboratory (HKUDS), positioned as an "Ultra-Lightweight OpenClaw." It's perfectly suited for learning Agent architecture.
Rich contextual information forms the foundation for effective Agent planning and action. An Agent requires access to various types of "context" during operation:
| Context Type | Examples | Storage Method |
|---|---|---|
| Conversation History | What the user just said | JSON / Database |
| Long-term Memory | User preferences, past summaries | Vector DB / Knowledge Graph / Text |
| External Knowledge | RAG-retrieved documents | Vector DB / API / Text |
| Tool Definitions | Callable function descriptions | Code / MCP Protocol / Text |
| Human Input | Annotations, corrections, reviews | Text / Forms |
| Temporary Drafts | Intermediate reasoning results | Memory / Temporary Files |
These elements differ in format, storage, and access methods. Without a unified abstraction, integrating each new resource requires writing extensive glue code. Deciding how to store, select, compress, and fit all of this into a limited token window is what truly determines an agent's effectiveness.
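To make the glue-code problem concrete, here is a minimal sketch of the kind of unified abstraction the rest of this article describes. This is not Nanobot code and all names (`ContextSource`, `assemble`, the character budget) are hypothetical: each context source renders itself to text, and an assembler splices the non-empty pieces under a budget.

```python
from abc import ABC, abstractmethod


class ContextSource(ABC):
    """Hypothetical unified interface: each context type renders itself as text."""

    @abstractmethod
    def render(self) -> str:
        """Return this source's contribution to the prompt ('' if empty)."""


class StaticFileSource(ContextSource):
    """Wraps already-loaded file content (memory, bootstrap files, etc.)."""

    def __init__(self, content: str):
        self.content = content

    def render(self) -> str:
        return self.content


def assemble(sources: list[ContextSource], budget_chars: int) -> str:
    """Splice non-empty sources with separators, then apply a crude character budget."""
    parts = [s.render() for s in sources]
    joined = "\n\n---\n\n".join(p for p in parts if p)
    return joined[:budget_chars]
```

With such an interface, adding a new resource means writing one `render()` method instead of bespoke integration code; this is essentially the role ContextBuilder plays in Nanobot.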
The ContextBuilder class serves as Nanobot Agent's "contextual brain," integrating dispersed identity, memory, skills, and runtime information into standardized LLM-recognizable dialogue context. Its core value lies in shielding the complexity of context construction, providing the Agent with "out-of-the-box" complete dialogue context, serving as the central hub connecting Agent modules with the LLM.
Prompt System Architecture
OpenClaw's Approach
OpenClaw's prompt system consists of a set of Markdown files placed in the workspace directory, each serving a specific role. These injected Markdown files come from a set of .md files in the Workspace, each with unique functions and easy readability:
- AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to do things.
- SOUL.md: Personality and soul. Tone, boundaries, priorities. Want the Agent to be concise without extra suggestions? Write it here. Want a friendly assistant? Also write it here.
- USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
- MEMORY.md: Long-term memory. Facts that must never be lost.
- YYYY-MM-DD.md: Daily logs. What happened today, which tasks are in progress, what you discussed. Tomorrow, the Agent opens yesterday's log and continues the context.
- BOOTSTRAP.md: First-run ritual (one-time, only for fresh workspaces), such as guided conversations.
- IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
- HEARTBEAT.md: Recurring checklist. "Check emails", "see if monitoring is running".
- TOOLS.md: Local tool hints. Where scripts are located, which commands are available. This way the Agent doesn't need to guess but knows exactly.
Nanobot's Implementation
Nanobot employs a similar Markdown file system:
```python
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
```

SOUL.md content example:

```markdown
# Soul

I am nanobot 🐈, a personal AI assistant.

## Personality
- Helpful and friendly
- Concise and to the point
- Curious and eager to learn

## Values
- Accuracy over speed
- User privacy and safety
- Transparency in actions

## Communication Style
- Be clear and direct
- Explain reasoning when helpful
- Ask clarifying questions when needed
```

AGENTS.md content example:
```markdown
# Agent Instructions

You are a helpful AI assistant. Be concise, accurate, and friendly.

## Scheduled Reminders

When user asks for a reminder at a specific time, use `exec` to run:

    nanobot cron add --name "reminder" --message "Your message" --at "YYYY-MM-DDTHH:MM:SS" --deliver --to "USER_ID" --channel "CHANNEL"

Get USER_ID and CHANNEL from the current session.
**Do NOT just write reminders to MEMORY.md** — that won't trigger actual notifications.

## Heartbeat Tasks

`HEARTBEAT.md` is checked every 30 minutes. Use file tools to manage periodic tasks:

- **Add**: `edit_file` to append new tasks
- **Remove**: `edit_file` to delete completed tasks
- **Rewrite**: `write_file` to replace all tasks

When the user asks for a recurring/periodic task, update `HEARTBEAT.md` instead of creating a one-time cron reminder.
```

Claw0 Comparison
We use Claw0 for comparative analysis. Claw0 observes that the system prompt is assembled from files on disk: change the files, change the personality.
Its architecture follows this pattern:
```
Startup                      Per-Turn
=======                      ========
BootstrapLoader              User Input
  load SOUL.md,                 |
  IDENTITY.md, ...              |
  truncate per                  v
  file (20k)                 _auto_recall(user_input)
  cap total (150k)             search memory by TF-IDF
                                |
SkillsManager                   v
  scan directories           build_system_prompt()
  for SKILL.md                 assemble 8 layers:
  parse frontmatter            1. Identity
  deduplicate                  2. Soul (personality)
  by name                      3. Tools guidance
                               4. Skills
bootstrap_data +               5. Memory (evergreen + recalled)
skills_block                   6. Bootstrap (remaining files)
  (cached for all              7. Runtime context
  turns)                       8. Channel hints
                                |
                                v
                             LLM API call
```

Earlier layers = stronger influence on behavior. SOUL.md is at layer 2 for exactly this reason.

Key points:
- BootstrapLoader: Loads up to 8 markdown files from workspace, with per-file and total size limits
- SkillsManager: Scans multiple directories for SKILL.md files with YAML frontmatter
- MemoryStore: Dual-layer storage (resident MEMORY.md + daily JSONL), TF-IDF search
- _auto_recall(): Searches memory using user messages, injects results into prompts
- build_system_prompt(): Assembles 8 layers into one string, rebuilt each turn
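The layer ordering in the diagram can be condensed into a few lines. The sketch below illustrates the ordering principle only; it is not Claw0's actual code, and `assemble_layers` with its layer names is an assumption drawn from the diagram:

```python
# Priority order from the Claw0 diagram: earlier layers dominate behavior.
LAYER_ORDER = [
    "identity", "soul", "tools", "skills",
    "memory", "bootstrap", "runtime", "channel_hints",
]


def assemble_layers(layers: dict[str, str]) -> str:
    """Join non-empty layers in priority order into one system prompt string."""
    return "\n\n".join(layers[k] for k in LAYER_ORDER if layers.get(k))
```

Because the join preserves `LAYER_ORDER`, placing SOUL.md at layer 2 is a one-line decision in the list, which is exactly the kind of tunability the design aims for.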
ContextBuilder Core Functionality
ContextBuilder is the core builder for intelligent agent dialogue context in the Nanobot framework, responsible for integrating multi-dimensional information such as "identity definition, bootstrap files, long-term memory, skill information, runtime metadata, user messages" into standardized LLM dialogue context (system prompt + message list). It serves as a critical bridge connecting Agent modules (MemoryStore/SkillsLoader) with the LLM.
Definition
```python
class ContextBuilder:
    """Builds the context (system prompt + messages) for the agent."""

    BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
    _RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"

    def __init__(self, workspace: Path):
        self.workspace = workspace
        self.memory = MemoryStore(workspace)
        self.skills = SkillsLoader(workspace)
```

Data Dependency Hierarchy:
```
ContextBuilder (Top Level)
├─ workspace (Input Parameter)
├─ MemoryStore (Dependency Instance)
│   ├─ workspace (Input Parameter)
│   ├─ MEMORY.md (File Path)
│   └─ HISTORY.md (File Path)
│
└─ SkillsLoader (Dependency Instance)
    ├─ workspace (Input Parameter)
    └─ workspace/skills/ (Workspace Skills Directory)
```

Process Closed Loop:
Initialization → Context Construction → LLM Call / Tool Execution → Memory Integration → Context Update → Cycle, forming a complete Agent execution closed loop.
Core Features
- Modular Context Construction: Splits the system prompt into modules (identity core, bootstrap files, memory, resident skills, skill summary) that are spliced on demand, keeping the structure clear and extensible
- Multi-Source Information Fusion: Integrates static bootstrap files (AGENTS.md/SOUL.md, etc.), dynamic memory (MemoryStore), the skill system (SkillsLoader), and runtime metadata (time/channel/environment) into a complete Agent context
- Multimedia Compatibility: Supports Base64-encoded images embedded in user messages, matching multi-modal LLM input formats
- Standardized Message Management: Provides uniform methods for appending tool-call results and assistant replies, strictly following the LLM dialogue message format
- Runtime Metadata Isolation: Marks runtime metadata such as channel and time as "metadata only, not instructions," so it does not interfere with the LLM's core decision logic
- Flexible Skill Loading: Distinguishes "resident skills" (always=true), which are embedded directly into the context, from other skills that contribute only a summary (their bodies are read on demand via the read_file tool), balancing context length against functional completeness
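The resident-versus-summary split can be sketched as follows. This is a simplified illustration, not Nanobot's actual SkillsLoader; the `Skill` fields and `split_skills` helper are assumptions based on the description above:

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    always: bool   # always=true → resident: full body embedded in the context
    body: str


def split_skills(skills: list[Skill]) -> tuple[str, str]:
    """Resident skills contribute full bodies; others only a one-line XML summary."""
    resident = "\n\n".join(s.body for s in skills if s.always)
    summary = "\n".join(
        f'<skill name="{s.name}">{s.description}</skill>'
        for s in skills
        if not s.always
    )
    return resident, summary
```

Only summaries of non-resident skills reach the prompt, so even a large skill library costs a few tokens per skill until one is actually opened with read_file.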
How to Call
_process_message
_process_message is the core entry point for single message processing, supporting three scenarios: system messages, slash commands, and normal conversations, completing the full process of "context construction → agent loop → result saving → response return."
```python
async def _process_message(
    self,
    msg: InboundMessage,
    session_key: str | None = None,
    on_progress: Callable[[str], Awaitable[None]] | None = None,
) -> OutboundMessage | None:
    """Process a single inbound message and return the response."""
    # System messages: parse origin from chat_id ("channel:chat_id")
    if msg.channel == "system":
        # (session, history, channel, and chat_id setup elided in this excerpt)
        messages = self.context.build_messages(
            history=history,
            current_message=msg.content,
            channel=channel,
            chat_id=chat_id,
        )
        final_content, _, all_msgs = await self._run_agent_loop(messages)
        self._save_turn(session, all_msgs, 1 + len(history))
        self.sessions.save(session)
        return OutboundMessage(
            channel=channel,
            chat_id=chat_id,
            content=final_content or "Background task completed.",
        )
```

_run_agent_loop
The _run_agent_loop function is the agent's core execution loop: it repeatedly calls the LLM and decides, based on each response, whether to invoke tools, until the model returns a final answer or the maximum iteration count is reached.
_run_agent_loop calls ContextBuilder to construct messages:
```python
async def _run_agent_loop(
    self,
    initial_messages: list[dict],
    on_progress: Callable[..., Awaitable[None]] | None = None,
) -> tuple[str | None, list[str], list[dict]]:
    messages = initial_messages
    # (iteration counter, tool execution, and content cleanup elided in this excerpt)
    while iteration < self.max_iterations:
        response = await self.provider.chat(messages=messages)
        if response.has_tool_calls:
            messages = self.context.add_assistant_message(
                messages,
                response.content,
                tool_call_dicts,
                reasoning_content=response.reasoning_content,
            )
            for tool_call in response.tool_calls:
                messages = self.context.add_tool_result(
                    messages, tool_call.id, tool_call.name, result
                )
        else:
            messages = self.context.add_assistant_message(
                messages,
                clean,
                reasoning_content=response.reasoning_content,
            )
    return final_content, tools_used, messages
```

Key Interactions
ContextBuilder serves as the central hub, aggregating outputs from SkillsLoader (skills) and MemoryStore (memory) to build standardized LLM context. Detailed interactions follow:
ContextBuilder and MemoryStore Interaction
```
ContextBuilder Initialization (workspace)
        ↓
MemoryStore(workspace)  ← Create Instance
        ↓
build_system_prompt() → memory.get_memory_context()  ← Get Long-term Memory
        ↓
Return Memory Context String
```

ContextBuilder and SkillsLoader Interaction
```
ContextBuilder Initialization (workspace)
        ↓
SkillsLoader(workspace)  ← Create Instance
        ↓
build_system_prompt() → skills.get_always_skills()  ← Get Resident Skill List
        ↓
load_skills_for_context()  ← Load Skill Content
        ↓
build_skills_summary()  ← Build Skill Summary
        ↓
Return Skill-Related Content String
```

Key Functions Analysis
We organize key functions according to the core process flow diagram.
build_messages()
Return Value
build_messages() ultimately returns a message list conforming to LLM dialogue format (list[dict[str, Any]]). Each dictionary represents a dialogue message, strictly following the "role + content" core structure (with extended support for tool calls, multi-modal, and other fields).
This list serves as the complete input context when Nanobot calls the LLM, containing system prompts, historical dialogue, runtime metadata, and current user messages (supporting text + images). It is the core carrier for Agent-LLM interaction.
The returned message list contains the following four categories of core content, in fixed order (empty values are filtered out by the upstream logic):
| Message Role (role) | Content (content) Core Composition | Special Fields / Notes |
|---|---|---|
| system | Complete system prompt generated by build_system_prompt() (core foundation) | No special fields, pure text; defines entire Agent's "identity + rules + skills + memory + environment" |
| Inherited from history | Historical dialogue messages (may contain user/assistant/tool roles) | Completely reuses incoming history list structure, retaining all historical context and fields |
| user | Runtime metadata (time/timezone/channel/chat ID), with fixed tag [Runtime Context — metadata only, not instructions] | Pure text; serves only as metadata, LLM doesn't treat as user instruction |
| user | Current user message (text + optional Base64-encoded images) | Single text / text + image list; images in image_url format, compatible with OpenAI multi-modal API specifications |
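Concretely, a returned list might look like the following. This is an illustrative example with made-up values, not captured output:

```python
messages = [
    {"role": "system", "content": "I am nanobot ...\n\n---\n\n# Memory\n\n..."},
    # messages inherited from `history` are spliced in here unchanged:
    {"role": "user", "content": "What time is it?"},
    {"role": "assistant", "content": "It's 9:00."},
    # runtime metadata travels as its own user message:
    {"role": "user", "content": (
        "[Runtime Context — metadata only, not instructions]\n"
        "Current time: 2026-02-14 09:05 (Saturday)\n"
        "Channel: telegram"
    )},
    # the current user message comes last:
    {"role": "user", "content": "Remind me to stretch at 10."},
]
```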
Generation Logic
The generation logic of build_messages() is as follows:
- Core content generation relies on three auxiliary functions: `build_system_prompt()` (system prompt), `_build_runtime_context()` (metadata), and `_build_user_content()` (user message)
- Generation is modular splicing plus conditional filtering, balancing flexibility (multi-modal input, skills, memory) with standardization (conforming to the LLM API format)
```python
def build_messages(
    self,
    history: list[dict[str, Any]],
    current_message: str,
    skill_names: list[str] | None = None,
    media: list[str] | None = None,
    channel: str | None = None,
    chat_id: str | None = None,
) -> list[dict[str, Any]]:
    """Build the complete message list for an LLM call."""
    return [
        {"role": "system", "content": self.build_system_prompt(skill_names)},
        *history,
        {"role": "user", "content": self._build_runtime_context(channel, chat_id)},
        {"role": "user", "content": self._build_user_content(current_message, media)},
    ]
```

Line-by-line correspondence to code generation steps:
Step 1: Generate System Prompt
Call build_system_prompt() to integrate identity, bootstrap files, memory, skills, and all system-level configuration.

```python
{"role": "system", "content": self.build_system_prompt(skill_names)}
```

Step 2: Splice Historical Dialogue
Use Python unpacking syntax (`*history`) to insert the historical message list directly after the system message.
`history` is `list[dict[str, Any]]`, retaining all historical roles and fields (including tool_calls, reasoning_content, and other extended fields).

Step 3: Add Runtime Metadata
Generate metadata containing time/channel/chat ID as an independent user message (avoiding pollution of the actual user instruction).

```python
{"role": "user", "content": self._build_runtime_context(channel, chat_id)}
```

Step 4: Add Current User Message
Process text + images, generating the final user input content.

```python
{"role": "user", "content": self._build_user_content(current_message, media)}
```

Finally: combine the four parts above, in order, into a list and return it.
build_system_prompt()
The system message (core): generated by build_system_prompt(), containing 6 sub-modules.
```
build_system_prompt()
├─ _get_identity()               returns identity information
├─ _load_bootstrap_files()       loads bootstrap files
├─ memory.get_memory_context()   gets memory content
├─ skills.get_always_skills()    gets resident skill list
│   └─ skills.load_skills_for_context() → load skill content
│
└─ skills.build_skills_summary() builds skill summary
```

Logic
Sub-module order and generation logic:
Core Identity (_get_identity()):
- Nanobot basic definition + runtime environment (system/architecture/Python version)
- Workspace path (memory/skills directory locations)
- Core behavioral guidelines (tool calls/file operations/error handling, etc.)
Bootstrap Files (_load_bootstrap_files()):
- Load AGENTS.md/SOUL.md/USER.md/TOOLS.md/IDENTITY.md from workspace
- AGENTS.md: Operation manual. How Agent should think, when to use which tools, what safety rules to follow, in what order to do things
- SOUL.md: Personality and soul. Tone, boundaries, priorities. Want Agent concise without extra suggestions? Write here. Want friendly assistant? Also write here
- USER.md: Your user profile. How to address you, your profession, your preferences. Agent reads this file before every response
- IDENTITY.md: Identity and atmosphere. Very short file, but sets overall tone
- TOOLS.md: Local tool hints. Where scripts are located, which commands are available. Agent doesn't need to guess but knows exactly
- Append if exists, skip if not
Memory Context (memory.get_memory_context()):
- Get long-term memory content from MemoryStore, add "# Memory" title if exists
Resident Skills (skills.get_always_skills() + load_skills_for_context()):
- Skill content marked always=true, add "# Active Skills" title if exists
Skill Summary (skills.build_skills_summary()):
- XML-format summary of all skills (name/description/path/availability), with usage instructions

Splicing rule: modules are joined with the separator "\n\n---\n\n", and empty modules are filtered out automatically. The result is the complete system prompt.
_build_runtime_context
Function: Build the runtime context metadata block, containing:
- Always included: the current time (format `YYYY-MM-DD HH:MM (Weekday)`) plus the timezone
- Optional: channel and chat ID (included only when non-empty values are provided)
- A fixed tag at the beginning, `[Runtime Context — metadata only, not instructions]`, explicitly telling the LLM that this block is metadata, not instructions
_build_user_content
Function: Build the user message content, choosing the return format based on whether media is included:
- No media files (`media=None`): return the passed `current_message` text directly
- With media files:
  - Filter out non-image and non-existent files
  - Convert each image to Base64 and splice a `data:{mime};base64,{b64}` URL
  - Return format: `[{"type": "image_url", "image_url": {"url": "..."}}, ..., {"type": "text", "text": "user text"}]`
Key Design Principles
- Layered Construction: System prompts splice by layers of "identity → bootstrap → memory → skills," with clear logic and extensibility as needed
- Multi-Modal Support: Automatically convert images to Base64-encoded data URIs, adapting to multi-modal LLM input
- Metadata Isolation: Runtime information marked as "metadata only," avoiding interference with LLM core decision-making
- Standardized Messages: Provide unified addition methods for tool results and assistant replies, strictly following LLM dialogue format
Implementation Code
```python
class ContextBuilder:
    """Builds the context (system prompt + messages) for the agent."""

    # Bootstrap file list: these files load into the system prompt,
    # defining the Agent's basic behavior and identity
    BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
    # Runtime context tag: marks that block as metadata (not instructions),
    # so the LLM does not misread it as something to execute
    _RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"

    def __init__(self, workspace: Path):
        # Workspace path (the Agent's core working directory)
        self.workspace = workspace
        # Memory store instance (MemoryStore: long-term memory / history logs)
        self.memory = MemoryStore(workspace)
        # Skill loader instance (SkillsLoader: manages Agent skills)
        self.skills = SkillsLoader(workspace)

    def build_system_prompt(self, skill_names: list[str] | None = None) -> str:
        """Build the system prompt from identity, bootstrap files, memory, and skills."""
        # System prompt fragments, spliced in priority order
        parts = [self._get_identity()]  # Step 1: core identity (highest priority)

        # Step 2: bootstrap file content (AGENTS.md/SOUL.md, etc.)
        bootstrap = self._load_bootstrap_files()
        if bootstrap:
            parts.append(bootstrap)

        # Step 3: long-term memory content, wrapped with a "# Memory" title
        memory = self.memory.get_memory_context()
        if memory:
            parts.append(f"# Memory\n\n{memory}")

        # Step 4: resident skills (always=true), embedded directly in the context
        always_skills = self.skills.get_always_skills()
        if always_skills:
            always_content = self.skills.load_skills_for_context(always_skills)
            if always_content:  # wrapped with an "# Active Skills" title
                parts.append(f"# Active Skills\n\n{always_content}")

        # Step 5: summary of all skills (XML format, read on demand by the Agent)
        skills_summary = self.skills.build_skills_summary()
        if skills_summary:
            parts.append(f"""# Skills

The following skills extend your capabilities. To use a skill, read its SKILL.md file using the read_file tool.
Skills with available="false" need dependencies installed first - you can try installing them with apt/brew.

{skills_summary}""")

        # Join all fragments with separator lines (---) into the complete system prompt
        return "\n\n---\n\n".join(parts)
```

Conclusion
The ContextBuilder represents a sophisticated solution to the challenge of managing diverse contextual information for AI agents. By providing a unified abstraction layer over memory systems, skill repositories, and runtime metadata, it enables agents to maintain coherent, contextually-aware conversations without requiring developers to write extensive integration code for each new data source.
Key architectural insights include:
- Separation of Concerns: ContextBuilder focuses solely on context assembly, delegating storage and retrieval to specialized components (MemoryStore, SkillsLoader)
- Modular Design: Each context component (identity, bootstrap, memory, skills) can be developed, tested, and updated independently
- Flexibility: The system supports various configurations through bootstrap files and skill definitions, enabling customization without code changes
- Performance: Context sources are lightweight (local files and simple lookups), so rebuilding the full context each turn adds little overhead
- Extensibility: New context sources can be added by implementing compatible interfaces and integrating them into the build process
This architecture demonstrates how careful abstraction and modular design can simplify the development of complex AI systems while maintaining flexibility and performance.
This article provides a comprehensive exploration of the ContextBuilder component in Nanobot, demonstrating how effective context management enables powerful AI agent capabilities through clean architectural design.