ContextBuilder Architecture: The Central Hub for AI Agent Context Management in Nanobot Framework
Executive Summary
OpenClaw comprises approximately 400,000 lines of code, making comprehensive reading and comprehension exceedingly challenging. Therefore, this series explores OpenClaw's distinctive features through Nanobot, an ultra-lightweight personal AI assistant framework open-sourced by Hong Kong University Data Science Laboratory (HKUDS), positioned as "Ultra-Lightweight OpenClaw"—ideal for learning Agent architecture.
Rich contextual information forms the foundation for effective Agent planning and action. An Agent requires access to various "contexts" during operation:
| Context Type | Examples | Storage Method |
|---|---|---|
| Conversation History | What the user just said | JSON / Database |
| Long-term Memory | User preferences, past summaries | Vector Database / Knowledge Graph / Text |
| External Knowledge | RAG-retrieved documents | Vector Database / API / Text |
| Tool Definitions | Callable function descriptions | Code / MCP Protocol / Text |
| Human Input | Annotations, corrections, reviews | Text / Forms |
| Temporary Drafts | Reasoning intermediate results | Memory / Temporary Files |
These elements differ in format, storage, and access methods. Without a unified abstraction, integrating each new resource requires writing extensive glue code. How these elements are stored, selected, compressed, and fitted into a limited token window is what truly determines an AI's effectiveness.
The ContextBuilder class serves as the Nanobot Agent's "contextual brain," integrating dispersed identity, memory, skills, and runtime information into a standardized, LLM-recognizable dialogue context. Its core value lies in shielding the Agent from the complexity of context construction, providing an "out-of-the-box" complete dialogue context and serving as the central hub connecting Agent modules with the LLM.
Prompt System Architecture
OpenClaw's Markdown-Based Prompt System
OpenClaw's prompt system comprises a set of Markdown files placed in the workspace directory, each taking on a specific responsibility. These injected files are the workspace's .md files, each with a distinct function and easy readability:
AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to perform tasks.
SOUL.md: Personality and soul. Tone, boundaries, priorities. Want the Agent concise without excessive suggestions? Write it here. Want a friendly assistant? Also write it here.
USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
MEMORY.md: Long-term memory. Facts that must never be lost.
YYYY-MM-DD.md: Daily logs. What happened today, which tasks are in progress, what you discussed. Tomorrow, the Agent opens yesterday's log and continues the context.
BOOTSTRAP.md: First-run ceremony (one-time, injected only for brand-new workspaces), such as guided dialogues.
IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
HEARTBEAT.md: Regular checklists. "Check email," "See if monitoring is running."
TOOLS.md: Local tool hints. Where scripts reside, which commands are available. This way, the Agent doesn't need to guess but knows exactly.
Nanobot's Similar Markdown File System
Nanobot employs a similar Markdown file system:
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
SOUL.md content example:
# Soul
I am nanobot 🐈, a personal AI assistant.
## Personality
- Helpful and friendly
- Concise and to the point
- Curious and eager to learn
## Values
- Accuracy over speed
- User privacy and safety
- Transparency in actions
## Communication Style
- Be clear and direct
- Explain reasoning when helpful
- Ask clarifying questions when needed
AGENTS.md content example:
# Agent Instructions
You are a helpful AI assistant. Be concise, accurate, and friendly.
## Scheduled Reminders
When the user asks for a reminder at a specific time, use `exec` to run:
nanobot cron add --name "reminder" --message "Your message" --at "YYYY-MM-DDTHH:MM:SS" --deliver --to "USER_ID" --channel "CHANNEL"
Get USER_ID and CHANNEL from the current session.
**Do NOT just write reminders to MEMORY.md** — that won't trigger actual notifications.
## Heartbeat Tasks
`HEARTBEAT.md` is checked every 30 minutes. Use file tools to manage periodic tasks:
- **Add**: `edit_file` to append new tasks
- **Remove**: `edit_file` to delete completed tasks
- **Rewrite**: `write_file` to replace all tasks
When the user asks for a recurring/periodic task, update `HEARTBEAT.md` instead of creating a one-time cron reminder.
Claw0 Comparative Analysis
For comparison, Claw0 makes the same point: the system prompt is assembled from files on disk. Change the files, change the personality.
Its architecture follows:
Startup Per-Turn
======= ========
BootstrapLoader User Input
load SOUL.md, |
IDENTITY.md, ... |
truncate per v
file (20k) _auto_recall(user_input)
cap total search memory by TF-IDF
(150k) |
| v
SkillsManager build_system_prompt()
scan directories assemble 8 layers:
for SKILL.md 1. Identity
parse frontmatter 2. Soul (personality)
deduplicate by 3. Tools guidance
name 4. Skills
| 5. Memory (evergreen + recalled)
v 6. Bootstrap (remaining files)
bootstrap_data + 7. Runtime context
skills_block 8. Channel hints
(cached for all turns)
|
v
LLM API call
Earlier layers = stronger influence on behavior. SOUL.md sits at layer 2 for exactly this reason.
Key points:
- BootstrapLoader: Loads up to 8 markdown files from workspace, with per-file and total limits.
- SkillsManager: Scans multiple directories for SKILL.md files with YAML frontmatter.
- MemoryStore: Dual-layer storage (resident MEMORY.md + daily JSONL), TF-IDF search.
- _auto_recall(): Searches memory using user messages, injecting results into prompts.
- build_system_prompt(): Assembles 8 layers into one string, rebuilding each turn.
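The layered assembly described above can be sketched in a few lines. The helper name and layer keys here are illustrative, not Claw0's actual API:

```python
def build_system_prompt(layers: dict[str, str]) -> str:
    """Assemble ordered prompt layers into one string; empty layers are skipped.

    Layer keys follow the 8-layer diagram above (illustrative names only).
    """
    order = [
        "identity", "soul", "tools", "skills",
        "memory", "bootstrap", "runtime", "channel_hints",
    ]
    # Earlier entries land earlier in the prompt, hence carry more weight
    return "\n\n".join(layers[k] for k in order if layers.get(k))

prompt = build_system_prompt({
    "identity": "# Identity",
    "tools": "",          # empty layer: dropped
    "memory": "# Memory",
})
assert prompt == "# Identity\n\n# Memory"
```

Because the string is rebuilt each turn, editing SOUL.md (layer 2) takes effect on the very next message.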
ContextBuilder Fundamental Functionality
ContextBuilder serves as the core builder of agent dialogue contexts in the Nanobot framework, responsible for integrating multi-dimensional information (identity definitions, bootstrap files, long-term memory, skill information, runtime metadata, and user messages) into a standardized LLM dialogue context (system prompt + message list). It acts as the critical bridge connecting Agent modules (MemoryStore/SkillsLoader) with the LLM.
Definition and Dependencies
class ContextBuilder:
"""Builds the context (system prompt + messages) for the agent."""
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
_RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"
def __init__(self, workspace: Path):
self.workspace = workspace
self.memory = MemoryStore(workspace)
        self.skills = SkillsLoader(workspace)
Data Dependency Hierarchy:
ContextBuilder (Top Level)
├─ workspace (Input Parameter)
├─ MemoryStore (Dependency Instance)
│ ├─ workspace (Input Parameter)
│ ├─ MEMORY.md (File Path)
│ └─ HISTORY.md (File Path)
└─ SkillsLoader (Dependency Instance)
├─ workspace (Input Parameter)
    └─ workspace/skills/ (Workspace Skills Directory)
Process Loop: Initialization → Context Construction → LLM Call / Tool Execution → Memory Integration → Context Update → repeat, forming a closed Agent execution loop.
Core Features
Modular Context Construction: Divides the system prompt into modules (identity core, bootstrap files, memory, resident skills, skill summary), spliced together on demand for a clear, extensible structure.
Multi-Source Information Fusion: Integrates static bootstrap files (AGENTS.md/SOUL.md, etc.), dynamic memory (MemoryStore), skill systems (SkillsLoader), and runtime metadata (time/channel/environment), forming complete Agent context.
Multimedia Compatibility: Supports Base64-encoded images embedded in user messages, adapting to multi-modal LLM input formats.
Standardized Message Management: Provides standardized addition methods for tool call results and assistant replies, strictly following LLM dialogue message format specifications.
Runtime Metadata Isolation: Marks channel, time, and other runtime metadata as "metadata only, not instructions," avoiding interference with LLM's core decision logic.
Flexible Skill Loading: Distinguishes "resident skills (always=true)" from "skill summaries." Resident skills embed directly into context, while others provide only summaries (requiring read_file tool to read), balancing context length with functional completeness.
Invocation Methods
_process_message: Single Message Processing Entry
_process_message serves as the core entry point for single-message processing, supporting three scenarios (system messages, slash commands, and ordinary conversation) and completing the full workflow of context construction → agent loop → result saving → response return.
async def _process_message(
self,
msg: InboundMessage,
session_key: str | None = None,
on_progress: Callable[[str], Awaitable[None]] | None = None,
) -> OutboundMessage | None:
"""Process a single inbound message and return the response."""
    # System messages: parse origin from chat_id ("channel:chat_id")
    if msg.channel == "system":
        # (channel, chat_id, session, and history are resolved earlier — elided here)
        messages = self.context.build_messages(
            history=history,
            current_message=msg.content,
            channel=channel,
            chat_id=chat_id,
        )
        final_content, _, all_msgs = await self._run_agent_loop(messages)
        self._save_turn(session, all_msgs, 1 + len(history))
        self.sessions.save(session)
        return OutboundMessage(
            channel=channel,
            chat_id=chat_id,
            content=final_content or "Background task completed.",
        )
_run_agent_loop: Core Execution Cycle
The _run_agent_loop function represents the agent's core execution cycle, continuously calling the large model and determining whether to call tools based on responses until the model returns a final answer or reaches maximum iterations.
_run_agent_loop calls ContextBuilder to construct messages:
async def _run_agent_loop(
self,
initial_messages: list[dict],
on_progress: Callable[..., Awaitable[None]] | None = None,
) -> tuple[str | None, list[str], list[dict]]:
    messages = initial_messages
    iteration = 0
    while iteration < self.max_iterations:
        iteration += 1
        response = await self.provider.chat(messages=messages)
        if response.has_tool_calls:
            # (tool_call_dicts is built from response.tool_calls — elided)
            messages = self.context.add_assistant_message(
                messages,
                response.content,
                tool_call_dicts,
                reasoning_content=response.reasoning_content,
            )
            for tool_call in response.tool_calls:
                # (tool execution producing `result` — elided)
                messages = self.context.add_tool_result(
                    messages, tool_call.id, tool_call.name, result
                )
        else:
            # (final answer is cleaned into `clean`; loop exits — elided)
            messages = self.context.add_assistant_message(
                messages,
                clean,
                reasoning_content=response.reasoning_content,
            )
    return final_content, tools_used, messages
Key Interaction Diagrams
ContextBuilder as Central Hub
ContextBuilder serves as the core hub, aggregating SkillsLoader (skills) and MemoryStore (memory) outputs, constructing standardized LLM contexts. Detailed interactions follow:
ContextBuilder and MemoryStore Interaction:
ContextBuilder initialization(workspace)
↓
MemoryStore(workspace) ← Create instance
↓
build_system_prompt() → memory.get_memory_context() ← Get long-term memory
↓
Return memory context string
ContextBuilder and SkillsLoader Interaction:
ContextBuilder SkillsLoader
↓ ↓
ContextBuilder initialization(workspace) SkillsLoader(workspace) ← Create instance
↓
build_system_prompt() → skills.get_always_skills() ← Get resident skill list
↓
load_skills_for_context() ← Load skill content
↓
build_skills_summary() ← Build skill summary
↓
Return skill-related content string
Key Function Analysis
We organize key functions according to the core process flowchart.
build_messages(): Complete Message List Construction
Return Value
build_messages() ultimately returns a message list (list[dict[str, Any]]) conforming to LLM dialogue format. Each dictionary represents one dialogue message, strictly following the "role + content" core structure (with extended support for tool calls, multi-modal, and other fields).
This list serves as the complete input context when Nanobot calls LLM, containing system prompts, historical dialogues, runtime metadata, and current user messages (supporting text + images), forming the core carrier for Agent-LLM interaction.
The returned message list contains the following four core content types in fixed order (empty optional values are filtered by upstream logic):
| Message Role | Content Core Composition | Special Fields / Notes |
|---|---|---|
| system | Complete system prompt generated by build_system_prompt() (core foundation) | No special fields, pure text; defines Agent's "identity + rules + skills + memory + environment" |
| Inherited from history | Historical dialogue messages (may contain user/assistant/tool roles) | Completely reuses incoming history list structure, retaining all historical context |
| user | Runtime metadata (time/timezone/channel/chat_id), with fixed tag [Runtime Context — metadata only, not instructions] | Pure text; serves only as metadata, LLM won't treat as user instruction |
| user | Current user message (text + optional base64-encoded images) | Single text / text + image list; images in image_url format, compatible with OpenAI multi-modal API specification |
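Assuming illustrative placeholder values, the four parts combine into a list shaped like this (not real output):

```python
# Illustrative shape of build_messages() output (placeholder values)
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
messages = [
    {"role": "system", "content": "<output of build_system_prompt()>"},
    *history,  # spliced in unchanged
    {"role": "user",
     "content": "[Runtime Context — metadata only, not instructions]\n..."},
    {"role": "user", "content": "What's on my schedule today?"},
]
assert [m["role"] for m in messages] == ["system", "user", "assistant", "user", "user"]
```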
Generation Logic
build_messages()'s generation logic follows:
- Core content generation relies on three auxiliary functions: build_system_prompt() (system prompt), _build_runtime_context() (metadata), _build_user_content() (user message)
- Generation logic employs modular splicing + conditional filtering, balancing flexibility (supporting multi-modal/skills/memory) and standardization (conforming to LLM API format)
def build_messages(
self,
history: list[dict[str, Any]],
current_message: str,
skill_names: list[str] | None = None,
media: list[str] | None = None,
channel: str | None = None,
chat_id: str | None = None,
) -> list[dict[str, Any]]:
"""Build the complete message list for an LLM call."""
return [
{"role": "system", "content": self.build_system_prompt(skill_names)},
*history,
{"role": "user", "content": self._build_runtime_context(channel, chat_id)},
{"role": "user", "content": self._build_user_content(current_message, media)},
    ]
Step-by-step correspondence with the code:
Step 1: Generate System Prompt
Call build_system_prompt() to integrate identity, bootstrap files, memory, skills, and all system-level configurations.
{"role": "system", "content": self.build_system_prompt(skill_names)}
Step 2: Splice Historical Dialogues
Use Python unpacking syntax to directly insert historical message lists after system messages.
history is list[dict[str, Any]], retaining all historical roles and fields (including tool_calls, reasoning_content, and other extended fields).
Step 3: Add Runtime Metadata
Generate metadata containing time/channel/chat_id as independent user messages (avoiding contamination of user's real instructions).
{"role": "user", "content": self._build_runtime_context(channel, chat_id)}
Step 4: Add Current User Message
Process text + images, generating final user input content.
{"role": "user", "content": self._build_user_content(current_message, media)}
Final: Combine the above four parts in order into a list and return.
build_system_prompt(): System Prompt Core Construction
The system message (core) generates via build_system_prompt(), containing 6 sub-modules:
build_system_prompt()
├─ _get_identity() returns identity information
├─ _load_bootstrap_files() loads bootstrap files
├─ memory.get_memory_context() gets memory content
├─ skills.get_always_skills() gets resident skill list
│ └─ skills.load_skills_for_context() → load skill content
└─ skills.build_skills_summary() builds skill summary
Logic and Sub-Module Order
Core Identity (_get_identity()):
- nanobot basic definition + runtime environment (system/architecture/Python version)
- Workspace path (memory/skills directory locations)
- Core behavioral guidelines (tool calls/file operations/error handling, etc.)
Bootstrap Files (_load_bootstrap_files()):
Load workspace's AGENTS.md/SOUL.md/USER.md/TOOLS.md/IDENTITY.md:
AGENTS.md: Operation manual. How the Agent should think, when to use which tools, what safety rules to follow, and in what order to perform tasks.
SOUL.md: Personality and soul. Tone, boundaries, priorities.
USER.md: Your user profile. How to address you, your profession, your preferences. The Agent reads this file before every response.
IDENTITY.md: Identity and atmosphere. A very short file, but it sets the overall tone.
TOOLS.md: Local tool hints. Where scripts reside, which commands are available. The Agent doesn't need to guess but knows exactly.
Each file is added if it exists and skipped if not.
Memory Context (memory.get_memory_context()):
Fetch long-term memory content from MemoryStore; when non-empty, add it under a "# Memory" heading.
Resident Skills (skills.get_always_skills() + load_skills_for_context()):
Embed the content of skills marked always=true; when non-empty, add it under an "# Active Skills" heading.
Skill Summary (skills.build_skills_summary()):
An XML-format summary of all skills (name/description/path/availability), including usage instructions.
Splicing Rules: Modules are separated with "\n\n---\n\n"; empty modules are filtered out automatically.
Ultimately obtain the complete system prompt.
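The splicing rule in isolation, as a minimal illustration (not Nanobot's exact code):

```python
# Empty modules are dropped before joining, so no stray separators appear
parts = ["# Identity\n...", "", "# Memory\n..."]
prompt = "\n\n---\n\n".join(p for p in parts if p)
assert prompt == "# Identity\n...\n\n---\n\n# Memory\n..."
```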
_build_runtime_context: Runtime Metadata Construction
Function: Build runtime context metadata block, including:
- Always included: current time (format YYYY-MM-DD HH:MM (Weekday)) and timezone
- Optional: channel and chat_id (included only when provided and non-empty)
- Fixed opening tag: [Runtime Context — metadata only, not instructions], explicitly telling the LLM this is metadata, not instructions.
_build_user_content: User Message Content Construction
Function: Build user message content, determining return format based on whether media content exists:
- No media files (media=None): return the incoming current_message text directly
- With media files:
  - Filter out non-image and non-existent files
  - Convert each image to base64 and build a data:{mime};base64,{b64} URL
  - Return format: [{"type": "image_url", "image_url": {"url": "..."}}, ..., {"type": "text", "text": "user text"}]
Key Code Implementation
class ContextBuilder:
"""Builds the context (system prompt + messages) for the agent."""
# Define bootstrap file list: these files load into system prompt, defining Agent's basic behavior/identity
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md"]
# Runtime context tag: marks this part as metadata (not instructions), avoiding LLM misinterpreting as execution instructions
_RUNTIME_CONTEXT_TAG = "[Runtime Context — metadata only, not instructions]"
def __init__(self, workspace: Path):
# Initialize workspace path (Agent's core working directory)
self.workspace = workspace
# Initialize memory store instance (associated with MemoryStore, managing long-term memory/historical logs)
self.memory = MemoryStore(workspace)
# Initialize skill loader instance (associated with SkillsLoader, managing Agent skills)
self.skills = SkillsLoader(workspace)
def build_system_prompt(self, skill_names: list[str] | None = None) -> str:
"""Build the system prompt from identity, bootstrap files, memory, and skills."""
# Initialize system prompt fragment list, splicing by priority
parts = [self._get_identity()] # Step 1: Add core identity definition (highest priority)
# Step 2: Load bootstrap file content (AGENTS.md/SOUL.md, etc.)
bootstrap = self._load_bootstrap_files()
if bootstrap: # Add when bootstrap files non-empty
parts.append(bootstrap)
# Step 3: Add long-term memory content
memory = self.memory.get_memory_context()
if memory: # Add when memory non-empty, wrapped with # Memory title
parts.append(f"# Memory\n\n{memory}")
# Step 4: Add resident skills (always=true skills, directly embedded in context)
always_skills = self.skills.get_always_skills()
if always_skills: # When resident skills exist
# Load resident skill core content
always_content = self.skills.load_skills_for_context(always_skills)
if always_content: # Add when skill content non-empty, wrapped with # Active Skills title
parts.append(f"# Active Skills\n\n{always_content}")
# Step 5: Add all skill summaries (XML format, for Agent on-demand reading)
skills_summary = self.skills.build_skills_summary()
if skills_summary: # Add when skill summary non-empty
parts.append(f"""# Skills
The following skills extend your capabilities. To use a skill, read its SKILL.md file using the read_file tool.
Skills with available="false" need dependencies installed first - you can try installing them with apt/brew.
{skills_summary}""")
# Splice all fragments with separator lines (---) into complete system prompt
        return "\n\n---\n\n".join(parts)
Design Principles and Best Practices
Layered Construction
System prompts build layer by layer following "identity → bootstrap → memory → skills," providing clear logic and on-demand extensibility. Earlier layers exert stronger influence on Agent behavior, which is why SOUL.md sits at layer 2.
Multi-Modal Support
Automatically convert images to Base64-encoded data URIs, adapting to multi-modal LLM inputs. This enables seamless integration of visual information into Agent reasoning processes.
Metadata Isolation
Runtime information marked as "metadata only" avoids interfering with LLM core decision-making. This separation ensures temporal and contextual awareness without contaminating instruction interpretation.
Standardized Messages
Provides unified addition methods for tool results and assistant replies, strictly following LLM dialogue format. This standardization enables consistent behavior across different LLM providers.
Practical Implementation Considerations
Token Budget Management
ContextBuilder must balance comprehensive context with token limitations. Strategies include:
- Truncating historical conversations when approaching limits
- Prioritizing recent interactions over older ones
- Summarizing extended memory content
- Conditional skill loading based on task relevance
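One possible truncation strategy, sketched with a crude character budget as a token proxy (not Nanobot's actual code; a real implementation would use the model's tokenizer):

```python
def truncate_history(history: list[dict], max_chars: int = 12_000) -> list[dict]:
    """Keep the most recent messages that fit a rough character budget."""
    kept: list[dict] = []
    total = 0
    for msg in reversed(history):  # walk newest → oldest
        size = len(str(msg.get("content") or ""))
        if total + size > max_chars:
            break  # older messages no longer fit
        kept.append(msg)
        total += size
    return list(reversed(kept))  # restore chronological order
```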
Performance Optimization
Context construction occurs every turn, making efficiency critical:
- Cache bootstrap file contents after initial load
- Implement lazy loading for skill definitions
- Use incremental memory updates rather than full reloads
- Profile and optimize hot paths in message construction
Error Handling
Robust error handling ensures graceful degradation:
- Missing bootstrap files shouldn't crash the Agent
- Memory read failures should log warnings, not halt execution
- Skill loading errors should report clearly for debugging
- Invalid media files should filter silently
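A sketch of such a tolerant loader (a hypothetical helper, not Nanobot's actual code):

```python
import logging
from pathlib import Path

log = logging.getLogger("contextbuilder")

def load_bootstrap_files(workspace: Path, names: list[str]) -> str:
    """Concatenate whichever bootstrap files exist; never raise on I/O errors."""
    sections: list[str] = []
    for name in names:
        path = workspace / name
        try:
            if path.is_file():
                sections.append(path.read_text(encoding="utf-8"))
            # Missing files are silently skipped: the Agent still starts
        except OSError as exc:
            # Read failures degrade gracefully: warn and continue
            log.warning("Could not read %s: %s", path, exc)
    return "\n\n".join(sections)
```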
Conclusion and Future Directions
ContextBuilder represents the architectural centerpiece of Nanobot's Agent framework, elegantly solving the complex challenge of multi-source context integration. By providing a clean abstraction layer between diverse information sources and LLM consumption, it enables developers to focus on Agent logic rather than context management plumbing.
The modular design facilitates extension as new context types emerge. Future enhancements might include:
- Vector-based memory retrieval for semantic search
- Dynamic skill discovery based on conversation context
- Multi-agent context sharing for collaborative scenarios
- Compression techniques for extended conversation histories
Understanding ContextBuilder's architecture provides valuable insights applicable to broader Agent system design, demonstrating how thoughtful abstraction can tame complexity while maintaining flexibility and performance.
For developers building AI Agents, the patterns established in ContextBuilder offer a proven foundation for context management, balancing the competing demands of completeness, efficiency, and maintainability.