Learning Architecture Through Nanobot Source Code: Skills System Deep Dive
Overview
OpenClaw contains approximately 400,000 lines of code, which makes reading the source directly a daunting task. This series therefore studies OpenClaw's distinctive features through Nanobot instead.
Nanobot is an ultra-lightweight personal AI assistant framework open-sourced by HKU Data Science Laboratory (HKUDS), positioned as "Ultra-Lightweight OpenClaw." It's highly suitable for learning Agent architecture.
What is a Skill?
A Skill packages specific domain expertise, workflows, and best practices into reusable instruction modules embedded in the AI's context. This lets the AI automatically activate and apply those professional capabilities when it works on related tasks.
Simply put, a Skill is a "work manual" that teaches the AI how to do a job. By encapsulating domain knowledge, standard procedures, and execution tools into pluggable modules, Skills let agents, for the first time, accumulate, share, and continuously refine professional capability, truly advancing from "automated execution" to "professional thinking and action."
The Essence of Skill:
Transforming "tacit knowledge" into "explicit procedures."
Fundamental Principles
Function Calling Basics
The essence of Function Calling: for an LLM to invoke a tool, it really only needs three things: the function name, the parameter requirements, and a description.
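As an illustration, a tool exposed to the model in the widely used JSON-Schema style needs nothing more than those three pieces. The `read_file` tool below is a hypothetical example, not a specific framework's API:

```python
# Everything the model ever sees about a tool: its name, a description,
# and a JSON-Schema definition of its parameters. The implementation
# lives entirely on the host side and is invisible to the LLM.
read_file_tool = {
    "name": "read_file",  # hypothetical example tool
    "description": "Read a UTF-8 text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read"},
        },
        "required": ["path"],
    },
}
```

The model emits a call like `read_file(path="notes.txt")` purely by pattern-matching against this declaration; everything else in this article is built on top of that one capability.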
However, under the Function Calling framework, users face several challenges:
- How to integrate into existing systems?
- How to accurately guide the model to execute tasks according to specific processes and rules using text (not code)?
- Writing all knowledge into prompts causes context explosion.
Solution: MCP and Skills
Both MCP and Skills are based on Function Calling—they represent different application approaches built upon this foundational capability.
MCP (Model Context Protocol):
- Enables AI to "execute": invoke tools, read and write data
- Focuses on "what can be done"
Skill:
- Teaches AI "how to do"—provides operational manuals (telling the model step-by-step procedures)
- Includes processes, standards, strategies, scripts, code
- Skills aren't "trained into the model" but written into System Prompt
- Loaded automatically during context construction
- The LLM learns to use them through reading comprehension
Integration with Existing Systems
MCP Approach:
MCP is a protocol that standardizes interaction between large models and external capabilities. If Tools solve "how a model calls a function," MCP solves "how a model interacts with pre-existing, reusable capability services."
MCP's core concern is integration with existing systems. Its value lies in providing a standardized integration protocol, so that different tools and data sources can be consumed by the LLM in a uniform manner.
Essentially, MCP resembles an API: the Agent only cares about what parameters the API accepts and what results it returns.
Guiding Models Through Specific Processes
Skills Approach:
Skills provide a way for users to define instructions, scripts, and resources using text, forming reusable task workflows.
In essence, Skills are a lightweight form of sub-agent packaging:
- Skills are a clever application built on Function Calling: the "load document" operation is wrapped as a function, so the model can automatically find and load the relevant skill documents when it needs them
- Skills let users write workflows in plain text, replacing the API-chaining logic that previously had to be hard-coded inside MCP tool implementations
- This improves a workflow's adaptability to its environment: the model can dynamically adjust strategy based on the actual situation, handle uncertainty, and understand natural-language intent
- Skills also support more complex workflow definitions: a SKILL.md document can instruct the LLM how to combine multiple interface calls, yielding higher success rates on long workflow tasks
Solving Context Explosion
Two-Layer Loading Paradigm:
Layer 1 (Always Present): System prompt contains skill names (low cost)
Layer 2 (On Demand): Full content in tool_result
System Prompt (Layer 1):
+----------------------------------+
| You are a coding agent. |
| Skills available: |
| - git: Git workflow helpers | ~100 tokens/skill
| - test: Testing best practices |
+----------------------------------+
When model calls load_skill("git"):
+----------------------------------+
| tool_result (Layer 2): |
| <skill name="git"> |
| Full git workflow instructions...| ~2000 tokens
| Step 1: ... |
| </skill> |
+----------------------------------+
Skills: The Experience Library for Large Models
Skill = Reusable Work Methods
Skills solidify expert experience, ensuring every execution follows best practices—transforming human experiential tasks into AI-reusable modules.
Skills serve as the project's streamlined extension mechanism. Instead of stuffing all possible functionality into the core codebase, new capabilities are defined as SKILL.md files—Markdown instruction documents teaching intelligent agents how to use specific tools or execute specific workflows.
Skill's Essence: Prompt + Tools
- Prompt: Tells LLM "Who I am," "What I can do," "When to use me"
- Tools: Concrete function implementations (typically CLI commands or API calls)
Tool vs. Skill Distinction:
| Aspect | Tools | Skills |
|---|---|---|
| Position | Atomic capabilities | Task modules |
| Composition | Function definitions only (JSON Schema) | Prompt + Code + Process definitions |
| Focus | "I can call this API" | "I know how to complete this work" |
| Context | Stateless, typically one-time calls | Stateful, multi-step reasoning through Prompt guidance |
| Reusability | Code-level reuse | Business logic-level reuse |
Core Principles of Skills
The core of Skills lies in "modular packaging" and "progressive disclosure." Their emergence is essentially an engineering innovation, and their central goal is to release and harness the large model's existing potential more efficiently.
Skills don't exist because large models suddenly became "smarter." They package everything required to complete a specific professional task (knowledge, processes, tools) into an independent module, helping the model shift from aimless free-flowing generation to goal-oriented, step-by-step professional execution.
Key Innovations:
- Operating Mechanism: The agent initially loads only a lightweight index of skill names. Once it recognizes a specific task requirement, it dynamically loads the corresponding skill's detailed instructions and resources. This on-demand loading maximizes utilization of the context window.
- Accumulated Capabilities: Skills transform complex tasks into callable, reusable, composable "skill modules." Agent implementation is shifting from single Q&A toward accumulative reusable capabilities.
- Overcoming Professional Deficiency: Skills make implicit domain-expert knowledge (such as financial-analysis SOPs or code-review checklists) explicit and structured, turning it into concrete instructions the model can understand and execute. This gives a general-purpose model deep professional capability. It doesn't add new "abilities" to the AI (the AI is already very capable); it adds domain knowledge and operating procedures, telling the model: in this specific scenario, how to act, which tools to use, and which constraints to observe.
- Resolving Context Explosion: Compared to traditional approaches of stuffing all knowledge into prompts or solidifying in lengthy workflows, Skill's "on-demand loading" mechanism can reduce context consumption by 40%-60% in complex tasks.
Skills and Architecture Selection
Critical Question:
When Single Agent encounters knowledge bottlenecks, besides splitting into Multi-Agent, is there a lighter-weight path?
Skill Mechanism's Answer:
Return to the Single Agent architecture itself, but endow it with strong dynamic-expansion capability. This mode gives a single Agent "localized specialization": macroscopically it maintains unified memory and state; microscopically it can flexibly master thousands of vertical-domain knowledge packs, invoked just like tools.
Architecture Selection Principle:
If Single Agent can solve it, never use complex architectures. When encountering knowledge bottlenecks, prioritize introducing Agent Skills mechanisms, expanding capability boundaries through dynamic progressive Skill loading.
Why Skills Maintain Context Consistency
Prompt Engineering Perspective:
| Layer | Characteristics | Function |
|---|---|---|
| System Prompt | Constant, unchanging | Core system instructions (persona identity, basic requirements) remain stable, ensuring model cognitive unity |
| User Prompt | Dynamically injected | Skills content progressively disclosed as "user input" or "tool return results" |
What This Means for the Model:
It is as if the user keeps handing over new reference materials during the conversation, rather than forcibly changing the model's "persona."
Analogy:
- Multi-Agent: Making the model "split personality" into multiple personas taking turns appearing
- Skills: Letting the same expert consult different professional manuals when needed—persona unchanged, knowledge dynamically expanded
Triple Benefits of Agent Skills Architecture
1. Low-Cost Knowledge Injection
Truly turns massive domain knowledge into consultable manuals: the model reads on demand, with no need for full pre-loading. This is lighter-weight than Multi-Agent and more precise than RAG (RAG is passive reception after retrieval; Skills are active invocation with a clear purpose).
2. Global Context Consistency
Since everything is executed by the same main Agent (similar to the Orchestrator in Multi-Agent, but without a routing layer), it always knows:
- Executed steps
- Loaded Agent Skills
- Current task status
This thoroughly eliminates information fragmentation and redundant work problems in Multi-Agent—no routing error risks, no context blind spots between sub-Agents.
3. Avoiding Context Explosion
Through "read a bit, do a bit, read a bit more" streaming processing, effectively controls instantaneous context length. Compared to Multi-Agent where each sub-node must maintain independent System Prompt redundancy, Skills only load when needed, releasing after use.
Essential Insight
Skills use the file system's structure to replace complex network communication protocols, and progressive information disclosure to replace brute-force full injection. This isn't a simple "functional replacement" but a shift in architectural philosophy: from "trading complexity for capability" toward "trading orchestration precision for efficiency."
Skill Structure
A Skill isn't "one entry in a flat plugin directory" but part of a workflow system organized by layered discovery, workspace, and sharing scope.
Three-Layer Structure (Three-Level Loading Mechanism):
Layer 1: Metadata (Always Loaded)
- Skill description, functionality, and interface definitions
- This information is always loaded in context, approximately 100 tokens
- AI uses this to judge whether to trigger Skill
Layer 2: Instructions (Loaded on Trigger)
- Structured professional operation steps and judgment logic
- Loaded only after Skill is triggered
Layer 3: Resources (Loaded on Demand)
- Data, templates, or tool links required for execution
- Loaded only when AI judges necessary, no size limit
- Scripts can be executed directly
- references/ for detailed explanations
- scripts/ for truly reusable scripts
Therefore, a skill isn't "just one markdown file" but a group of resources organized around a task. Put differently: capability equals files, and reuse plus resource requirements equals a Skill.
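As a sketch of Layer 1, the always-loaded metadata typically lives in a small front-matter block at the top of SKILL.md. A minimal parser, assuming flat `key: value` front matter delimited by `---` lines and using no external YAML library, might look like:

```python
def parse_skill_metadata(skill_md: str) -> dict[str, str]:
    """Extract Layer-1 metadata (e.g. name, description) from a SKILL.md
    whose front matter sits between two '---' lines.
    Minimal sketch: handles only flat `key: value` pairs."""
    meta: dict[str, str] = {}
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return meta  # no front matter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta
```

Only the resulting name/description pair needs to occupy the context at all times; the markdown body below the front matter stays on disk until Layer 2 is triggered.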
Typical Skill Directory Structure
some-repository/
├── AGENTS.md # Permanent repository guidelines
└── .agents/
└── skills/
├── rot13-encryption/ # AgentSkills standard (progressive disclosure)
│ ├── SKILL.md
│ ├── scripts/
│ │ └── rot13.sh
│ └── references/
│ └── README.md
├── another-agentskill/ # AgentSkills standard
│ ├── SKILL.md
│ └── scripts/
│ └── placeholder.sh
    └── legacy_trigger_this.md # Legacy/OpenHands format
Through this three-layer structure (Metadata → Instructions → Code and Resources), a skill can not only be correctly recognized and triggered by the agent but can truly complete the full closed loop from "understanding requirements" to "executing tasks."
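Discovery over a layout like the tree above can be sketched as a simple directory scan. The paths follow the tree; the function name is illustrative, not any framework's real API:

```python
from pathlib import Path

def discover_skills(repo_root: str) -> dict[str, Path]:
    """Map skill name -> SKILL.md path for every AgentSkills-style
    directory under <repo>/.agents/skills/."""
    skills: dict[str, Path] = {}
    skills_dir = Path(repo_root) / ".agents" / "skills"
    if not skills_dir.is_dir():
        return skills
    for entry in sorted(skills_dir.iterdir()):
        manifest = entry / "SKILL.md"
        # Only directories containing a SKILL.md count as skills;
        # loose files like legacy_trigger_this.md are skipped here.
        if entry.is_dir() and manifest.is_file():
            skills[entry.name] = manifest
    return skills
```

Note that the legacy flat-file format in the tree would need a separate code path; the sketch only covers the AgentSkills directory convention.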
Skill vs Workflow vs Prompt
Skill vs Workflow
Workflow:
- Pre-orchestrated, linear, fixed-path execution flowchart
- Ensures complex tasks are "stably done correctly"
Skill:
- Modular, composable, on-demand callable professional capability packages
- Aims to enable intelligent agents to "professionally do correctly"
Through knowledge packaging and flexible scheduling, which a fixed Workflow struggles to implement, Skills open a new stage of agent specialization and scaled application. The direction Skills represent, capability modularization and knowledge engineering, is undoubtedly a key path toward building the next generation of professional, high-reliability agents.
Skill vs Prompt
| Dimension | Prompt | Skill |
|---|---|---|
| Trigger | Manual pasting | Automatic matching |
| Capability | Text only | Documents + Scripts + Templates |
| Reusability | Low | High |
| Context | Full load every time | Progressive loading |
Application Scenarios
Three Problem Types Suitable for Skills
- Domain Knowledge AI Lacks
- Repetitive Workflows
- Operations Requiring Deterministic Execution (compared to direct prompts, Skills excel in reusability and consistency)
Situations Not Suitable for Skills
- One-Time Tasks: You only use this workflow once; writing Skill takes longer than doing it directly
- Highly Personalized: Each requirement is completely different, no patterns to solidify
- AI Already Proficient: General coding, writing, translation, etc.—AI already has strong default behaviors; adding Skills may be superfluous
Future Outlook
In the future, the core asset of much software may be: a set of high-quality, callable, auditable, reusable Agent Skills.
Agent competition will likely gradually shift from "whose model scores higher" to:
- Who is better at solidifying capabilities into skills
- Who is better at organizing skills into workflows
- Who is better at making these processes run stably in real business
SkillsLoader Component
SkillsLoader is the core component responsible for intelligent agent Skills management in the Nanobot framework.
Core Responsibilities:
- Load and manage intelligent agent skill files (with SKILL.md as carrier)
- Validate skill files
- Provide available skill retrieval, loading, and validation capabilities for agents
- Serves as the foundational support module for Agent tool usage/task execution capabilities
Component Relationships
- SkillsLoader: Scans and loads skill files
- ContextBuilder: Inserts skill documents into the context
- LLM: Performs tasks according to the skill documents
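The three components above could interact roughly as follows. This is a hypothetical sketch, not Nanobot's actual code; class and method names are illustrative:

```python
from pathlib import Path

class SkillsLoader:
    """Scans a skills directory, exposes lightweight name summaries,
    and loads full SKILL.md bodies on demand."""

    def __init__(self, skills_dir: str):
        self.skills_dir = Path(skills_dir)

    def list_skills(self) -> list[str]:
        # One entry per <skills_dir>/<name>/SKILL.md
        return sorted(p.parent.name for p in self.skills_dir.glob("*/SKILL.md"))

    def load(self, name: str) -> str:
        return (self.skills_dir / name / "SKILL.md").read_text()

class ContextBuilder:
    """Inserts the Layer-1 skill summary into the system prompt."""

    def __init__(self, loader: SkillsLoader):
        self.loader = loader

    def build_system_prompt(self) -> str:
        summary = "\n".join(f"- {n}" for n in self.loader.list_skills())
        return f"You are an agent.\nSkills available:\n{summary}"
```

The LLM then sees only the summary; a later `loader.load(name)` call (surfaced as a tool) pulls the full body into a tool_result.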
Industry Paradigm
Referencing OpenHands documentation, the Skill system has four primary responsibilities:
- Context Injection: Add specialized prompts to agent context based on triggers
- Trigger Evaluation: Determine when skills should activate (always, keyword, task)
- MCP Integration: Load MCP tools associated with repository skills
- Third-Party Support: Parse .cursorrules, agents.md, and other skill formats
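The trigger-evaluation responsibility can be sketched minimally. The `trigger` and `keywords` fields below are assumptions mirroring the always/keyword/task distinction, not OpenHands' exact schema:

```python
def should_trigger(skill: dict, user_message: str) -> bool:
    """Decide whether a skill's prompt should be injected.
    'always'  -> inject unconditionally
    'keyword' -> inject on a case-insensitive substring match
    'task'    -> delegated to the LLM's own judgment, so False here."""
    mode = skill.get("trigger", "task")
    if mode == "always":
        return True
    if mode == "keyword":
        message = user_message.lower()
        return any(k.lower() in message for k in skill.get("keywords", []))
    return False
```

String matching keeps 'always' and 'keyword' triggers cheap and deterministic, while 'task' triggers trade tokens for the model's semantic judgment.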
General Workflow
LLM Using Skills = Reading Documentation + Calling Tools
Humans are responsible for clarifying goals and constraints, Skills encapsulate implementation standards (code/processes), Agents are responsible for understanding intents and scheduling execution.
The LLM doesn't need to know in advance which tools exist and how they work; it only needs to consult the relevant manual when a tool is required, then pick it up and use it.
Agent's Behavior is Progressive Disclosure:
- Initialization: Only view Skill's name and description, glance at what this tool can do
- Agent views skill summary, determines skill matches the task
- Discovers need → calls read_file to read SKILL.md
- Executes tasks according to steps defined in the document
Specific Workflow
- Initialization Phase: Users define instructions, scripts, and resources in text and package them into Skills (a SKILL.md plus optional scripts, reference materials, etc.). At startup, the LLM reads every Skill's metadata (name and description), and this metadata is loaded into the model's context (system prompt). The Agent thus "knows" which Skills it possesses at the start of every conversation, without knowing their content; this is used for subsequent intent matching and trigger judgment.
- Discovery Phase: When a user initiates a request, the LLM compares the request against the loaded Skills metadata and judges whether a Skill is needed. This judgment is essentially the LLM making a decision based on context, the same as deciding whether to call a tool in Function Calling.
Two Key Points:
- The Agent only reaches for Skills when it feels unable to handle a task on its own. A simple one-step operation (like "read this file") may not trigger a Skill even if its description matches perfectly, because the Agent judges it can complete the task directly
- The Agent naturally tends toward under-triggering, so descriptions should be written proactively, pushing the boundary outward
- Loading Phase (Function Calling): If LLM judges a certain Skill is needed (when user requests match a Skill's description), it calls a dedicated loading tool through Function Calling mechanism (like load_skill(skill_name)), reading and loading the corresponding SKILL.md document content into the current context. This step completely relies on Function Calling capabilities.
- Execution Phase (Continuing to Use Function Calling): After SKILL.md content (including instructions, processes, examples, etc.) is added to context, LLM executes tasks according to instructions defined in the document. If SKILL.md defines scripts needing execution (like scripts/rotate_pdf.py), LLM still calls functions for executing scripts through Function Calling. If reference materials need loading, similarly calls file reading functions through Function Calling.
Key Observation:
The entire Skills operation process, from loading documents, executing scripts to reading resources, every step cannot leave Function Calling. Skills didn't create new capabilities—it merely organized the Function Calling foundational capability into a more usable form: letting users define workflows using text, letting LLM automatically discover and load relevant knowledge.
This also means Skill activation itself consumes 1-2 tool calls. Therefore, how accurately description is written directly impacts Token consumption and response speed. Mis-triggering means waste, missed triggering means capability loss.
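The four phases can be traced end to end in a toy loop where the model's decisions are stubbed out, making the Function Calling plumbing visible. All names here are hypothetical:

```python
SKILLS = {  # Layer-2 bodies, normally read from SKILL.md files
    "git": "Step 1: check status. Step 2: stage. Step 3: commit.",
}

def build_system_prompt() -> str:
    # Initialization phase: only metadata (names) enters the context.
    return "You are an agent. Skills available: " + ", ".join(SKILLS)

def load_skill(skill_name: str) -> str:
    # Loading phase: an ordinary Function Calling tool.
    return f'<skill name="{skill_name}">{SKILLS[skill_name]}</skill>'

def run_turn(user_request: str) -> list[str]:
    transcript = [build_system_prompt(), user_request]
    # Discovery phase: stub for the LLM matching the request to metadata.
    wanted = next((n for n in SKILLS if n in user_request.lower()), None)
    if wanted:
        transcript.append(load_skill(wanted))  # tool_result joins the context
        # Execution phase: the model would now follow the loaded steps,
        # issuing further Function Calls for scripts or references.
        transcript.append(f"(executing steps from skill '{wanted}')")
    return transcript
```

Every append after the system prompt corresponds to one or more Function Calls, which is why skill activation itself costs a tool call or two.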
Nanobot Skill Management System Core Features
- Dual-Source Priority Loading: Supports "workspace skills" and "builtin skills" as dual sources; workspace skills take higher priority, so user customizations override builtin skills of the same name
- Requirement Validation Mechanism: Automatically validates skill dependencies (CLI tools, environment variables), can filter out unavailable skills with unmet dependencies
- Structured Skill Management: Provides skill-list queries, single-skill loading, skill-summary construction (XML format), resident-skill filtering, and related capabilities, supporting the Agent's context loading and progressive skill usage
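The first two behaviors can be sketched directly. The function names and skill-dict fields below are assumptions for illustration, not Nanobot's real API:

```python
import os
import shutil

def merge_skills(workspace: dict, builtin: dict) -> dict:
    """Dual-source loading: workspace skills shadow builtin skills
    that share the same name."""
    merged = dict(builtin)
    merged.update(workspace)  # workspace entries win on name collisions
    return merged

def requirements_met(skill: dict) -> bool:
    """Requirement validation: a skill is usable only if every declared
    CLI tool is on PATH and every environment variable is set."""
    bins_ok = all(shutil.which(b) for b in skill.get("requires_bins", []))
    env_ok = all(os.environ.get(v) for v in skill.get("requires_env", []))
    return bins_ok and env_ok
```

Filtering the merged map through `requirements_met` before building the skill summary keeps unusable skills out of the model's context entirely.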
Conclusion
Skills represent a fundamental architectural innovation in AI agent design. By transforming tacit knowledge into explicit, reusable modules with progressive disclosure mechanisms, Skills enable:
- Scalable Expertise: Domain knowledge can be accumulated and shared across agents
- Context Efficiency: On-demand loading dramatically reduces context window consumption
- Maintainability: Skills as files enable version control and collaborative improvement
- Professional Reliability: Standardized procedures ensure consistent, high-quality outputs
The future of AI agents lies not just in more powerful models but in better-organized knowledge and capabilities. Skills provide the architectural foundation for this evolution.
As the ecosystem matures, we can expect:
- Richer skill libraries covering more domains
- Better tooling for skill creation and validation
- Standardized skill formats enabling interoperability
- Community-driven skill marketplaces
The journey from prompts to skills represents AI's evolution from conversational tools to professional partners capable of complex, reliable task execution.