Learning Architecture Through Nanobot Source Code: Skills System Deep Dive
Overview
OpenClaw contains approximately 400,000 lines of code, which makes reading the source directly a daunting task. This series therefore studies OpenClaw's distinctive features through Nanobot instead.
Nanobot is an ultra-lightweight personal AI assistant framework open-sourced by HKU Data Science Laboratory (HKUDS), positioned as "Ultra-Lightweight OpenClaw." It's highly suitable for learning Agent architecture.
What is a Skill?
A Skill packages specific domain expertise, workflows, and best practices into reusable instruction modules embedded in the AI's context. This lets the AI automatically activate and apply those professional capabilities when it works on related tasks.
Simply put, a Skill is a "work manual" that teaches the AI how to do a job. By encapsulating domain knowledge, standard procedures, and execution tools into pluggable modules, Skills let agents, for the first time, accumulate, share, and continuously refine professional capability, truly advancing from "automated execution" to "professional thinking and action."
The Essence of Skill:
Transforming "tacit knowledge" into "explicit procedures."
Fundamental Principles
Function Calling Basics
The essence of Function Calling: for an LLM to invoke a tool, it really only needs three things: the function name, the parameter requirements, and a description.
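As an illustration, a tool exposed to the model in the widely used JSON-Schema style needs nothing more than those three pieces. The `read_file` tool below is a hypothetical example, not a specific framework's API:

```python
# Everything the model ever sees about a tool: its name, a description,
# and a JSON-Schema definition of its parameters. The implementation
# lives entirely on the host side and is invisible to the LLM.
read_file_tool = {
    "name": "read_file",  # hypothetical example tool
    "description": "Read a UTF-8 text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read"},
        },
        "required": ["path"],
    },
}
```

The model emits a call like `read_file(path="notes.txt")` purely by pattern-matching against this declaration; everything else in this article is built on top of that one capability.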
However, under the Function Calling framework, users face several challenges:
- How to integrate into existing systems?
- How to accurately guide the model to execute tasks according to specific processes and rules using text (not code)?
- Writing all knowledge into prompts causes context explosion.
Solution: MCP and Skills
Both MCP and Skills are based on Function Calling—they represent different application approaches built upon this foundational capability.
MCP (Model Context Protocol):
- Enables AI to "execute": invoke tools, read and write data
- Focuses on "what can be done"
Skill:
- Teaches AI "how to do"—provides operational manuals (telling the model step-by-step procedures)
- Includes processes, standards, strategies, scripts, code
- Skills aren't "trained into the model" but written into System Prompt
- Loaded automatically during context construction
- The LLM learns to use them through reading comprehension
Integration with Existing Systems
MCP Approach:
MCP is a protocol that standardizes interaction between large models and external capabilities. If Tools solve "how a model calls a function," MCP solves "how a model interacts with pre-existing, reusable capability services."
MCP's core concern is integration with existing systems. Its value lies in providing a standardized integration protocol, so that different tools and data sources can be consumed by the LLM in a uniform manner.
Essentially, MCP resembles an API: the Agent only cares about what parameters the API accepts and what results it returns.
Guiding Models Through Specific Processes
Skills Approach:
Skills provide a way for users to define instructions, scripts, and resources using text, forming reusable task workflows.
In essence, Skills are a lightweight form of sub-agent packaging:
- Skills are a clever application built on Function Calling: the "load document" operation is wrapped as a function, so the model can automatically find and load the relevant skill documents when it needs them
- Skills let users write workflows in plain text, replacing the API-chaining logic that previously had to be hard-coded inside MCP tool implementations
- This improves a workflow's adaptability to its environment: the model can dynamically adjust strategy based on the actual situation, handle uncertainty, and understand natural-language intent
- Skills also support more complex workflow definitions: a SKILL.md document can instruct the LLM how to combine multiple interface calls, yielding higher success rates on long workflow tasks
Solving Context Explosion
Two-Layer Loading Paradigm:
Layer 1 (Always Present): System prompt contains skill names (low cost)
Layer 2 (On Demand): Full content in tool_result
System Prompt (Layer 1):
+----------------------------------+
| You are a coding agent. |
| Skills available: |
| - git: Git workflow helpers | ~100 tokens/skill
| - test: Testing best practices |
+----------------------------------+
When model calls load_skill("git"):
+----------------------------------+
| tool_result (Layer 2): |
| <skill name="git"> |
| Full git workflow instructions...| ~2000 tokens
| Step 1: ... |
| </skill> |
+----------------------------------+
Skills: The Experience Library for Large Models
Skill = Reusable Work Methods
Skills solidify expert experience, ensuring every execution follows best practices—transforming human experiential tasks into AI-reusable modules.
Skills serve as the project's streamlined extension mechanism. Instead of stuffing all possible functionality into the core codebase, new capabilities are defined as SKILL.md files—Markdown instruction documents teaching intelligent agents how to use specific tools or execute specific workflows.
Skill's Essence: Prompt + Tools
- Prompt: Tells LLM "Who I am," "What I can do," "When to use me"
- Tools: Concrete function implementations (typically CLI commands or API calls)
Tool vs. Skill Distinction:
| Aspect | Tools | Skills |
|---|---|---|
| Position | Atomic capabilities | Task modules |
| Composition | Function definitions only (JSON Schema) | Prompt + Code + Process definitions |
| Focus | "I can call this API" | "I know how to complete this work" |
| Context | Stateless, typically one-time calls | Stateful, multi-step reasoning through Prompt guidance |
| Reusability | Code-level reuse | Business logic-level reuse |
Core Principles of Skills
The core of Skills lies in "modular packaging" and "progressive disclosure." Their emergence is essentially an engineering innovation, and their central goal is to release and harness the large model's existing potential more efficiently.
Skills don't exist because large models suddenly became "smarter." They package everything required to complete a specific professional task (knowledge, processes, tools) into an independent module, helping the model shift from aimless free-flowing generation to goal-oriented, step-by-step professional execution.
Key Innovations:
- Operating Mechanism: The agent initially loads only a lightweight index of skill names. Once it recognizes a specific task requirement, it dynamically loads the corresponding skill's detailed instructions and resources. This on-demand loading maximizes utilization of the context window.
- Accumulated Capabilities: Skills transform complex tasks into callable, reusable, composable "skill modules." Agent implementation is shifting from single Q&A toward accumulative reusable capabilities.
- Overcoming Professional Deficiency: Skills make implicit domain-expert knowledge (such as financial-analysis SOPs or code-review checklists) explicit and structured, turning it into concrete instructions the model can understand and execute. This gives a general-purpose model deep professional capability. It doesn't add new "abilities" to the AI (the AI is already very capable); it adds domain knowledge and operating procedures, telling the model: in this specific scenario, how to act, which tools to use, and which constraints to observe.
- Resolving Context Explosion: Compared to traditional approaches of stuffing all knowledge into prompts or solidifying in lengthy workflows, Skill's "on-demand loading" mechanism can reduce context consumption by 40%-60% in complex tasks.
Skills and Architecture Selection
Critical Question:
When Single Agent encounters knowledge bottlenecks, besides splitting into Multi-Agent, is there a lighter-weight path?
Skill Mechanism's Answer:
Return to the Single Agent architecture itself, but endow it with strong dynamic-expansion capability. This mode gives a single Agent "localized specialization": macroscopically it maintains unified memory and state; microscopically it can flexibly master thousands of vertical-domain knowledge packs, invoked just like tools.
Architecture Selection Principle:
If Single Agent can solve it, never use complex architectures. When encountering knowledge bottlenecks, prioritize introducing Agent Skills mechanisms, expanding capability boundaries through dynamic progressive Skill loading.
Why Skills Maintain Context Consistency
Prompt Engineering Perspective:
| Layer | Characteristics | Function |
|---|---|---|
| System Prompt | Constant, unchanging | Core system instructions (persona identity, basic requirements) remain stable, ensuring model cognitive unity |
| User Prompt | Dynamically injected | Skills content progressively disclosed as "user input" or "tool return results" |
What This Means for the Model:
It is as if the user keeps handing over new reference materials during the conversation, rather than forcibly changing the model's "persona."
Analogy:
- Multi-Agent: Making the model "split personality" into multiple personas taking turns appearing
- Skills: Letting the same expert consult different professional manuals when needed—persona unchanged, knowledge dynamically expanded
Triple Benefits of Agent Skills Architecture
1. Low-Cost Knowledge Injection
Truly turns massive domain knowledge into consultable manuals: the model reads on demand, with no need for full pre-loading. This is lighter-weight than Multi-Agent and more precise than RAG (RAG is passive reception after retrieval; Skills are active invocation with a clear purpose).
2. Global Context Consistency
Since everything is executed by the same main Agent (similar to the Orchestrator in Multi-Agent, but without a routing layer), it always knows:
- Executed steps
- Loaded Agent Skills
- Current task status
This thoroughly eliminates information fragmentation and redundant work problems in Multi-Agent—no routing error risks, no context blind spots between sub-Agents.
3. Avoiding Context Explosion
Through "read a bit, do a bit, read a bit more" streaming processing, effectively controls instantaneous context length. Compared to Multi-Agent where each sub-node must maintain independent System Prompt redundancy, Skills only load when needed, releasing after use.
Essential Insight
Skills use the file system's structure to replace complex network communication protocols, and progressive information disclosure to replace brute-force full injection. This isn't a simple "functional replacement" but a shift in architectural philosophy: from "trading complexity for capability" toward "trading orchestration precision for efficiency."
Skill Structure
A Skill isn't "one entry in a flat plugin directory" but part of a workflow system organized by layered discovery, workspace, and sharing scope.
Three-Layer Structure (Three-Level Loading Mechanism):
Layer 1: Metadata (Always Loaded)
- Skill description, functionality, and interface definitions
- This information is always loaded in context, approximately 100 tokens
- AI uses this to judge whether to trigger Skill
Layer 2: Instructions (Loaded on Trigger)
- Structured professional operation steps and judgment logic
- Loaded only after Skill is triggered
Layer 3: Resources (Loaded on Demand)
- Data, templates, or tool links required for execution
- Loaded only when AI judges necessary, no size limit
- Scripts can be executed directly
- references/ for detailed explanations
- scripts/ for truly reusable scripts
Therefore, a skill isn't "just one markdown file" but a group of resources organized around a task. Put differently: capability equals files, and reuse plus resource requirements equals a Skill.
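As a sketch of Layer 1, the always-loaded metadata typically lives in a small front-matter block at the top of SKILL.md. A minimal parser, assuming flat `key: value` front matter delimited by `---` lines and using no external YAML library, might look like:

```python
def parse_skill_metadata(skill_md: str) -> dict[str, str]:
    """Extract Layer-1 metadata (e.g. name, description) from a SKILL.md
    whose front matter sits between two '---' lines.
    Minimal sketch: handles only flat `key: value` pairs."""
    meta: dict[str, str] = {}
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return meta  # no front matter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta
```

Only the resulting name/description pair needs to occupy the context at all times; the markdown body below the front matter stays on disk until Layer 2 is triggered.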
Typical Skill Directory Structure
some-repository/
├── AGENTS.md # Permanent repository guidelines
└── .agents/
└── skills/
├── rot13-encryption/ # AgentSkills standard (progressive disclosure)
│ ├── SKILL.md
│ ├── scripts/
│ │ └── rot13.sh
│ └── references/
│ └── README.md
├── another-agentskill/ # AgentSkills standard
│ ├── SKILL.md
│ └── scripts/
│ └── placeholder.sh
    └── legacy_trigger_this.md # Legacy/OpenHands format
Through this three-layer structure (Metadata → Instructions → Code and Resources), a skill can not only be correctly recognized and triggered by the agent but can truly complete the full closed loop from "understanding requirements" to "executing tasks."
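Discovery over a layout like the tree above can be sketched as a simple directory scan. The paths follow the tree; the function name is illustrative, not any framework's real API:

```python
from pathlib import Path

def discover_skills(repo_root: str) -> dict[str, Path]:
    """Map skill name -> SKILL.md path for every AgentSkills-style
    directory under <repo>/.agents/skills/."""
    skills: dict[str, Path] = {}
    skills_dir = Path(repo_root) / ".agents" / "skills"
    if not skills_dir.is_dir():
        return skills
    for entry in sorted(skills_dir.iterdir()):
        manifest = entry / "SKILL.md"
        # Only directories containing a SKILL.md count as skills;
        # loose files like legacy_trigger_this.md are skipped here.
        if entry.is_dir() and manifest.is_file():
            skills[entry.name] = manifest
    return skills
```

Note that the legacy flat-file format in the tree would need a separate code path; the sketch only covers the AgentSkills directory convention.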
Skill vs Workflow vs Prompt
Skill vs Workflow
Workflow:
- Pre-orchestrated, linear, fixed-path execution flowchart
- Ensures complex tasks are "stably done correctly"
Skill:
- Modular, composable, on-demand callable professional capability packages
- Aims to enable intelligent agents to "professionally do correctly"
Through knowledge packaging and flexible scheduling, which a fixed Workflow struggles to implement, Skills open a new stage of agent specialization and scaled application. The direction Skills represent, capability modularization and knowledge engineering, is undoubtedly a key path toward building the next generation of professional, high-reliability agents.
Skill vs Prompt
| Dimension | Prompt | Skill |
|---|---|---|
| Trigger | Manual pasting | Automatic matching |
| Capability | Text only | Documents + Scripts + Templates |
| Reusability | Low | High |
| Context | Full load every time | Progressive loading |
Application Scenarios
Three Problem Types Suitable for Skills
- Domain Knowledge AI Lacks
- Repetitive Workflows
- Operations Requiring Deterministic Execution (compared to direct prompts, Skills excel in reusability and consistency)
Situations Not Suitable for Skills
- One-Time Tasks: You only use this workflow once; writing Skill takes longer than doing it directly
- Highly Personalized: Each requirement is completely different, no patterns to solidify
- AI Already Proficient: General coding, writing, translation, etc.—AI already has strong default behaviors; adding Skills may be superfluous
Future Outlook
In the future, the core asset of much software may be: a set of high-quality, callable, auditable, reusable Agent Skills.
Agent competition will likely gradually shift from "whose model scores higher" to:
- Who is better at solidifying capabilities into skills
- Who is better at organizing skills into workflows
- Who is better at making these processes run stably in real business
SkillsLoader Component
SkillsLoader is the core component responsible for intelligent agent Skills management in the Nanobot framework.
Core Responsibilities:
- Load and manage intelligent agent skill files (with SKILL.md as carrier)
- Validate skill files
- Provide available skill retrieval, loading, and validation capabilities for agents
- Serves as the foundational support module for Agent tool usage/task execution capabilities
Component Relationships
- SkillsLoader: Scans and loads skill files
- ContextBuilder: Inserts skill documents into the context
- LLM: Performs tasks according to the skill documents
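The three components above could interact roughly as follows. This is a hypothetical sketch, not Nanobot's actual code; class and method names are illustrative:

```python
from pathlib import Path

class SkillsLoader:
    """Scans a skills directory, exposes lightweight name summaries,
    and loads full SKILL.md bodies on demand."""

    def __init__(self, skills_dir: str):
        self.skills_dir = Path(skills_dir)

    def list_skills(self) -> list[str]:
        # One entry per <skills_dir>/<name>/SKILL.md
        return sorted(p.parent.name for p in self.skills_dir.glob("*/SKILL.md"))

    def load(self, name: str) -> str:
        return (self.skills_dir / name / "SKILL.md").read_text()

class ContextBuilder:
    """Inserts the Layer-1 skill summary into the system prompt."""

    def __init__(self, loader: SkillsLoader):
        self.loader = loader

    def build_system_prompt(self) -> str:
        summary = "\n".join(f"- {n}" for n in self.loader.list_skills())
        return f"You are an agent.\nSkills available:\n{summary}"
```

The LLM then sees only the summary; a later `loader.load(name)` call (surfaced as a tool) pulls the full body into a tool_result.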
Industry Paradigm
Referencing OpenHands documentation, the Skill system has four primary responsibilities:
- Context Injection: Add specialized prompts to agent context based on triggers
- Trigger Evaluation: Determine when skills should activate (always, keyword, task)
- MCP Integration: Load MCP tools associated with repository skills
- Third-Party Support: Parse .cursorrules, agents.md, and other skill formats
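The trigger-evaluation responsibility can be sketched minimally. The `trigger` and `keywords` fields below are assumptions mirroring the always/keyword/task distinction, not OpenHands' exact schema:

```python
def should_trigger(skill: dict, user_message: str) -> bool:
    """Decide whether a skill's prompt should be injected.
    'always'  -> inject unconditionally
    'keyword' -> inject on a case-insensitive substring match
    'task'    -> delegated to the LLM's own judgment, so False here."""
    mode = skill.get("trigger", "task")
    if mode == "always":
        return True
    if mode == "keyword":
        message = user_message.lower()
        return any(k.lower() in message for k in skill.get("keywords", []))
    return False
```

String matching keeps 'always' and 'keyword' triggers cheap and deterministic, while 'task' triggers trade tokens for the model's semantic judgment.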
General Workflow
LLM Using Skills = Reading Documentation + Calling Tools
Humans are responsible for clarifying goals and constraints, Skills encapsulate implementation standards (code/processes), Agents are responsible for understanding intents and scheduling execution.
The LLM doesn't need to know in advance which tools exist and how they work; it only needs to consult the relevant manual when a tool is required, then pick it up and use it.
Agent's Behavior is Progressive Disclosure:
- Initialization: Only view Skill's name and description, glance at what this tool can do
- Agent views skill summary, determines skill matches the task
- Discovers need → calls read_file to read SKILL.md
- Executes tasks according to steps defined in the document
Specific Workflow
- Initialization Phase: Users define instructions, scripts, and resources in text and package them into Skills (a SKILL.md plus optional scripts, reference materials, etc.). At startup, the LLM reads every Skill's metadata (name and description), and this metadata is loaded into the model's context (system prompt). The Agent thus "knows" which Skills it possesses at the start of every conversation, without knowing their content; this is used for subsequent intent matching and trigger judgment.
- Discovery Phase: When a user initiates a request, the LLM compares the request against the loaded Skills metadata and judges whether a Skill is needed. This judgment is essentially the LLM making a decision based on context, the same as deciding whether to call a tool in Function Calling.
Two Key Points:
- The Agent only reaches for Skills when it feels unable to handle a task on its own. A simple one-step operation (like "read this file") may not trigger a Skill even if its description matches perfectly, because the Agent judges it can complete the task directly
- The Agent naturally tends toward under-triggering, so descriptions should be written proactively, pushing the boundary outward
- Loading Phase (Function Calling): If LLM judges a certain Skill is needed (when user requests match a Skill's description), it calls a dedicated loading tool through Function Calling mechanism (like load_skill(skill_name)), reading and loading the corresponding SKILL.md document content into the current context. This step completely relies on Function Calling capabilities.
- Execution Phase (Continuing to Use Function Calling): After SKILL.md content (including instructions, processes, examples, etc.) is added to context, LLM executes tasks according to instructions defined in the document. If SKILL.md defines scripts needing execution (like scripts/rotate_pdf.py), LLM still calls functions for executing scripts through Function Calling. If reference materials need loading, similarly calls file reading functions through Function Calling.
Key Observation:
The entire Skills operation process, from loading documents, executing scripts to reading resources, every step cannot leave Function Calling. Skills didn't create new capabilities—it merely organized the Function Calling foundational capability into a more usable form: letting users define workflows using text, letting LLM automatically discover and load relevant knowledge.
This also means Skill activation itself consumes 1-2 tool calls. Therefore, how accurately description is written directly impacts Token consumption and response speed. Mis-triggering means waste, missed triggering means capability loss.
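The four phases can be traced end to end in a toy loop where the model's decisions are stubbed out, making the Function Calling plumbing visible. All names here are hypothetical:

```python
SKILLS = {  # Layer-2 bodies, normally read from SKILL.md files
    "git": "Step 1: check status. Step 2: stage. Step 3: commit.",
}

def build_system_prompt() -> str:
    # Initialization phase: only metadata (names) enters the context.
    return "You are an agent. Skills available: " + ", ".join(SKILLS)

def load_skill(skill_name: str) -> str:
    # Loading phase: an ordinary Function Calling tool.
    return f'<skill name="{skill_name}">{SKILLS[skill_name]}</skill>'

def run_turn(user_request: str) -> list[str]:
    transcript = [build_system_prompt(), user_request]
    # Discovery phase: stub for the LLM matching the request to metadata.
    wanted = next((n for n in SKILLS if n in user_request.lower()), None)
    if wanted:
        transcript.append(load_skill(wanted))  # tool_result joins the context
        # Execution phase: the model would now follow the loaded steps,
        # issuing further Function Calls for scripts or references.
        transcript.append(f"(executing steps from skill '{wanted}')")
    return transcript
```

Every append after the system prompt corresponds to one or more Function Calls, which is why skill activation itself costs a tool call or two.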
Nanobot Skill Management System Core Features
- Dual-Source Priority Loading: Supports "workspace skills" and "builtin skills" as dual sources; workspace skills take higher priority, so user customizations override builtin skills of the same name
- Requirement Validation Mechanism: Automatically validates skill dependencies (CLI tools, environment variables), can filter out unavailable skills with unmet dependencies
- Structured Skill Management: Provides skill-list queries, single-skill loading, skill-summary construction (XML format), resident-skill filtering, and related capabilities, supporting the Agent's context loading and progressive skill usage
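The first two behaviors can be sketched directly. The function names and skill-dict fields below are assumptions for illustration, not Nanobot's real API:

```python
import os
import shutil

def merge_skills(workspace: dict, builtin: dict) -> dict:
    """Dual-source loading: workspace skills shadow builtin skills
    that share the same name."""
    merged = dict(builtin)
    merged.update(workspace)  # workspace entries win on name collisions
    return merged

def requirements_met(skill: dict) -> bool:
    """Requirement validation: a skill is usable only if every declared
    CLI tool is on PATH and every environment variable is set."""
    bins_ok = all(shutil.which(b) for b in skill.get("requires_bins", []))
    env_ok = all(os.environ.get(v) for v in skill.get("requires_env", []))
    return bins_ok and env_ok
```

Filtering the merged map through `requirements_met` before building the skill summary keeps unusable skills out of the model's context entirely.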
Conclusion
Skills represent a fundamental architectural innovation in AI agent design. By transforming tacit knowledge into explicit, reusable modules with progressive disclosure mechanisms, Skills enable:
- Scalable Expertise: Domain knowledge can be accumulated and shared across agents
- Context Efficiency: On-demand loading dramatically reduces context window consumption
- Maintainability: Skills as files enable version control and collaborative improvement
- Professional Reliability: Standardized procedures ensure consistent, high-quality outputs
The future of AI agents lies not just in more powerful models but in better-organized knowledge and capabilities. Skills provide the architectural foundation for this evolution.
As the ecosystem matures, we can expect:
- Richer skill libraries covering more domains
- Better tooling for skill creation and validation
- Standardized skill formats enabling interoperability
- Community-driven skill marketplaces
The journey from prompts to skills represents AI's evolution from conversational tools to professional partners capable of complex, reliable task execution.