Learning OpenClaw Architecture Through Nanobot Source Code: Part 6 - Skills
Overview
OpenClaw contains roughly 400,000 lines of code, which makes it hard to read and digest in full. This series therefore studies OpenClaw's features through Nanobot.
Nanobot is an ultra-lightweight personal AI assistant framework open-sourced by the HKU Data Science Laboratory (HKUDS) and positioned as an "Ultra-Lightweight OpenClaw", which makes it well suited for learning Agent architecture.
A Skill packages domain expertise, workflows, and best practices into reusable instruction modules embedded in the AI's context, so the AI can automatically activate and apply these professional capabilities when handling related tasks.
Simply put, a Skill is a "skill manual" that teaches the AI how to work. By encapsulating domain knowledge, standard processes, and execution tools into pluggable modules, Skills let agents accumulate, share, and continuously refine professional capabilities, advancing from "automated execution" to "professional thinking and action".
The essence of a Skill is turning tacit knowledge into explicit procedure.
Principles
Function Calling
The essence of Function Calling: for an LLM to invoke a tool, the only things actually required are a function name, a parameter schema, and a description. Under the plain Function Calling framework, however, users face several problems:
- How to integrate with existing systems?
- How to accurately guide the model, using text rather than code, to execute tasks according to specific processes and rules?
- Writing all knowledge into the prompt causes context explosion
Solutions
The solutions to the above problems are MCP and Skills.
Both MCP and Skills are based on Function Calling—they're merely different application methods on top of this foundational capability.
- MCP: Enables the AI to "be capable of doing"—executing tools, reading and writing data
- Skill: Teaches the AI "how to do"—the AI's operation manual (telling the model, step by step, how to proceed), including processes, standards, strategies, scripts, and code. Skills aren't trained into the model; they are written into the system prompt and read automatically when the context is built—the LLM learns them through reading comprehension.
How to Integrate into Existing Systems?
MCP is a protocol that standardizes how large models interact with external capabilities. If Tools solve "how a model calls a function", MCP solves "how a model interacts with long-lived, reusable capability services".
MCP's core concern is integration with existing systems. Its value lies in providing a standardized integration protocol, so that different tools and data sources can be consumed by an LLM in a uniform way. Essentially, MCP is a set of integration standards (a layer of JSON-RPC protocol conversion on top of Function Calling), not the only way to integrate.
Put differently, MCP resembles an API—the Agent only cares about what parameters it submits and what results it receives.
How to Guide Models to Execute Tasks According to Specific Processes and Rules?
Skills provide a method for users to define instructions, scripts, and resources using text, forming reusable task workflows.
Skills are effectively a sub-agent wrapper:
- Skills are merely a clever application on top of Function Calling: the "load document" operation is wrapped as a function, so the model can automatically find and load a fixed skill document when needed
- Skills let users write processes in text, replacing the code that would otherwise wire API calls together inside the functions MCP exposes. This makes the process more adaptable, because the model can dynamically adjust its strategy to the actual situation, handle uncertainty, and understand natural-language intent. Skills also support more complex process definitions: by telling the LLM, via a SKILL.md document, how to combine multiple interface calls, long multi-step tasks succeed more often.
How to Solve Context Explosion?
A two-layer loading paradigm can be used:
Layer 1: The system prompt carries only skill names and short descriptions (low cost)
Layer 2: The full content is placed in a tool_result on demand
System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent. |
| Skills available: |
| - git: Git workflow helpers | ~100 tokens/skill
| - test: Testing best practices |
+--------------------------------------+
When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand): |
| <skill name="git"> |
| Full git workflow instructions... | ~2000 tokens
| Step 1: ... |
| </skill> |
+--------------------------------------+
Skills: Large Model's Experience Library
Skill = Reusable Work Methods—solidifying expert experience, ensuring every execution follows best practices. That is, transforming human experiential tasks into AI-reusable modules.
Skills serve as the project's streamlined extension mechanism. Instead of stuffing all possible functionalities into the core codebase, new capabilities are defined as SKILL.md files—instruction documents written in Markdown, teaching intelligent agents how to use specific tools or execute specific workflows.
Skill's essence is Prompt + Tools.
- Prompt: Tells LLM "Who I am", "What I can do", "When to use me"
- Tools: Concrete function implementations (typically CLI commands or API calls)
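To make the "Prompt" half concrete, here is a hypothetical SKILL.md header; the skill name, description wording, and file references are all invented for illustration and do not come from Nanobot's actual skill set:

```markdown
---
name: meeting-notes
description: Turn raw meeting transcripts into structured minutes. Trigger when the user asks to summarize, clean up, or distribute meeting notes.
---

# Meeting Notes

1. Read the transcript the user provides.
2. Extract decisions, action items, and owners.
3. Render the result using the template in references/minutes-template.md.
```

The frontmatter answers "who I am / when to use me"; the numbered body is the operation manual the model follows once the skill is loaded.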
Tool is capability, skill is method:
- Tools are low-level interfaces (APIs) for capabilities. Tools resemble "hands and feet"—operational capabilities agents can directly invoke. Tools solve "whether it can be done".
- Skills are high-level templates (SOP + Tools) for tasks. Skills resemble "operation manuals" or "standard procedures". Therefore, skills solve: how it should be done, what steps to follow, when this process suite is most suitable.
| Characteristic | Function Calling | Skills |
|---|---|---|
| Positioning | Atomic capabilities | Task modules |
| Composition | Only contains function definitions (JSON Schema) | Contains Prompt + Code + Process definitions |
| Focus | I can call this API | I know how to complete this work |
| Context | Stateless, typically one-time call | Stateful, multi-step reasoning guided through Prompt |
| Reusability | Code-level reusability | Business logic-level reusability |
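For contrast, the "atomic capability" in the Function Calling column is nothing more than a JSON Schema. A sketch of one, following the common OpenAI-style function-definition format (the tool name and fields are invented):

```json
{
  "name": "get_stars",
  "description": "Return the current star count of a GitHub repository.",
  "parameters": {
    "type": "object",
    "properties": {
      "repo": {
        "type": "string",
        "description": "Repository in owner/name form"
      }
    },
    "required": ["repo"]
  }
}
```

This definition says nothing about when to check stars, how often, or what to do with the number—exactly the process knowledge a Skill adds on top.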
Core Principles of Skills
Skill's core lies in "modular encapsulation" and "progressive disclosure". Its emergence is essentially an engineering innovation whose goal is to release and control the existing potential of large models more efficiently. It exists not because models suddenly became "smarter", but because it packages everything required for a specific professional task (knowledge, processes, tools) into an independent module, helping the model move from aimless "flowing" to goal-oriented, step-by-step "professional execution".
Skill's key innovations include:
Operating Mechanism: The agent initially preloads only a lightweight index of skills. Once it identifies a concrete task requirement, it dynamically loads the detailed instructions and resources of the corresponding skill. This on-demand loading makes extremely efficient use of the context window.
Accumulated Capabilities: Skills turn complex tasks into callable, reusable, composable "skill modules". Agent implementations are shifting from one-off Q&A toward capabilities that accumulate and are reused.
Overcoming Professional Deficiency: Skills make implicit domain-expert knowledge (such as a financial-analysis SOP or a code-review checklist) explicit and structured—concrete instructions the model can understand and execute—so a general-purpose model acquires deep professional capability. This doesn't add new "capability" to the AI (the model is already very capable); it adds domain knowledge and operating rules, telling it: in this specific scenario, here is what to do, which tools to use, and which constraints to observe.
Resolving Context Explosion: Compared with stuffing all knowledge into the prompt or hard-coding it into lengthy workflows, Skill's on-demand loading reportedly reduces context consumption by 40%-60% on complex tasks.
Skills and Architecture Selection
Let's consider: When Single Agent encounters knowledge bottlenecks, besides splitting into Multi-Agent, is there a lighter path?
The Skill mechanism's answer: stay with the Single Agent architecture itself, but give it very strong dynamic extension capabilities. In this mode a single Agent gains "localized specialization": macroscopically it maintains unified memory and state; microscopically it can flexibly draw on thousands of vertical-domain bodies of professional knowledge, much as it calls tools.
Therefore, architecture selection principles can be clearly stated:
If Single Agent can solve it, never use complex architectures. When encountering knowledge bottlenecks, prioritize introducing Agent Skills mechanisms, expanding capability boundaries through dynamic progressive loading of Skills.
Why Can Skills Maintain Context Consistency?
Let's decompose its mechanism from Prompt engineering perspective:
| Layer | Characteristics | Function |
|---|---|---|
| System Prompt | Constant and unchanging | Core system instructions (persona identity, basic requirements) remain stable, ensuring model cognitive unity |
| User Prompt | Dynamically injected | Skills content progressively disclosed in the form of "user input" or "tool return results" |
What does this mean for the model? It's like users continuously providing new reference materials during conversation, not forcibly changing its "persona". To use an analogy: Multi-Agent makes the model "split into multiple personalities" taking turns on stage; while Skills let the same expert consult different professional manuals when needed—personality remains unchanged, knowledge dynamically expands.
Triple Benefits of Agent Skills Architecture
1. Low-Cost Knowledge Injection
Truly achieves "instructionalization" of massive domain knowledge—models read on-demand, no need for full preloading. This is lighter than Multi-Agent and more precise than RAG (RAG is passive reception after retrieval, Skills are active invocation with clear purposes).
2. Global Context Consistency
Since always executed by the same main Agent (similar to Orchestrator in Multi-Agent, but without routing layer), it completely knows:
- Executed steps
- Read Agent Skills
- Current task status
This thoroughly eliminates information fragmentation and repeated labor problems in Multi-Agent—no routing error risks, no context blind spots between sub-Agents.
3. Avoiding Context Explosion
Through "read a bit, do a bit, read a bit more" streaming processing, instantaneous context length is effectively controlled. Compared to Multi-Agent where each sub-node must maintain independent System Prompt redundancy, Skills only load when needed, releasing after use.
Essential Insights
Skill uses the file system's structured capabilities in place of complex network communication protocols, and progressive information disclosure in place of brute-force full injection. This isn't a simple "functional replacement" but a shift in architectural philosophy: from "trading complexity for capability" toward "trading orchestration precision for efficiency".
Skill Structure
A Skill isn't "a plugin list in a unified directory" but a layered discovery and workflow system organized by workspace and sharing scope. Skills typically have a three-layer structure (a three-level loading mechanism):
Metadata (always loaded):
- Skill description, functionality, and interface definitions
- This information is always loaded in context, approximately 100 tokens
- AI uses this to judge whether to trigger Skill
Instructions (loaded when triggered):
- Structured professional operation steps and judgment logic
- Loaded only after Skill is triggered
Resources (loaded on-demand):
- Data, templates, or tool links required for execution
- Loaded only when the AI judges it necessary; there is no size limit, because scripts can be executed directly rather than read into context. For example, references/ holds detailed explanations and scripts/ holds genuinely reusable scripts
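The Metadata tier above can be sketched as a loader that parses only the frontmatter of each SKILL.md into an always-resident index. This is a deliberately naive sketch assuming YAML-style frontmatter delimited by `---` lines; it is not Nanobot's actual implementation:

```python
from pathlib import Path


def read_metadata(skill_md: str) -> dict:
    """Parse only the frontmatter block between the leading '---' lines."""
    lines = skill_md.splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter; ignore the document body entirely
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> dict:
    """Walk <root>/<name>/SKILL.md and keep only name + description in memory."""
    index = {}
    for skill_md in root.glob("*/SKILL.md"):
        meta = read_metadata(skill_md.read_text(encoding="utf-8"))
        index[meta.get("name", skill_md.parent.name)] = meta
    return index
```

The point of the sketch: only the few frontmatter lines ever reach the context by default, while Instructions and Resources stay on disk until loaded.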
Therefore, a skill isn't "just one markdown file" but a set of resources organized around a task. Put differently, capability equals files (Capability as Files): reusability + resource requirements = Skill.
The following shows a typical Skill's directory structure. Through this three-layer structure of Metadata → Instructions → Code and Resources (Skills are essentially just a file-structure convention), a skill can not only be correctly recognized and triggered by Claude but also complete the full loop from "understanding the requirement" to "executing the task".
some-repository/
├── AGENTS.md # Permanent repository guidelines (recommended)
└── .agents/
└── skills/
├── rot13-encryption/ # AgentSkills standard (progressive disclosure)
│ ├── SKILL.md
│ ├── scripts/
│ │ └── rot13.sh
│ └── references/
│ └── README.md
├── another-agentskill/ # AgentSkills standard (progressive disclosure)
│ ├── SKILL.md
│ └── scripts/
│ └── placeholder.sh
└── legacy_trigger_this.md # Legacy/OpenHands format (keyword-triggered)
Skill vs Workflow vs Prompt
Compared to Workflow, Skill represents a fundamental capability upgrade. The core differences lie in:
- Workflow is pre-orchestrated, linear, fixed-path execution flowcharts, ensuring complex tasks are "stably done correctly"
- Skill is modular, composable, on-demand callable professional capability packages, aiming to enable intelligent agents to "professionally do correctly"
Skill opens a new stage of agent specialization and scaled application, offering the knowledge encapsulation and flexible scheduling that workflows struggle to achieve. The directions Skill represents—"capability modularization" and "knowledge engineering"—are key paths toward next-generation professional, high-reliability agents.
Skill compared to Prompt:
| Dimension | Prompt | Skill |
|---|---|---|
| Trigger | Manual pasting | Automatic matching |
| Capability | Text only | Documents + Scripts + Templates |
| Reusability | Low | High |
| Context | Full load every time | Progressive loading |
Scenarios
Three Types of Problems Suitable for Skill:
- Domain knowledge AI doesn't possess
- Repetitive workflows
- Operations requiring deterministic execution (compared to direct prompts, Skill excels in reusability and consistency)
Situations Not Suitable for Skill:
- One-time tasks: You only use this workflow once; writing Skill takes longer than doing it directly
- Highly personalized: Each requirement is completely different, no patterns can be solidified
- AI is already proficient: General coding, writing, translation, etc.—AI itself has strong default behaviors; adding Skill may be superfluous
In the future, many software's core assets may be: a set of high-quality, callable, auditable, reusable Agent Skills. Agent competition will likely gradually shift from "whose model scores higher" to: who is better at solidifying capabilities into skills, who is better at organizing skills into workflows, who is better at making these processes run stably in real business.
SkillsLoader
SkillsLoader is the core component responsible for Skills management in the Nanobot framework. Its responsibilities include loading, managing, and validating skill files (with SKILL.md as the carrier), giving agents the ability to discover, load, and validate available skills. It is the foundational module that lets Agents use tools and execute tasks.
- SkillsLoader: Scans plugins
- ContextBuilder: Stuffs plugin documents into context
- LLM: Performs tasks according to plugin documents
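The division of labor above can be sketched as a loader that renders the always-loaded Layer-1 index as an XML block for the context builder. The XML shape and names here are assumptions based on the responsibilities described in this article ("skill summary construction (XML format)"), not Nanobot's exact API:

```python
from dataclasses import dataclass


@dataclass
class SkillMeta:
    """The always-resident Layer-1 view of a skill: name + description only."""
    name: str
    description: str


def build_skill_summary(skills: list) -> str:
    """Render the skill index as an XML block to embed in the system prompt."""
    lines = ["<skills>"]
    for s in skills:
        lines.append(f'  <skill name="{s.name}">{s.description}</skill>')
    lines.append("</skills>")
    return "\n".join(lines)
```

The LLM then matches incoming requests against these one-line descriptions and asks for a full document only when one fits.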
Industry Paradigms
Let's reference OpenHands documentation to examine Skills management system paradigms.
The Skill system has four primary responsibilities:
- Context Injection - Add specialized prompts to agent context based on triggers
- Trigger Evaluation - Determine when skills should activate (always, keyword, task)
- MCP Integration - Load MCP tools associated with repository skills
- Third-Party Support - Parse .cursorrules, agents.md, and other skill formats
General Workflow
Let's examine Skill's general workflow.
For the LLM, using Skills = reading documents + calling tools. That is: humans are responsible for clarifying goals and constraints; the Skill encapsulates the implementation standard (code/process); the Agent is responsible for understanding intent and orchestrating execution. The LLM doesn't need to know in advance which tools exist and how to use them—it only needs to consult the manual when it needs a tool, then pick the tool up and use it.
Agent's behavior is Progressive Disclosure:
- Initialization: the agent sees only each Skill's name and description—a glance at what the tool can do
- The agent scans the skill summaries and determines that a skill matches the task
- On discovering the need, it calls read_file to read SKILL.md
- It executes according to the steps in the document
The specific workflow is as follows:
Initialization Phase: Users define instructions, scripts, and resources in text and package them into Skills (a SKILL.md plus optional scripts, reference materials, etc.). At startup, the LLM reads every Skill's metadata (name and description). This metadata is loaded into the model's context (the system prompt), so the agent "knows" which Skills it has at the start of every conversation without knowing their contents. It is used later for intent matching and trigger judgment.
Discovery Phase: When users initiate requests, LLM compares request content against loaded Skills metadata, judging whether certain Skills are needed. This judgment process is essentially LLM making decisions based on context, same as judging whether to call tools in Function Calling. Two key points require attention:
- The agent only reaches for Skills when it feels it cannot handle the task alone. A simple one-step operation (like "read this file") may not trigger a Skill even if the description matches perfectly, because the agent judges it can complete the task directly
- Agents naturally tend toward under-triggering, so descriptions should be written proactively, pushing the boundary outward
Loading Phase (Function Calling): If the LLM judges that a Skill is needed (the user request matches its description), it invokes a dedicated loading tool via the Function Calling mechanism (something like load_skill(skill_name)), reading the corresponding SKILL.md into the current context. This step relies entirely on Function Calling.
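The loading tool can be sketched as an ordinary function exposed to the model; the name load_skill, the directory layout, and the tool_result wrapper below are illustrative assumptions, not Nanobot's actual wiring:

```python
from pathlib import Path

# Assumed default layout, mirroring the directory tree shown earlier.
SKILLS_DIR = Path(".agents/skills")


def load_skill(skill_name: str, root: Path = SKILLS_DIR) -> str:
    """Layer-2 loading: return the full SKILL.md body wrapped as a tool_result."""
    skill_md = root / skill_name / "SKILL.md"
    if not skill_md.is_file():
        return f"<error>unknown skill: {skill_name}</error>"
    body = skill_md.read_text(encoding="utf-8")
    return f'<skill name="{skill_name}">\n{body}\n</skill>'
```

The model calls this like any other tool; the returned text lands in the conversation as a tool_result, which is exactly the Layer-2 injection in the earlier diagram.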
Execution Phase (still Function Calling): Once the SKILL.md content (instructions, processes, examples, etc.) is in context, the LLM executes the task according to the document. If SKILL.md defines scripts to run (such as scripts/rotate_pdf.py), the LLM executes them through Function Calling; if reference materials need loading, it likewise calls a file-reading function through Function Calling.
As you can see, every step of the Skills workflow—loading documents, executing scripts, reading resources—depends on Function Calling. Skills create no new capability; they merely organize this foundational capability into a more usable form: users define processes in text, and the LLM automatically discovers and loads the relevant knowledge. It also means that activating a Skill itself costs 1-2 tool-invocation steps, so how accurately the description is written directly affects token consumption and response speed: mis-triggering wastes tokens, and missed triggering means missing capability.
Nanobot Skill Management System Core Features
Nanobot Skill Management System's core features include:
- Dual-Source Priority Loading: Supports both "workspace skills" and "builtin skills"; workspace skills take precedence, so a workspace skill overrides a builtin skill of the same name
- Requirement Validation Mechanism: Automatically validates skill dependencies (CLI tools, environment variables), can filter out unavailable skills with unmet dependencies
- Structured Skill Management: Provides skill list queries, single-skill loading, skill summary construction (XML format), resident-skill filtering, and so on, matching the agent's context-loading and progressive skill-usage needs
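The requirement-validation idea can be sketched as a check over declared dependencies; the metadata field names (requires_bins, requires_env) are assumptions for illustration, not Nanobot's actual schema:

```python
import os
import shutil


def skill_available(meta: dict) -> bool:
    """A skill is usable only if every declared CLI binary and env var exists."""
    for binary in meta.get("requires_bins", []):
        if shutil.which(binary) is None:  # CLI tool not found on PATH
            return False
    for var in meta.get("requires_env", []):
        if var not in os.environ:  # required credential/config missing
            return False
    return True
```

A loader would run this filter after indexing, so skills with unmet dependencies never appear in the Layer-1 summary at all.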
Overall Process
SkillsLoader-related overall process:
User Input: "Check GitHub stars every hour"
↓
LLM Identifies Context
↓
SkillsLoader.get_always_skills() → Get activated skills
↓
SkillsLoader.load_skills_for_context() → Load cron skill context
↓
SkillsLoader.build_system_prompt() → Build system prompt
↓
SkillsLoader._run_agent_loop() → Run agent loop
↓
LLMProvider.chat() → LLM processes request
↓
LLM identifies as cron tool invocation
↓
ToolRegistry.execute() → Execute tool
↓
CronTool.execute() → Cron tool execution
↓
CronService.add_job() → Add scheduled task
↓
Task stored to cron.json file
Skills.md
Skills essence = Pluggable plugin documents.
Taking nanobot\skills\memory\SKILL.md as an example.
The frontmatter block at the top, delimited by two lines of three dashes (---), is the Skill's "ID card":
- name is its unique identifier—a simple, memorable English name is enough
- description determines when the Skill triggers: it describes what the Skill does and which user requests should trigger it. The more specific the description, the more reliably the right scenarios trigger it
The description carries three layers of information: what the Skill does (its core value), what core capabilities it includes, and when to trigger (what the user must say or do to activate it).
- description always lives in the agent's system prompt, functioning like an index. When user input arrives, the agent matches the request against every Skill's description and loads a Skill's full content only on a match. This design lets you mount dozens of Skills at once while keeping activation-judgment cost to a few dozen lines of short text—no need to stuff every Skill's complete instructions into context.
Below is Skill's main text section.
Conclusion
Skills represent a paradigm shift in how AI agents acquire and utilize specialized knowledge. Rather than relying solely on model training or massive prompt contexts, Skills enable modular, reusable capability packages that can be dynamically loaded based on task requirements.
The architecture choices between Single Agent with Skills versus Multi-Agent systems depend on specific use cases. For most applications, Single Agent with Skills provides better context consistency, lower complexity, and more efficient resource utilization. Multi-Agent approaches should be reserved for truly independent task domains requiring separate state management.
As the AI agent ecosystem matures, the quality and organization of Skills will become key differentiators. Teams that invest in well-structured, thoroughly documented, and extensively tested Skills will achieve more reliable and professional agent behaviors in production environments.