The Evolution of Learning in the AI Era

The landscape of technical learning is undergoing a profound transformation. Traditional methods (reading books, watching video tutorials, and attending courses) remain valuable, but another approach has become increasingly common: project-based learning through code imitation.

This approach involves deeply studying and replicating excellent open-source projects: analyzing their code structure, understanding architectural decisions, and internalizing design patterns through hands-on modification and experimentation. Running and modifying high-quality open-source code directly is one of the quickest ways for developers to understand real-world engineering practices.

However, this powerful learning method introduces significant new challenges that traditional approaches never faced.


The Knowledge Fragmentation Problem

Scattered Learning Materials

Modern developers accumulate knowledge across multiple disconnected platforms and formats:

  • Notes stored in Obsidian, Notion, or Evernote
  • Code repositories scattered across various directories and GitHub organizations
  • AI assistant conversations trapped in isolated chat histories
  • Documentation downloaded as PDFs or bookmarked online
  • Configuration files buried in project directories

Each time you need AI assistance analyzing a specific project, you face the tedious process of manually copying code snippets, gathering context, and reconstructing the background information. This fragmentation creates friction that slows learning and reduces productivity.

Context Discontinuity

Most AI assistants do not carry memory from one conversation to the next by default. Each new chat session begins as a blank slate, requiring you to:

  1. Re-explain your project's purpose and architecture
  2. Re-share relevant code files and configurations
  3. Re-establish the current problem state and goals
  4. Rebuild the conversational context from scratch

This repetition compounds when working across multiple projects. Knowledge gained while analyzing Project A provides no benefit when you switch to Project B, because the AI has no way to connect insights across your learning journey.

The Core Issue: Data Silos

These challenges share a common root cause: data silos. Your learning resources exist in isolated islands with no unified access mechanism. AI assistants cannot natively access your local files, understand your organizational structure, or connect insights across different knowledge domains.

The solution requires a fundamental architectural shift: a unified storage abstraction layer that enables AI assistants to understand and access all learning resources seamlessly.


Introducing the Vault System Architecture

To address these pain points comprehensively, we developed the Vault system—a unified knowledge storage abstraction layer designed specifically for AI-assisted learning and development. This architectural decision transforms how AI assistants interact with local knowledge resources, creating capabilities far beyond what manual context provision could achieve.

What Is a Vault?

A vault is a registered knowledge repository that the AI system can access, understand, and utilize autonomously. Think of it as giving your AI assistant a key to an organized library of your learning materials: once registered, the AI can navigate, reference, and (when permitted) modify contents without requiring manual intervention for each interaction.

The HagiCode Implementation

This vault system architecture was developed and refined through the HagiCode project—an AI code assistant built on OpenSpec workflows. HagiCode's core philosophy extends beyond conversational AI: it enables AI to actually do work—operating on code repositories, executing commands, running tests, and managing development workflows.

GitHub: github.com/HagiCode-org/site

During HagiCode's development, we recognized that AI assistants needed frequent access to users' diverse learning resources: code repositories, note documents, configuration files, and more. Requiring manual provision for each access created an unacceptable user experience. This insight drove the vault system's design.


Vault Types: Matching Structure to Use Case

The vault system supports four distinct types, each optimized for specific learning and development scenarios:

Type            | Purpose                    | Typical Use Cases
folder          | Generic folder storage     | Temporary learning materials, drafts, unorganized resources
coderef         | Code reference projects    | Systematic learning of open-source projects
obsidian        | Obsidian notes integration | Reusing existing note libraries
system-managed  | System-controlled storage  | Project configurations, prompt templates, system resources

The CodeRef Vault: Purpose-Built for Project Learning

The coderef type deserves special attention—it's the most commonly used vault type in HagiCode and represents the system's core value proposition for project-based learning.

Why a Dedicated Type for Code Projects?

Learning from open-source projects isn't simply "downloading code." It requires managing multiple interconnected elements:

  • Source code from the original repository
  • Personal notes documenting understanding and insights
  • Configuration files for local development setup
  • Modifications and experiments you've implemented
  • Metadata describing the project's purpose and status

The coderef vault type standardizes this complexity into a coherent structure that both humans and AI can navigate effectively.


Core Design Principles

Persistent Storage Mechanism

The vault registry persists to the filesystem in JSON format:

_registryFilePath = Path.Combine(
    absoluteDataDir, 
    "personal-data", 
    "vaults", 
    "registry.json"
);

This seemingly simple design choice reflects careful consideration of multiple factors:

Simplicity and Reliability

  • JSON format is human-readable, facilitating debugging and manual modification
  • When systems encounter issues, developers can directly inspect and even repair the registry file
  • No database dependencies or connection requirements
  • Particularly valuable during development and troubleshooting

Reduced Dependencies

  • Filesystem storage eliminates database installation and configuration complexity
  • No additional services to maintain or monitor
  • Lower system complexity reduces potential failure points
  • Simplifies deployment across different environments

Concurrency Safety

  • SemaphoreSlim serializes access to the registry across threads within the application process
  • AI code assistants may have multiple concurrent operations accessing vault information
  • Proper concurrency control prevents race conditions and data corruption
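
The read-modify-write pattern behind these points can be sketched in TypeScript. This is a minimal illustration rather than HagiCode's implementation: the original uses SemaphoreSlim in C#, whereas the sketch uses a promise-chain mutex, an in-memory string stands in for registry.json, and the names VaultEntry, Mutex, and VaultRegistry are hypothetical.

```typescript
// Hypothetical registry entry shape; the field names are assumptions.
interface VaultEntry {
  id: string;
  type: "folder" | "coderef" | "obsidian" | "system-managed";
  physicalPath: string;
}

// Minimal async mutex: each caller waits until the previous holder releases.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const previous = this.tail;
    let release: () => void = () => {};
    this.tail = new Promise<void>((resolve) => {
      release = resolve;
    });
    await previous;
    try {
      return await task();
    } finally {
      release(); // always release, even when the task throws
    }
  }
}

class VaultRegistry {
  private fileContents = "[]"; // stand-in for registry.json on disk
  private readonly lock = new Mutex();

  private async read(): Promise<VaultEntry[]> {
    await Promise.resolve(); // simulates the gap of an async file read
    return JSON.parse(this.fileContents) as VaultEntry[];
  }

  private async write(entries: VaultEntry[]): Promise<void> {
    await Promise.resolve(); // simulates the gap of an async file write
    this.fileContents = JSON.stringify(entries, null, 2);
  }

  // The whole read-modify-write cycle runs under the lock.
  register(entry: VaultEntry): Promise<void> {
    return this.lock.runExclusive(async () => {
      const entries = await this.read();
      if (entries.some((existing) => existing.id === entry.id)) {
        throw new Error(`Vault "${entry.id}" is already registered.`);
      }
      entries.push(entry);
      await this.write(entries);
    });
  }

  list(): Promise<VaultEntry[]> {
    return this.lock.runExclusive(() => this.read());
  }
}
```

Without the lock, two concurrent register calls could both read the same snapshot and the later write would silently drop the earlier entry. Like SemaphoreSlim, a mutex of this kind only coordinates callers inside one process.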

AI Context Integration

The system's true power emerges through automatic vault information injection into AI proposal contexts:

export function buildTargetVaultsText(
    vaults: VaultForText[],
    template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
    const readOnlyVaults = vaults.filter(
        (vault) => vault.accessType === 'read'
    );
    const editableVaults = vaults.filter(
        (vault) => vault.accessType === 'write'
    );

    const sections = [
        buildVaultSection(readOnlyVaults, template.reference),
        buildVaultSection(editableVaults, template.editable),
    ].filter(Boolean);

    return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

This automatic context injection transforms the user experience dramatically. Instead of manually providing background for each interaction, you simply tell the AI: "Help me analyze React's concurrent rendering." The AI automatically locates your previously registered React learning vault and accesses relevant code and notes—no repeated context sharing required.
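
The excerpt above depends on types and a helper that are not shown. A self-contained sketch of how the pieces might fit together follows. Everything in it is an assumption made for illustration: the field names, the template wording, the output format of buildVaultSection, the function name renderVaultContext, and the choice to return an empty string when no vaults are attached.

```typescript
// Assumed shapes; the real HagiCode types may differ.
interface VaultForText {
  name: string;
  physicalPath: string;
  accessType: "read" | "write";
}

interface VaultPromptTemplate {
  heading: string;
  reference: string; // label for read-only vaults
  editable: string; // label for writable vaults
}

const SKETCH_TEMPLATE: VaultPromptTemplate = {
  heading: "Target Vaults",
  reference: "Reference (read-only)",
  editable: "Editable",
};

// Renders one labelled bullet list, or "" when there is nothing to list.
function buildVaultSection(vaults: VaultForText[], label: string): string {
  if (vaults.length === 0) return "";
  const lines = vaults.map((vault) => `- ${vault.name} (${vault.physicalPath})`);
  return `**${label}**\n${lines.join("\n")}\n`;
}

function renderVaultContext(
  vaults: VaultForText[],
  template: VaultPromptTemplate = SKETCH_TEMPLATE,
): string {
  const sections = [
    buildVaultSection(
      vaults.filter((vault) => vault.accessType === "read"),
      template.reference,
    ),
    buildVaultSection(
      vaults.filter((vault) => vault.accessType === "write"),
      template.editable,
    ),
  ].filter(Boolean);

  // Assumption: emit nothing rather than a bare heading when no vaults exist.
  if (sections.length === 0) return "";
  return `\n\n### ${template.heading}\n\n${sections.join("\n")}`;
}
```

Splitting the vaults by access type before rendering lets the prompt tell the AI explicitly which locations it may change and which it may only read.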

Access Control Mechanism

The system distinguishes between two access types for each vault:

Reference (Read-Only)

  • AI can read and analyze content
  • AI cannot modify files
  • Ideal for learning materials, open-source code, documentation

Editable (Write Access)

  • AI can read and modify content
  • AI can create new files and update existing ones
  • Suitable for personal projects, working directories, draft areas

This distinction provides crucial safety guarantees. When you register an open-source project vault for learning, you mark it as reference—preventing accidental modifications to the code you're studying. Conversely, your personal project vaults marked as editable enable the AI to actively assist with implementation.
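
One way to picture the enforcement is a guard that every file operation passes through before touching disk. The sketch below is an assumption about how such a check could look, not the project's code; the operation names and the VaultAccess shape are hypothetical, while the "read" and "write" access values match those used elsewhere in this article.

```typescript
type AccessType = "read" | "write";
type VaultOperation = "read" | "create" | "update" | "delete";

// Hypothetical minimal view of a registered vault.
interface VaultAccess {
  id: string;
  accessType: AccessType;
}

// Reading is always allowed; anything that changes files needs write access.
function isAllowed(vault: VaultAccess, operation: VaultOperation): boolean {
  return operation === "read" || vault.accessType === "write";
}

function assertAllowed(vault: VaultAccess, operation: VaultOperation): void {
  if (!isAllowed(vault, operation)) {
    throw new Error(
      `Vault "${vault.id}" is registered as reference; "${operation}" is not permitted.`,
    );
  }
}
```

Calling assertAllowed at the start of each mutating operation keeps the rule in one place instead of scattering access checks through the codebase.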


Standardized CodeRef Vault Structure

For coderef type vaults, the system provides a standardized directory structure that optimizes both human and AI navigation:

my-coderef-vault/
├── index.yaml          # Vault metadata description
├── AGENTS.md           # AI assistant operation guidelines
├── docs/               # Learning notes and documentation
└── repos/              # Cloned repositories via Git submodules

Design Philosophy Breakdown

docs/ Directory: Learning Notes

This directory stores Markdown-format notes documenting your understanding of the code, architectural analysis, encountered challenges, and solutions discovered. These notes serve dual purposes:

  1. Personal Reference: Your own documented learning journey
  2. AI Context: When analyzing related code, the AI automatically references these notes—like having an assistant who remembers your previous insights

repos/ Directory: Git Submodule Management

Rather than directly copying code, this directory manages cloned repositories through Git submodules. This approach offers significant advantages:

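  • Upstream Synchronization: Running git submodule update --remote fetches the latest code from the original repositories (a plain git submodule update only checks out the commit the vault already records)
  • Space Efficiency: The vault's own Git history stores only a pointer to a specific upstream commit rather than a copy of the source tree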
  • Version Control: Track specific commits or branches for reproducibility
  • Clean Separation: Distinguish between original code and your notes/modifications

index.yaml: Vault Metadata

This file contains structured metadata describing the vault's purpose, contents, and relationships. Think of it as the vault's "self-introduction"—enabling the AI to quickly understand what the vault contains and how it should be used.

Example structure:

name: React Learning Vault
type: coderef
description: Systematic study of React's architecture and concurrent rendering
created: 2026-04-10
repositories:
  - name: react
    url: https://github.com/facebook/react.git
    branch: main
tags:
  - frontend
  - javascript
  - react
  - concurrent-rendering
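
Once the YAML has been parsed into a plain object (by any YAML library), the metadata can be checked before the vault is used. The sketch below assumes the field names from the example above; which fields are required, and the names VaultIndex and validateVaultIndex, are assumptions for illustration.

```typescript
type VaultType = "folder" | "coderef" | "obsidian" | "system-managed";

interface VaultRepository {
  name: string;
  url: string;
  branch?: string;
}

// Mirrors the example index.yaml; optional markers are assumptions.
interface VaultIndex {
  name: string;
  type: VaultType;
  description?: string;
  created?: string;
  repositories?: VaultRepository[];
  tags?: string[];
}

const VAULT_TYPES: readonly string[] = [
  "folder",
  "coderef",
  "obsidian",
  "system-managed",
];

// Returns a list of problems; an empty list means the value fits VaultIndex.
function validateVaultIndex(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["index.yaml must contain a mapping at the top level"];
  }
  const index = value as Record<string, unknown>;
  const problems: string[] = [];

  if (typeof index.name !== "string" || index.name.trim() === "") {
    problems.push("name must be a non-empty string");
  }
  if (typeof index.type !== "string" || !VAULT_TYPES.includes(index.type)) {
    problems.push(`type must be one of: ${VAULT_TYPES.join(", ")}`);
  }

  const repositories = index.repositories;
  if (repositories !== undefined && !Array.isArray(repositories)) {
    problems.push("repositories must be a list");
  } else if (Array.isArray(repositories)) {
    repositories.forEach((repo: unknown, i: number) => {
      const entry = (repo ?? {}) as Record<string, unknown>;
      if (typeof entry.name !== "string" || typeof entry.url !== "string") {
        problems.push(`repositories[${i}] needs a name and a url`);
      }
    });
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, makes a hand-edited index.yaml easier to fix in one pass.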

AGENTS.md: AI Operation Guidelines

This file contains instructions specifically for AI assistants, explaining how to handle the vault's contents. You can specify:

  • Which aspects to focus on during analysis
  • Files or directories to avoid modifying
  • Preferred analysis approaches
  • Context about your learning goals

Example:

# AI Assistant Guidelines

## Focus Areas
- Performance optimization related code
- Fiber architecture implementation
- Scheduler module

## Restrictions
- Do not modify test files
- Do not change package.json dependencies
- Create new notes in docs/ directory

## Analysis Preferences
- Prioritize understanding rendering flow
- Document findings in docs/analysis-notes.md


Practical Implementation Guide

Creating a CodeRef Vault

Creating a vault programmatically demonstrates the system's automation capabilities:

const createCodeRefVault = async () => {
    const response = await VaultService.postApiVaults({
        requestBody: {
            name: "React Learning Vault",
            type: "coderef",
            physicalPath: "/Users/developer/vaults/react-learning",
            gitUrl: "https://github.com/facebook/react.git"
        }
    });

    // System automatically:
    // 1. Clones React repository to vault/repos/react
    // 2. Creates docs/ directory for notes
    // 3. Generates index.yaml metadata
    // 4. Creates AGENTS.md guidelines file

    return response;
};

This automation eliminates manual setup work, ensuring consistent structure across all vaults.
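
To make the four automatic steps concrete, the sketch below computes the paths such a scaffold would produce from the request's physicalPath and gitUrl. It only plans the layout and performs no file or Git operations. The function names and the rule for deriving the repository directory from the URL are assumptions, chosen to match the repos/react example in the comments above.

```typescript
// Derives a directory name from a Git URL, e.g. ".../facebook/react.git" -> "react".
function repoNameFromGitUrl(gitUrl: string): string {
  const lastSegment = gitUrl.replace(/\/+$/, "").split("/").pop() ?? "";
  const name = lastSegment.replace(/\.git$/, "");
  if (name === "") {
    throw new Error(`Cannot derive a repository name from "${gitUrl}"`);
  }
  return name;
}

interface ScaffoldPlan {
  directories: string[];
  files: string[];
  submodulePath: string;
}

// Lists what a CodeRef scaffold would create under the vault root.
function planCodeRefScaffold(physicalPath: string, gitUrl: string): ScaffoldPlan {
  const root = physicalPath.replace(/\/+$/, ""); // drop trailing slashes
  return {
    directories: [`${root}/docs`, `${root}/repos`],
    files: [`${root}/index.yaml`, `${root}/AGENTS.md`],
    submodulePath: `${root}/repos/${repoNameFromGitUrl(gitUrl)}`,
  };
}
```

A plan like this can be logged or shown to the user before anything is written, which is useful when the target directory already contains files.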

Referencing Vaults in AI Proposals

Once created, vaults integrate seamlessly into AI interaction proposals:

const proposal = composeProposalChiefComplaint({
    chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
    repositories: [
        { 
            id: "react", 
            gitUrl: "https://github.com/facebook/react.git" 
        }
    ],
    vaults: [
        {
            id: "react-learning",
            name: "React Learning Vault",
            type: "coderef",
            physicalPath: "/Users/developer/vaults/react-learning",
            accessType: "read"  // AI can read but not modify
        }
    ],
    quickRequestText: "Focus on fiber architecture and scheduler implementation"
});

This structured approach ensures the AI has all necessary context to provide targeted, relevant assistance.


Real-World Usage Scenarios

Scenario 1: Systematic Open-Source Project Learning

Workflow:

  1. Create a CodeRef vault for your target project
  2. Manage the repository through Git submodules
  3. Document understanding and insights in docs/
  4. Request AI analysis with vault context automatically included

Benefits:

  • AI accesses both code and your previous notes simultaneously
  • Insights from earlier learning sessions inform current analysis
  • Knowledge accumulates systematically rather than fragmenting

Example: While studying React's Fiber architecture, you document key insights in docs/fiber-notes.md. Weeks later, when analyzing performance issues, the AI references these notes automatically, providing continuity across your learning journey.

Scenario 2: Obsidian Notes Integration

Workflow:

  1. Register existing Obsidian vault as an obsidian type vault
  2. AI gains direct access to your accumulated knowledge base
  3. No manual copying or pasting required

Benefits:

  • Leverages years of accumulated notes immediately
  • AI understands your personal knowledge organization
  • No duplication or migration required

Use Case: You've maintained an Obsidian vault for five years with extensive notes on system design patterns. Registering this vault enables the AI to reference your existing knowledge when analyzing new projects—your historical learning informs current work.

Scenario 3: Cross-Project Knowledge Reuse

Workflow:

  1. Create a "Design Patterns Learning Vault" with notes and examples
  2. Reference this vault across multiple project analyses
  3. Knowledge transfers automatically between contexts

Benefits:

  • Knowledge accumulated once applies everywhere
  • No repeated learning of fundamental concepts
  • AI recognizes pattern applications across different projects

Example: Your design patterns vault contains detailed notes on Observer, Factory, and Strategy patterns. When analyzing Project A's event system and Project B's plugin architecture, the AI references the same pattern knowledge—recognizing similarities and differences automatically.


Security: Path Safety Mechanisms

Operating on filesystem resources requires robust security boundaries. The system implements strict path validation to prevent path traversal attacks:

private static string ResolveFilePath(
    string vaultRoot, 
    string relativePath
)
{
    var rootPath = EnsureTrailingSeparator(
        Path.GetFullPath(vaultRoot)
    );
    var combinedPath = Path.GetFullPath(
        Path.Combine(rootPath, relativePath)
    );
    
    if (!combinedPath.StartsWith(
        rootPath, 
        StringComparison.OrdinalIgnoreCase
    ))
    {
        throw new BusinessException(
            VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root."
        );
    }
    
    return combinedPath;
}

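This validation ensures all file operations remain within the vault's designated root directory, preventing malicious or accidental access to unrelated filesystem areas. The trailing separator on the root matters: without it, a sibling directory whose name merely begins with the vault's name would pass the prefix check. Two caveats apply. The comparison shown is case-insensitive, which matches Windows and default macOS volumes but is looser than necessary on case-sensitive filesystems, and Path.GetFullPath normalizes the path string without resolving symbolic links. Security boundaries are non-negotiable when AI assistants operate on user files.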
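
The same containment check can be expressed in TypeScript. The sketch below is a simplified illustration, not a port of the C# method: it handles POSIX-style paths only, compares case-sensitively, does not resolve symbolic links, and uses a hand-written normalizer so that it has no runtime dependencies. The function names are hypothetical.

```typescript
// Collapses ".", "..", and repeated slashes in a POSIX-style absolute path.
function normalizeAbsolute(path: string): string {
  const segments: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      segments.pop(); // ".." at the root stays at the root
      continue;
    }
    segments.push(segment);
  }
  return "/" + segments.join("/");
}

function resolveVaultPath(vaultRoot: string, relativePath: string): string {
  if (!vaultRoot.startsWith("/")) {
    throw new Error("The vault root must be an absolute path.");
  }
  const root = normalizeAbsolute(vaultRoot);
  // The trailing separator stops "/vaults/react" from matching "/vaults/react-evil".
  const rootWithSeparator = root === "/" ? "/" : root + "/";

  // An absolute input replaces the root entirely, so it is checked like any other path.
  const combined = relativePath.startsWith("/")
    ? normalizeAbsolute(relativePath)
    : normalizeAbsolute(rootWithSeparator + relativePath);

  if (!combined.startsWith(rootWithSeparator)) {
    throw new Error("Vault file paths must stay inside the registered vault root.");
  }
  return combined;
}
```

The important ordering is normalize first, compare second: checking the raw string for ".." before joining would miss inputs that only escape the root after normalization.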


Important Considerations

Path Security

Always ensure custom paths fall within allowed ranges. The system rejects operations attempting to access paths outside registered vault roots. This protects against both accidental misconfigurations and potential security vulnerabilities.

Git Submodule Management

CodeRef vaults use Git submodules rather than direct code copying. Benefits include upstream synchronization and a smaller vault repository. However, submodules have their own usage patterns; if you're unfamiliar with them, invest time in learning submodule workflows.

File Preview Limitations

The system enforces a file size limit (256 KB) and a quantity limit (500 files) for preview operations. These constraints protect system performance. When a vault exceeds them, preview its files in batches and use alternative approaches for oversized files.
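
A caller can apply the same limits up front to decide what to preview now, what to defer to a later batch, and what to skip. The sketch below takes the two numbers from the text; treating 256 KB as 256 × 1024 bytes, treating the size limit as inclusive, and the names FileInfo, planPreview, and toBatches are assumptions.

```typescript
const MAX_PREVIEW_BYTES = 256 * 1024; // 256 KB, assuming 1 KB = 1024 bytes
const MAX_PREVIEW_FILES = 500;

interface FileInfo {
  path: string;
  sizeBytes: number;
}

interface PreviewPlan {
  included: FileInfo[]; // previewed in this request
  deferred: FileInfo[]; // small enough, but over the file-count limit
  tooLarge: FileInfo[]; // over the per-file size limit
}

function planPreview(files: FileInfo[]): PreviewPlan {
  const fits = files.filter((file) => file.sizeBytes <= MAX_PREVIEW_BYTES);
  const tooLarge = files.filter((file) => file.sizeBytes > MAX_PREVIEW_BYTES);
  return {
    included: fits.slice(0, MAX_PREVIEW_FILES),
    deferred: fits.slice(MAX_PREVIEW_FILES),
    tooLarge,
  };
}

// Splits a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("Batch size must be positive.");
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

Deferred files can be fed through toBatches with MAX_PREVIEW_FILES as the size, so each follow-up request stays within the quantity limit.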

Diagnostic Information

Vault creation returns diagnostic information useful for debugging failures. When encountering issues, always review diagnostic output first—it typically provides clear indicators of what went wrong.


Conclusion: The Future of AI-Assisted Learning

The HagiCode Vault system addresses a deceptively simple but profoundly important question: How do we enable AI assistants to understand and utilize local knowledge resources effectively?

Through unified storage abstraction, standardized directory structures, and automated context injection, the system achieves "register once, reuse everywhere" knowledge management. Creating a single vault enables the AI to automatically access and understand learning notes, code repositories, and documentation—without repeated manual context provision.

The improvement in day-to-day experience is substantial. No more manually copying code snippets. No more repeatedly explaining background information. Your AI assistant becomes like a colleague who genuinely understands your project context, providing increasingly valuable assistance based on accumulated knowledge.

This vault system represents real-world engineering—developed through actual HagiCode development challenges, refined through practical usage, and optimized based on genuine user needs. If this architectural approach resonates with your challenges, the broader HagiCode project likely offers additional value worth exploring.

Resources:

  • HagiCode GitHub: github.com/HagiCode-org/site
  • Official Website: hagicode.com
  • 30-Minute Demo: bilibili.com/video/BV1pirZBuEzq/
  • Docker Installation: docs.hagicode.com/installation/docker-compose
  • Desktop Client: hagicode.com/desktop/

The future of learning is collaborative—human curiosity amplified by AI capability, connected through thoughtful architectural design. The vault system is one step toward that future.