Building Cross-Project Knowledge Bases with Vault Systems for AI Assistants
The Evolution of Learning in the AI Era
The landscape of technical learning is undergoing a profound transformation. Traditional methods—reading books, watching video tutorials, and attending courses—remain valuable, but a new paradigm has emerged as increasingly dominant: project-based learning through code imitation.
This approach involves deeply studying and replicating excellent open-source projects: analyzing their code structure, understanding architectural decisions, and internalizing design patterns through hands-on modification and experimentation. By running and modifying high-quality open-source code directly, developers see real-world engineering practices in action rather than in the abstract.
However, this powerful learning method introduces significant new challenges that traditional approaches never faced.
The Knowledge Fragmentation Problem
Scattered Learning Materials
Modern developers accumulate knowledge across multiple disconnected platforms and formats:
- Notes stored in Obsidian, Notion, or Evernote
- Code repositories scattered across various directories and GitHub organizations
- AI assistant conversations trapped in isolated chat histories
- Documentation downloaded as PDFs or bookmarked online
- Configuration files buried in project directories
Each time you need AI assistance analyzing a specific project, you face the tedious process of manually copying code snippets, gathering context, and reconstructing the background information. This fragmentation creates friction that slows learning and reduces productivity.
Context Discontinuity
AI assistants fundamentally lack persistent memory between conversations. Each new chat session begins as a blank slate, requiring you to:
- Re-explain your project's purpose and architecture
- Re-share relevant code files and configurations
- Re-establish the current problem state and goals
- Rebuild the conversational context from scratch
This repetition compounds when working across multiple projects. Knowledge gained while analyzing Project A provides no benefit when you switch to Project B, because the AI has no way to connect insights across your learning journey.
The Core Issue: Data Silos
These challenges share a common root cause: data silos. Your learning resources exist in isolated islands with no unified access mechanism. AI assistants cannot natively access your local files, understand your organizational structure, or connect insights across different knowledge domains.
The solution requires a fundamental architectural shift: a unified storage abstraction layer that enables AI assistants to understand and access all learning resources seamlessly.
Introducing the Vault System Architecture
To address these pain points comprehensively, we developed the Vault system—a unified knowledge storage abstraction layer designed specifically for AI-assisted learning and development. This architectural decision transforms how AI assistants interact with local knowledge resources, creating capabilities far beyond what manual context provision could achieve.
What Is a Vault?
A vault is a registered knowledge repository that the AI system can access, understand, and utilize autonomously. Think of it as giving your AI assistant a key to an organized library of your learning materials: once registered, the AI can navigate, reference, and (when permitted) modify contents without requiring manual intervention for each interaction.
The HagiCode Implementation
This vault system architecture was developed and refined through the HagiCode project—an AI code assistant built on OpenSpec workflows. HagiCode's core philosophy extends beyond conversational AI: it enables AI to actually do work—operating on code repositories, executing commands, running tests, and managing development workflows.
GitHub: github.com/HagiCode-org/site
During HagiCode's development, we recognized that AI assistants needed frequent access to users' diverse learning resources: code repositories, note documents, configuration files, and more. Requiring manual provision for each access created an unacceptable user experience. This insight drove the vault system's design.
Vault Types: Matching Structure to Use Case
The vault system supports four distinct types, each optimized for specific learning and development scenarios:
| Type | Purpose | Typical Use Cases |
|---|---|---|
| folder | Generic folder storage | Temporary learning materials, drafts, unorganized resources |
| coderef | Code reference projects | Systematic learning of open-source projects |
| obsidian | Obsidian notes integration | Reusing existing note libraries |
| system-managed | System-controlled storage | Project configurations, prompt templates, system resources |
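The four types in the table map naturally onto a string union. The sketch below is only an illustration: `VaultType`, `VaultRecord`, `isVaultType`, and `groupByType` are hypothetical names rather than HagiCode's actual definitions, and the record fields simply mirror the ones used in this article's later examples.

```typescript
// Hypothetical model of the four vault types listed in the table.
const VAULT_TYPES = ["folder", "coderef", "obsidian", "system-managed"] as const;
type VaultType = (typeof VAULT_TYPES)[number];

// Assumed record shape; field names mirror the examples in this article.
interface VaultRecord {
  id: string;
  name: string;
  type: VaultType;
  physicalPath: string;
}

// Narrow an untrusted string (for example, from a config file) to VaultType.
function isVaultType(value: string): value is VaultType {
  return (VAULT_TYPES as readonly string[]).includes(value);
}

// Group vaults by type, e.g. to list every coderef vault together.
function groupByType(vaults: VaultRecord[]): Map<VaultType, VaultRecord[]> {
  const groups = new Map<VaultType, VaultRecord[]>();
  for (const vault of vaults) {
    const bucket = groups.get(vault.type) ?? [];
    bucket.push(vault);
    groups.set(vault.type, bucket);
  }
  return groups;
}
```

Validating the type string at the boundary keeps the rest of the code free of ad hoc string comparisons.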
The CodeRef Vault: Purpose-Built for Project Learning
The coderef type deserves special attention—it's the most commonly used vault type in HagiCode and represents the system's core value proposition for project-based learning.
Why a Dedicated Type for Code Projects?
Learning from open-source projects isn't simply "downloading code." It requires managing multiple interconnected elements:
- Source code from the original repository
- Personal notes documenting understanding and insights
- Configuration files for local development setup
- Modifications and experiments you've implemented
- Metadata describing the project's purpose and status
The coderef vault type standardizes this complexity into a coherent structure that both humans and AI can navigate effectively.
Core Design Principles
Persistent Storage Mechanism
The vault registry persists to the filesystem in JSON format:
_registryFilePath = Path.Combine(
    absoluteDataDir,
    "personal-data",
    "vaults",
    "registry.json"
);
This seemingly simple design choice reflects careful consideration of multiple factors:
Simplicity and Reliability
- JSON format is human-readable, facilitating debugging and manual modification
- When systems encounter issues, developers can directly inspect and even repair the registry file
- No database dependencies or connection requirements
- Particularly valuable during development and troubleshooting
Reduced Dependencies
- Filesystem storage eliminates database installation and configuration complexity
- No additional services to maintain or monitor
- Lower system complexity reduces potential failure points
- Simplifies deployment across different environments
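To make the "human-readable and repairable" argument concrete, here is a minimal sketch of reading and writing such a registry. The real `registry.json` schema is not shown in this article, so the `RegistryFile` shape, the `version` field, and both function names are assumptions made for illustration.

```typescript
// Assumed on-disk shape; the real registry.json schema is not shown here.
interface RegistryEntry {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
}

interface RegistryFile {
  version: number;
  vaults: RegistryEntry[];
}

// Serialize with indentation so the file stays easy to read and hand-edit.
function serializeRegistry(vaults: RegistryEntry[]): string {
  const file: RegistryFile = { version: 1, vaults };
  return JSON.stringify(file, null, 2);
}

// Parse defensively: a hand-edited file may be malformed or incomplete.
function parseRegistry(json: string): RegistryEntry[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return []; // Sketch-level choice: treat a corrupt file as an empty registry.
  }
  const vaults = (parsed as Partial<RegistryFile> | null)?.vaults;
  if (!Array.isArray(vaults)) return [];
  // Keep only entries that carry every required string field.
  return vaults.filter(
    (v): v is RegistryEntry =>
      typeof v?.id === "string" &&
      typeof v?.name === "string" &&
      typeof v?.type === "string" &&
      typeof v?.physicalPath === "string",
  );
}
```

A production implementation would more likely report a corrupt file than silently return an empty list; the fallback here just keeps the sketch short.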
Concurrency Safety
- SemaphoreSlim ensures thread-safe access to the registry
- AI code assistants may have multiple concurrent operations accessing vault information
- Proper concurrency control prevents race conditions and data corruption
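The same idea can be sketched in TypeScript, where the risk is interleaved async operations rather than threads. This is not HagiCode's code: `AsyncMutex` and `GuardedRegistry` are hypothetical names, and the promise-chain mutex is a stand-in that plays the role a single-permit SemaphoreSlim plays in .NET.

```typescript
// Minimal async mutex: callers queue on a promise chain, so only one
// critical section runs at a time.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Hypothetical in-memory registry guarded by the mutex.
class GuardedRegistry {
  private readonly mutex = new AsyncMutex();
  private names: string[] = [];

  add(name: string): Promise<number> {
    return this.mutex.runExclusive(async () => {
      const snapshot = [...this.names]; // read
      await Promise.resolve(); // simulated async I/O between read and write
      this.names = [...snapshot, name]; // write
      return this.names.length;
    });
  }

  list(): string[] {
    return [...this.names];
  }
}
```

Without the mutex, three overlapping `add` calls would each read the same empty snapshot and the last write would win, which is the lost-update race the registry lock is there to prevent.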
AI Context Integration
The system's true power emerges through automatic vault information injection into AI proposal contexts:
export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter(
    (vault) => vault.accessType === 'read'
  );
  const editableVaults = vaults.filter(
    (vault) => vault.accessType === 'write'
  );
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}
This automatic context injection transforms the user experience dramatically. Instead of manually providing background for each interaction, you simply tell the AI: "Help me analyze React's concurrent rendering." The AI automatically locates your previously registered React learning vault and accesses relevant code and notes, with no repeated context sharing required.
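The excerpt relies on types and a `buildVaultSection` helper that are not shown. To illustrate what the injected text might look like, the following self-contained sketch fills those gaps with hypothetical stand-ins: the interface fields, the template labels, and the line format are all assumptions, and the top-level function is a simplified variant of the excerpt, not the original.

```typescript
// Hypothetical stand-ins for types and helpers the excerpt does not show.
interface VaultForText {
  name: string;
  physicalPath: string;
  accessType: "read" | "write";
}

interface VaultPromptTemplate {
  heading: string;
  reference: string; // label for read-only vaults
  editable: string; // label for writable vaults
}

const TEMPLATE: VaultPromptTemplate = {
  heading: "Target Vaults",
  reference: "Reference (read-only)",
  editable: "Editable",
};

// Render one labelled list, or an empty string when there is nothing to show
// so that filter(Boolean) in the caller drops the section.
function buildVaultSection(vaults: VaultForText[], label: string): string {
  if (vaults.length === 0) return "";
  const lines = vaults.map((v) => `- ${v.name}: ${v.physicalPath}`);
  return `${label}:\n${lines.join("\n")}`;
}

function buildTargetVaultsText(vaults: VaultForText[]): string {
  const sections = [
    buildVaultSection(vaults.filter((v) => v.accessType === "read"), TEMPLATE.reference),
    buildVaultSection(vaults.filter((v) => v.accessType === "write"), TEMPLATE.editable),
  ].filter(Boolean);
  // Unlike the excerpt, emit nothing at all when no vault is attached.
  if (sections.length === 0) return "";
  return `\n\n### ${TEMPLATE.heading}\n\n${sections.join("\n")}`;
}

console.log(
  buildTargetVaultsText([
    { name: "React Learning Vault", physicalPath: "/vaults/react-learning", accessType: "read" },
    { name: "My App", physicalPath: "/work/my-app", accessType: "write" },
  ]),
);
```

Separating read-only and editable vaults in the prompt text gives the model an explicit statement of where it may write, in addition to whatever enforcement happens in code.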
Access Control Mechanism
The system distinguishes between two access types for each vault:
Reference (Read-Only)
- AI can read and analyze content
- AI cannot modify files
- Ideal for learning materials, open-source code, documentation
Editable (Write Access)
- AI can read and modify content
- AI can create new files and update existing ones
- Suitable for personal projects, working directories, draft areas
This distinction provides crucial safety guarantees. When you register an open-source project vault for learning, you mark it as reference—preventing accidental modifications to the code you're studying. Conversely, your personal project vaults marked as editable enable the AI to actively assist with implementation.
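A guard for this rule can be very small. The sketch below assumes the two `accessType` values used elsewhere in this article (`read` and `write`); the `Operation` names and both functions are hypothetical and only illustrate where such a check would sit.

```typescript
type AccessType = "read" | "write";

interface VaultAccess {
  id: string;
  accessType: AccessType;
}

type Operation = "read" | "create" | "update" | "delete";

// Reads are always allowed; every mutating operation needs a writable vault.
function isAllowed(vault: VaultAccess, operation: Operation): boolean {
  return operation === "read" || vault.accessType === "write";
}

// Throwing variant for call sites that must not continue on a violation.
function assertAllowed(vault: VaultAccess, operation: Operation): void {
  if (!isAllowed(vault, operation)) {
    throw new Error(`Vault "${vault.id}" is read-only; "${operation}" was rejected.`);
  }
}
```

Calling `assertAllowed` immediately before each file mutation means a reference vault stays untouched even if the model proposes an edit.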
Standardized CodeRef Vault Structure
For coderef type vaults, the system provides a standardized directory structure that optimizes both human and AI navigation:
my-coderef-vault/
├── index.yaml # Vault metadata description
├── AGENTS.md # AI assistant operation guidelines
├── docs/ # Learning notes and documentation
└── repos/ # Cloned repositories via Git submodules
Design Philosophy Breakdown
docs/ Directory: Learning Notes
This directory stores Markdown-format notes documenting your understanding of the code, architectural analysis, encountered challenges, and solutions discovered. These notes serve dual purposes:
- Personal Reference: Your own documented learning journey
- AI Context: When analyzing related code, the AI automatically references these notes—like having an assistant who remembers your previous insights
repos/ Directory: Git Submodule Management
Rather than directly copying code, this directory manages cloned repositories through Git submodules. This approach offers significant advantages:
- Upstream Synchronization: Running git submodule update --remote fetches the latest code from the original repositories
- Space Efficiency: The vault's own Git history stores a pointer to a specific commit rather than a second copy of the upstream source
- Version Control: Track specific commits or branches for reproducibility
- Clean Separation: Distinguish between original code and your notes/modifications
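The submodule workflow comes down to three standard Git commands. The sketch below only builds their argument lists; the helper names are hypothetical, the `repos/<name>` path follows the directory layout described in this article, and running the commands is left to whatever process runner you use. Note that a plain `git submodule update` checks out the commit recorded in the vault, while adding `--remote` moves the submodule to the tip of its tracked upstream branch.

```typescript
// Build argument lists for the Git commands a coderef vault relies on.
// Nothing is executed here; pass the arrays to a process runner of your choice.

// Register a repository as a submodule under repos/<name>.
function submoduleAddArgs(gitUrl: string, name: string): string[] {
  return ["submodule", "add", gitUrl, `repos/${name}`];
}

// After cloning a vault, populate every submodule at its recorded commit.
function submoduleInitArgs(): string[] {
  return ["submodule", "update", "--init", "--recursive"];
}

// Move one submodule to the tip of its tracked upstream branch.
function submoduleSyncArgs(name: string): string[] {
  return ["submodule", "update", "--remote", `repos/${name}`];
}

console.log(
  ["git", ...submoduleAddArgs("https://github.com/facebook/react.git", "react")].join(" "),
);
```

Passing arguments as an array instead of a concatenated shell string avoids quoting problems when a URL or directory name contains unusual characters.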
index.yaml: Vault Metadata
This file contains structured metadata describing the vault's purpose, contents, and relationships. Think of it as the vault's "self-introduction"—enabling the AI to quickly understand what the vault contains and how it should be used.
Example structure:
name: React Learning Vault
type: coderef
description: Systematic study of React's architecture and concurrent rendering
created: 2026-04-10
repositories:
  - name: react
    url: https://github.com/facebook/react.git
    branch: main
tags:
  - frontend
  - javascript
  - react
  - concurrent-rendering
AGENTS.md: AI Operation Guidelines
This file contains instructions specifically for AI assistants, explaining how to handle the vault's contents. You can specify:
- Which aspects to focus on during analysis
- Files or directories to avoid modifying
- Preferred analysis approaches
- Context about your learning goals
Example:
# AI Assistant Guidelines
## Focus Areas
- Performance optimization related code
- Fiber architecture implementation
- Scheduler module
## Restrictions
- Do not modify test files
- Do not change package.json dependencies
- Create new notes in docs/ directory
## Analysis Preferences
- Prioritize understanding rendering flow
- Document findings in docs/analysis-notes.md
Practical Implementation Guide
Creating a CodeRef Vault
Creating a vault programmatically demonstrates the system's automation capabilities:
const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });
  // System automatically:
  // 1. Clones React repository to vault/repos/react
  // 2. Creates docs/ directory for notes
  // 3. Generates index.yaml metadata
  // 4. Creates AGENTS.md guidelines file
  return response;
};
This automation eliminates manual setup work, ensuring consistent structure across all vaults.
Referencing Vaults in AI Proposals
Once created, vaults integrate seamlessly into AI interaction proposals:
const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    {
      id: "react",
      gitUrl: "https://github.com/facebook/react.git"
    }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can read but not modify
    }
  ],
  quickRequestText: "Focus on fiber architecture and scheduler implementation"
});
This structured approach ensures the AI has all necessary context to provide targeted, relevant assistance.
Real-World Usage Scenarios
Scenario 1: Systematic Open-Source Project Learning
Workflow:
- Create a CodeRef vault for your target project
- Manage the repository through Git submodules
- Document understanding and insights in docs/
- Request AI analysis with vault context automatically included
Benefits:
- AI accesses both code and your previous notes simultaneously
- Insights from earlier learning sessions inform current analysis
- Knowledge accumulates systematically rather than fragmenting
Example: While studying React's Fiber architecture, you document key insights in docs/fiber-notes.md. Weeks later, when analyzing performance issues, the AI references these notes automatically, providing continuity across your learning journey.
Scenario 2: Obsidian Notes Integration
Workflow:
- Register your existing Obsidian vault as an obsidian type vault
- AI gains direct access to your accumulated knowledge base
- No manual copying or pasting required
Benefits:
- Leverages years of accumulated notes immediately
- AI understands your personal knowledge organization
- No duplication or migration required
Use Case: You've maintained an Obsidian vault for five years with extensive notes on system design patterns. Registering this vault enables the AI to reference your existing knowledge when analyzing new projects—your historical learning informs current work.
Scenario 3: Cross-Project Knowledge Reuse
Workflow:
- Create a "Design Patterns Learning Vault" with notes and examples
- Reference this vault across multiple project analyses
- Knowledge transfers automatically between contexts
Benefits:
- Knowledge accumulated once applies everywhere
- No repeated learning of fundamental concepts
- AI recognizes pattern applications across different projects
Example: Your design patterns vault contains detailed notes on Observer, Factory, and Strategy patterns. When analyzing Project A's event system and Project B's plugin architecture, the AI references the same pattern knowledge—recognizing similarities and differences automatically.
Security: Path Safety Mechanisms
Operating on filesystem resources requires robust security boundaries. The system implements strict path validation to prevent path traversal attacks:
private static string ResolveFilePath(
    string vaultRoot,
    string relativePath
)
{
    // Normalize the root and force a trailing separator so that
    // "/vaults/a" cannot match a sibling such as "/vaults/ab".
    var rootPath = EnsureTrailingSeparator(
        Path.GetFullPath(vaultRoot)
    );
    // GetFullPath collapses "." and ".." segments before the check.
    var combinedPath = Path.GetFullPath(
        Path.Combine(rootPath, relativePath)
    );
    // Case-insensitive prefix test; on case-sensitive filesystems this is
    // more permissive than an exact-case comparison.
    if (!combinedPath.StartsWith(
        rootPath,
        StringComparison.OrdinalIgnoreCase
    ))
    {
        throw new BusinessException(
            VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root."
        );
    }
    return combinedPath;
}
This validation ensures all file operations remain within the vault's designated root directory, preventing malicious or accidental access to unrelated filesystem areas. Security boundaries are non-negotiable when AI assistants operate on user files.
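For readers working on the TypeScript side, the same containment check can be sketched with Node's `path` module. This is an illustrative port rather than HagiCode code: `resolveVaultPath` is a hypothetical name, the sketch uses POSIX path rules, and it compares case-sensitively, unlike the C# version above.

```typescript
import * as path from "node:path";

// TypeScript sketch of the containment check, using POSIX path rules.
function resolveVaultPath(vaultRoot: string, relativePath: string): string {
  const root = path.posix.resolve(vaultRoot);
  // Trailing separator stops "/vaults/react" from matching "/vaults/react-evil".
  const rootWithSep = root.endsWith("/") ? root : root + "/";
  // resolve() collapses "." and ".." and lets an absolute second argument win,
  // so both "../x" and "/etc/passwd" are caught by the prefix test below.
  const combined = path.posix.resolve(root, relativePath);
  if (combined !== root && !combined.startsWith(rootWithSep)) {
    throw new Error("Vault file paths must stay inside the registered vault root.");
  }
  return combined;
}
```

The check is purely lexical. If a vault may contain symbolic links that point outside its root, resolving the real path before comparing would be a further hardening step.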
Important Considerations
Path Security
Always ensure custom paths fall within allowed ranges. The system rejects operations attempting to access paths outside registered vault roots. This protects against both accidental misconfigurations and potential security vulnerabilities.
Git Submodule Management
The CodeRef vault layout favors Git submodules over direct code copying, mainly for upstream synchronization and a leaner vault repository. Submodules do have their own usage patterns, though, so if you're unfamiliar with them, invest time in learning submodule workflows.
File Preview Limitations
The system enforces a file size limit (256 KB) and a file count limit (500 files) for preview operations. These constraints protect system performance. When a vault exceeds them, split the work into smaller batches or use alternative approaches.
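A caller can check these limits up front to decide what to preview and what to defer. The sketch below uses the two numbers quoted above, but everything else is an assumption: the names are hypothetical, 256 KB is taken to mean 256 * 1024 bytes applied per file, and the 500-file cap is taken to apply per preview request.

```typescript
// Limits quoted in this article; the names and exact semantics are assumptions.
const MAX_PREVIEW_BYTES = 256 * 1024;
const MAX_PREVIEW_FILES = 500;

interface FileInfo {
  path: string;
  sizeBytes: number;
}

interface PreviewPlan {
  previewable: FileInfo[];
  skippedTooLarge: FileInfo[];
  skippedOverCount: FileInfo[];
}

// Partition files into those that fit the limits and those that do not.
function planPreview(files: FileInfo[]): PreviewPlan {
  const plan: PreviewPlan = { previewable: [], skippedTooLarge: [], skippedOverCount: [] };
  for (const file of files) {
    if (file.sizeBytes > MAX_PREVIEW_BYTES) {
      plan.skippedTooLarge.push(file);
    } else if (plan.previewable.length >= MAX_PREVIEW_FILES) {
      plan.skippedOverCount.push(file);
    } else {
      plan.previewable.push(file);
    }
  }
  return plan;
}
```

Files in `skippedOverCount` are candidates for a second batch, while files in `skippedTooLarge` need a different approach such as reading only a section of the file.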
Diagnostic Information
Vault creation returns diagnostic information useful for debugging failures. When encountering issues, always review diagnostic output first—it typically provides clear indicators of what went wrong.
Conclusion: The Future of AI-Assisted Learning
The HagiCode Vault system addresses a deceptively simple but profoundly important question: How do we enable AI assistants to understand and utilize local knowledge resources effectively?
Through unified storage abstraction, standardized directory structures, and automated context injection, the system achieves "register once, reuse everywhere" knowledge management. Creating a single vault enables the AI to automatically access and understand learning notes, code repositories, and documentation—without repeated manual context provision.
The experiential improvement is substantial. No more manually copying code snippets. No more repeatedly explaining background information. Your AI assistant becomes like a colleague who genuinely understands your project context, providing increasingly valuable assistance based on accumulated knowledge.
This vault system represents real-world engineering—developed through actual HagiCode development challenges, refined through practical usage, and optimized based on genuine user needs. If this architectural approach resonates with your challenges, the broader HagiCode project likely offers additional value worth exploring.
Resources:
- HagiCode GitHub: github.com/HagiCode-org/site
- Official Website: hagicode.com
- 30-Minute Demo: bilibili.com/video/BV1pirZBuEzq/
- Docker Installation: docs.hagicode.com/installation/docker-compose
- Desktop Client: hagicode.com/desktop/
The future of learning is collaborative—human curiosity amplified by AI capability, connected through thoughtful architectural design. The vault system is one step toward that future.