Inside Mem0's Memory Engine: How Dual-Storage Architecture Powers Intelligent AI Recall
This is the first installment of an in-depth source code analysis series focusing on Mem0, an open-source project that provides a long-term memory layer for AI applications. In this comprehensive exploration, we will dissect the core functionality of Mem0—its memory addition mechanism—and uncover the sophisticated design principles and implementation details that make it work.
Mem0 (pronounced "mem-zero") represents a groundbreaking approach to giving AI applications the ability to maintain persistent memory across sessions. It enables AI assistants to remember user preferences, adapt to individualized needs, and engage in continuous learning—making it exceptionally well-suited for customer support chatbots, AI companions, autonomous systems, and any application requiring contextual continuity over time.
Before diving into the source code, several fundamental questions drove this investigation:
- How does Mem0 extract valuable information from conversational exchanges?
- What decision-making process determines whether to add new memories, update existing ones, or delete outdated information?
- What distinct roles do vector storage and graph storage play in the overall architecture?
- How does the system balance performance with accuracy in memory operations?
With these guiding questions in mind, let us embark on this technical journey through Mem0's internal workings.
Architectural Overview
Mem0's memory addition workflow follows a carefully orchestrated architecture that balances efficiency with intelligence. The core implementation resides in mem0/memory/main.py, with three primary components driving the entire process:
- Memory.add(): The main entry point that orchestrates the entire memory addition workflow
- _add_to_vector_store(): Handles all vector storage operations for semantic similarity-based retrieval
- _add_to_graph(): Manages graph storage operations for entity-relationship mapping
The architecture embodies a dual-storage philosophy: vector storage excels at rapid similarity searches and semantic retrieval, while graph storage captures complex entity relationships and knowledge networks. Together, they provide complementary capabilities that neither could achieve alone.
The Entry Point: Memory.add() Method
Let us begin our exploration at the system's entry point, located at line 281 of mem0/memory/main.py. The method signature reveals the flexibility built into the design:
def add(
    self,
    messages,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    infer: bool = True,
    memory_type: Optional[str] = None,
    prompt: Optional[str] = None,
):

Parameter Breakdown and Design Intentions
The messages parameter demonstrates remarkable flexibility in input handling. It accepts:
- Plain strings: Simple text like "I prefer drinking coffee in the morning"
- Single message dictionaries: Structured format such as {"role": "user", "content": "I prefer drinking coffee"}
- Message lists: Complete conversation threads with multiple exchanges between user and assistant
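All three input shapes can be coerced into one canonical list-of-dicts form before processing. The sketch below illustrates that normalization; the helper name is ours, not Mem0's actual internal function:

```python
from typing import Any, Dict, List, Union

def normalize_messages(
    messages: Union[str, Dict[str, Any], List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Coerce the three accepted input shapes into a list of message dicts."""
    if isinstance(messages, str):
        # Plain string: wrap as a single user message
        return [{"role": "user", "content": messages}]
    if isinstance(messages, dict):
        # Single message dict: wrap in a one-element list
        return [messages]
    # Already a list of message dicts: pass through unchanged
    return messages

# A plain string becomes a one-element conversation
normalized = normalize_messages("I prefer drinking coffee in the morning")
```

Normalizing early keeps the downstream extraction pipeline free of input-shape branching.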
The identifier parameters (user_id, agent_id, run_id) serve a critical isolation function. They ensure that memories remain properly segmented across different users, agents, or execution contexts—preventing cross-contamination of personal data and maintaining privacy boundaries.
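Conceptually, _build_filters_and_metadata stamps these identifiers onto both the stored metadata (so each memory records its owner) and the search filters (so retrieval stays scoped to that owner). A simplified sketch of that behavior, not the exact implementation:

```python
from typing import Any, Dict, Optional, Tuple

def build_filters_and_metadata(
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    input_metadata: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Stamp session identifiers onto metadata and derive matching search filters."""
    metadata = dict(input_metadata or {})
    filters: Dict[str, Any] = {}
    for key, value in (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id)):
        if value is not None:
            metadata[key] = value   # persisted alongside every memory
            filters[key] = value    # scopes every similarity search
    return metadata, filters

meta, flt = build_filters_and_metadata(user_id="alice", input_metadata={"topic": "food"})
# meta carries both the topic and the owner; flt restricts searches to alice's memories
```

Because the same identifiers drive both writes and reads, a memory stored for one user can never surface in another user's search results.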
The infer parameter represents a crucial design decision point:
- When True (default): The system engages LLM-powered reasoning to extract facts, perform similarity matching, and make intelligent decisions about memory operations
- When False: Messages are stored directly without any intelligent processing, suitable for scenarios requiring complete conversation preservation
The memory_type parameter enables specialized handling for different memory categories, such as "procedural_memory" for skill-based knowledge that differs from declarative facts.
Core Execution Flow
The simplified core logic reveals elegant design patterns:
# Step 1: Build metadata and filtering criteria
processed_metadata, effective_filters = _build_filters_and_metadata(
    user_id=user_id,
    agent_id=agent_id,
    run_id=run_id,
    input_metadata=metadata,
)

# Step 2: Handle special memory types (e.g., procedural memory)
if agent_id is not None and memory_type == MemoryType.PROCEDURAL.value:
    return self._create_procedural_memory(messages, metadata=processed_metadata)

# Step 3: Execute vector store and graph store additions in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(
        self._add_to_vector_store,
        messages,
        processed_metadata,
        effective_filters,
        infer,
    )
    future2 = executor.submit(self._add_to_graph, messages, effective_filters)
    concurrent.futures.wait([future1, future2])
    vector_store_result = future1.result()
    graph_result = future2.result()

# Step 4: Return consolidated results
return {"results": vector_store_result, "relations": graph_result}

The parallel execution design deserves special attention. By running vector storage and graph storage operations concurrently, Mem0 overlaps the latency of the two pipelines without sacrificing the integrity of either storage system. The gain comes primarily from concurrency on I/O-bound work—both paths spend most of their time waiting on LLM and database calls, which threads can overlap even under Python's global interpreter lock.
Vector Storage Operations: _add_to_vector_store()
This method, located at line 386 of mem0/memory/main.py, represents the heart of Mem0's intelligent memory management. It implements two distinct operational modes.
Mode One: Direct Storage (infer=False)
When intelligent inference is disabled, the system takes a straightforward approach:
if not infer:
    returned_memories = []
    for message_dict in messages:
        # Skip system messages
        if message_dict["role"] == "system":
            continue
        msg_content = message_dict["content"]
        # Generate embeddings for the message content
        msg_embeddings = self.embedding_model.embed(msg_content, "add")
        # Create memory entry directly (per_msg_meta carries per-message metadata)
        mem_id = self._create_memory(msg_content, msg_embeddings, per_msg_meta)
        returned_memories.append({
            "id": mem_id,
            "memory": msg_content,
            "event": "ADD",
        })
    return returned_memories

This mode prioritizes simplicity and completeness over intelligence—every message becomes a memory without filtering or deduplication.
Mode Two: Intelligent Inference (infer=True)
The default mode implements a sophisticated multi-stage pipeline that transforms raw conversation into structured, deduplicated knowledge.
Stage 1: Fact Extraction via LLM
The system begins by employing a large language model to distill valuable facts from the conversation:
# Select appropriate prompt based on memory type
is_agent_memory = self._should_use_agent_memory_extraction(messages, metadata)
system_prompt, user_prompt = get_fact_retrieval_messages(
    parsed_messages,
    is_agent_memory,
)

# Invoke LLM with structured output requirement
response = self.llm.generate_response(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format={"type": "json_object"},
)

# Parse JSON results to extract facts
new_retrieved_facts = json.loads(response)["facts"]

The fact extraction prompt, defined in mem0/configs/prompts.py, provides detailed guidance on what constitutes memorable information:
FACT_RETRIEVAL_PROMPT = """You are a Personal Information Organizer...
Types of Information to Remember:
1. Store Personal Preferences (food, activities, entertainment choices)
2. Maintain Important Personal Details (name, occupation, location)
3. Track Plans and Intentions (upcoming events, goals, projects)
4. Record Relationship Information (family, friends, colleagues)
5. Capture Skills and Knowledge Areas (expertise, certifications)
6. Note Communication Preferences (tone, language, formality)
7. Document Accessibility Requirements (special needs, accommodations)
Input: Hi, my name is John. I am a software engineer who loves hiking.
Output: {"facts": ["Name is John", "Is a Software engineer", "Enjoys hiking activities"]}
"""This carefully crafted prompt ensures consistent, structured extraction across diverse conversation types.
Stage 2: Similarity Search Against Existing Memories
For each newly extracted fact, the system performs a similarity search to identify potentially overlapping memories:
for new_mem in new_retrieved_facts:
    # Generate embedding vector for the new fact
    messages_embeddings = self.embedding_model.embed(new_mem, "add")
    # Search for similar existing memories using vector similarity
    existing_memories = self.vector_store.search(
        query=new_mem,
        vectors=messages_embeddings,
        limit=5,
        filters=search_filters,
    )
    # Collect matching memories for comparison
    for mem in existing_memories:
        retrieved_old_memory.append({
            "id": mem.id,
            "text": mem.payload.get("data", ""),
        })

This step prevents memory bloat by identifying duplicates and near-duplicates before they enter the system.
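The ranking behind vector_store.search happens inside the configured vector database, typically using cosine similarity between embeddings. A self-contained illustration of the metric itself:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Near-duplicate facts produce near-parallel embeddings, hence scores close to 1.0;
# unrelated facts tend toward 0.0
score = cosine_similarity([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])
```

Because cosine similarity compares direction rather than magnitude, two phrasings of the same fact ("likes coffee" vs. "enjoys drinking coffee") land close together even if their raw embedding norms differ.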
Stage 3: Operation Decision via LLM Analysis
The system then employs another LLM call to decide the appropriate action for each fact:
# Construct decision prompt with old and new information
function_calling_prompt = get_update_memory_messages(
    retrieved_old_memory,
    new_retrieved_facts,
)

# LLM returns structured operation decisions
response = self.llm.generate_response(
    messages=[{"role": "user", "content": function_calling_prompt}],
    response_format={"type": "json_object"},
)
new_memories_with_actions = json.loads(response)

The decision prompt defines four possible operations:
DEFAULT_UPDATE_MEMORY_PROMPT = """You are a smart memory manager...
Compare newly retrieved facts with the existing memory. For each new fact, decide whether to:
- ADD: Add it to the memory as a completely new element
- UPDATE: Modify an existing memory element with new information
- DELETE: Remove an existing memory element that is now obsolete
- NONE: Make no change because the information already exists
"""An example decision output illustrates the system's reasoning:
{
    "memory": [
        {
            "id": "0",
            "text": "Likes cheese and chicken pizza",
            "event": "UPDATE",
            "old_memory": "Likes cheese pizza"
        },
        {
            "id": "1",
            "text": "Name is John",
            "event": "NONE"
        },
        {
            "id": "2",
            "text": "Dislikes cats",
            "event": "ADD"
        }
    ]
}

Stage 4: Executing Decided Operations
Based on the LLM's decisions, the system executes the appropriate CRUD operations:
for resp in new_memories_with_actions.get("memory", []):
    action_text = resp.get("text")
    event_type = resp.get("event")
    if event_type == "ADD":
        memory_id = self._create_memory(action_text, existing_embeddings, metadata)
    elif event_type == "UPDATE":
        # temp_uuid_mapping translates the LLM's temporary ids ("0", "1", ...)
        # back to the real memory UUIDs collected during similarity search
        self._update_memory(
            memory_id=temp_uuid_mapping[resp.get("id")],
            data=action_text,
            existing_embeddings=existing_embeddings,
            metadata=metadata,
        )
    elif event_type == "DELETE":
        self._delete_memory(memory_id=temp_uuid_mapping[resp.get("id")])
    elif event_type == "NONE":
        # No action required—information already exists
        pass

Memory Creation Internals
The _create_memory() method (line 1075) handles the actual persistence:
def _create_memory(self, data, existing_embeddings, metadata=None):
    # Generate universally unique identifier
    memory_id = str(uuid.uuid4())

    # Build comprehensive metadata package
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(
        pytz.timezone("US/Pacific")
    ).isoformat()

    # Insert into vector database with embedding and payload
    self.vector_store.insert(
        vectors=[existing_embeddings],
        ids=[memory_id],
        payloads=[metadata],
    )

    # Record operation history in SQLite for audit trail
    self.db.add_history(
        memory_id, None, data, "ADD", ...
    )
    return memory_id

This method ensures each memory has a unique identity, temporal context, and content hash for deduplication verification.
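The content hash gives a cheap exact-duplicate check that complements the semantic similarity search. Reproducing the hashing step in isolation:

```python
import hashlib

def content_hash(data: str) -> str:
    """MD5 hex digest of the memory text, as used for exact-duplicate detection."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()

h1 = content_hash("Enjoys hiking activities")
h2 = content_hash("Enjoys hiking activities")
# Identical text always yields an identical 32-character digest,
# so comparing hashes detects verbatim duplicates without an embedding call
```

MD5 is fine here because the hash is used for deduplication, not security; any fast, deterministic digest would serve the same purpose.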
Graph Storage Operations: _add_to_graph()
While vector storage excels at similarity search, graph storage captures the rich relationships between entities—creating a knowledge network rather than isolated facts.
Entry Method Overview
Located at line 599 of mem0/memory/main.py:
def _add_to_graph(self, messages, filters):
    added_entities = []
    if self.enable_graph:
        # Consolidate message content into single text
        data = "\n".join([
            msg["content"] for msg in messages
            if msg["role"] != "system"
        ])
        # Invoke graph storage addition
        added_entities = self.graph.add(data, filters)
    return added_entities

Graph Storage Core Logic
The implementation in mem0/memory/graph_memory.py (line 76) follows a systematic five-step process:
def add(self, data, filters):
    # Step 1: Extract entities and their types from text
    entity_type_map = self._retrieve_nodes_from_data(data, filters)

    # Step 2: Establish relationships between identified entities
    to_be_added = self._establish_nodes_relations_from_data(
        data, filters, entity_type_map
    )

    # Step 3: Search graph database for similar existing nodes
    search_output = self._search_graph_db(
        node_list=list(entity_type_map.keys()),
        filters=filters,
    )

    # Step 4: Identify entities to delete (contradictory relationships)
    to_be_deleted = self._get_delete_entities_from_search_output(
        search_output, data, filters
    )

    # Step 5: Execute deletion and addition operations
    deleted_entities = self._delete_entities(to_be_deleted, filters)
    added_entities = self._add_entities(to_be_added, filters, entity_type_map)

    return {
        "deleted_entities": deleted_entities,
        "added_entities": added_entities,
    }

Entity Extraction Example
The system uses LLM-powered extraction to identify entities and classify them:
entity_type_map = self._retrieve_nodes_from_data(data, filters)
# Example result:
# {
# "john": "person",
# "coffee": "drink",
# "starbucks": "brand"
# }

Relationship Establishment Example
Another LLM call determines how entities relate to one another:
entities = self._establish_nodes_relations_from_data(
    data, filters, entity_type_map
)
# Example result:
# [
#     {"source": "john", "relationship": "likes", "destination": "coffee"},
#     {"source": "coffee", "relationship": "brand", "destination": "starbucks"}
# ]

These relationships are then persisted in Neo4j, creating a traversable knowledge graph:

(john:Person) -[:LIKES]-> (coffee:Drink) -[:BRAND]-> (starbucks:Brand)

Key Component Architecture
LLM Factory Pattern
Mem0 employs a factory pattern to support multiple LLM providers seamlessly:
# mem0/utils/factory.py
class LlmFactory:
    provider_to_class = {
        "openai": ("mem0.llms.openai.OpenAILLM", OpenAIConfig),
        "anthropic": ("mem0.llms.anthropic.AnthropicLLM", AnthropicConfig),
        "azure_openai": ("mem0.llms.azure_openai.AzureOpenAILLM", AzureOpenAILLMConfig),
        "gemini": ("mem0.llms.gemini.GeminiLLM", BaseLlmConfig),
        # Additional providers...
    }

This design enables users to switch between providers without modifying application code.
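The essence of the pattern is a string-keyed registry resolved at runtime. A self-contained miniature with toy stub classes (not Mem0's real providers) shows the mechanics:

```python
from typing import Any, Dict, Type

class OpenAILLMStub:
    """Toy stand-in for a real provider class."""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
    def generate_response(self, prompt: str) -> str:
        return f"openai:{prompt}"

class AnthropicLLMStub:
    """Toy stand-in for a second provider class."""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
    def generate_response(self, prompt: str) -> str:
        return f"anthropic:{prompt}"

class MiniLlmFactory:
    # Registry maps provider names to classes; the real factory maps
    # names to dotted import paths resolved lazily via importlib
    provider_to_class: Dict[str, Type] = {
        "openai": OpenAILLMStub,
        "anthropic": AnthropicLLMStub,
    }

    @classmethod
    def create(cls, provider: str, config: Dict[str, Any]):
        try:
            return cls.provider_to_class[provider](config)
        except KeyError:
            raise ValueError(f"Unsupported LLM provider: {provider}")

llm = MiniLlmFactory.create("anthropic", {"model": "claude-3"})
# llm.generate_response("hi") returns "anthropic:hi"
```

Since every provider class exposes the same generate_response interface, switching providers is a one-line configuration change rather than a code change.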
Embedding Generation
Similarly, embedding models are abstracted through a factory:
class EmbedderFactory:
    provider_to_class = {
        "openai": "mem0.embeddings.openai.OpenAIEmbedding",
        "huggingface": "mem0.embeddings.huggingface.HuggingFaceEmbedding",
        # Additional providers...
    }

Vector Store Support
The system supports numerous vector databases:
class VectorStoreFactory:
    provider_to_class = {
        "qdrant": "mem0.vector_stores.qdrant.Qdrant",
        "chroma": "mem0.vector_stores.chroma.ChromaDB",
        "pinecone": "mem0.vector_stores.pinecone.PineconeDB",
        "milvus": "mem0.vector_stores.milvus.MilvusDB",
        "weaviate": "mem0.vector_stores.weaviate.Weaviate",
        # Additional providers...
    }

This flexibility allows deployment in diverse infrastructure environments.
Complete Workflow Example
Let us trace a complete example from input to stored memory:
from mem0 import Memory
# Initialize memory system
memory = Memory()
# User conversation
messages = [
    {"role": "user", "content": "My name is Zhang San. I enjoy drinking lattes."},
    {"role": "assistant", "content": "Hello Zhang San! Lattes are a popular coffee choice."}
]

# Add to memory
result = memory.add(messages, user_id="zhangsan")

The resulting output demonstrates the dual-storage approach:
{
    "results": [
        {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
        {"id": "def456", "memory": "Enjoys drinking lattes", "event": "ADD"}
    ],
    "relations": {
        "deleted_entities": [],
        "added_entities": [
            {"source": "Zhang San", "relationship": "likes", "target": "latte"}
        ]
    }
}

Step-by-Step Breakdown
1. Fact Extraction: The LLM processes the conversation and extracts: ["Name is Zhang San", "Enjoys drinking lattes"]
2. Similarity Search: The system queries the vector database for similar existing memories. In this case (assuming a new user), no matches are found.
3. Operation Decision: The LLM determines both facts require ADD operations since they represent new information.
4. Operation Execution:
   - Two memory records are created
   - Embeddings are generated and stored in the vector database
   - Metadata including timestamps and hashes is recorded
5. Graph Storage Processing:
   - Entities are extracted: {"Zhang San": "person", "latte": "drink"}
   - Relationships are established: Zhang San -[:likes]-> latte
   - The relationship is persisted in Neo4j
Design Highlights and Innovations
Intelligent Inference vs Direct Storage
Mem0's dual-mode operation provides flexibility for different use cases:
- infer=True: Intelligent extraction, deduplication, and update decisions—ideal for production environments where memory quality matters
- infer=False: Direct storage preserving complete conversation history—suitable for compliance, auditing, or complete transcript requirements
Dual-Storage Architecture Philosophy
The combination of vector and graph storage addresses complementary needs:
- Vector Storage: Excels at semantic similarity search, enabling retrieval of related memories even when wording differs
- Graph Storage: Captures explicit entity relationships, enabling complex queries like "What does Zhang San like?" or "Find all people who enjoy coffee"
Together, they provide retrieval capabilities neither could achieve independently.
Parallel Processing for Performance
The concurrent execution of vector and graph operations demonstrates performance-conscious design:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, ...)
    future2 = executor.submit(self._add_to_graph, ...)

Because both paths are dominated by network-bound LLM and database calls, running them in threads overlaps their latency and reduces total operation time—an effective strategy even under Python's global interpreter lock, which constrains only CPU-bound work.
Prompt Engineering Excellence
Mem0's intelligence fundamentally relies on carefully crafted prompts:
- Fact Extraction Prompt: Defines seven comprehensive information categories with examples
- Memory Update Prompt: Establishes clear ADD/UPDATE/DELETE/NONE decision criteria
- Customization Support: Allows users to provide custom prompts for domain-specific needs
These prompts represent significant engineering effort and are central to Mem0's effectiveness.
Summary and Key Takeaways
Through this comprehensive source code analysis, we have discovered that Mem0's memory addition is far more sophisticated than simple storage. The system implements:
- Intelligent Fact Extraction: Using LLMs to distill valuable information from conversations
- Similarity-Based Retrieval: Preventing duplication through vector similarity matching
- Decision-Making Pipeline: LLM-powered ADD/UPDATE/DELETE/NONE decisions
- Dual-Storage Architecture: Combining vector and graph storage for complementary capabilities
The core design philosophy centers on using LLMs to intelligently manage the entire memory lifecycle, rather than treating memory as passive storage. Vector storage handles fact retrieval efficiently, while graph storage manages entity relationships for complex knowledge networks. Parallel processing ensures performance remains acceptable despite the computational overhead of LLM calls.
Most critically, Mem0 demonstrates that prompt engineering is not an afterthought but a core architectural component. The carefully designed prompts for fact extraction and memory updates are what transform raw LLM capabilities into structured, reliable memory management.
Series Preview
This article has explored Mem0's memory addition workflow, including intelligent fact extraction, similarity retrieval, update decision-making, and dual-storage architecture. The next installment will dive deeper into the prompt engineering techniques that power these capabilities.
Coming in Part Two: A deep dive into prompt engineering strategies, including how Mem0 crafts effective prompts for different scenarios, handles edge cases, and enables customization for domain-specific applications.
Referenced Source Files
- mem0/memory/main.py: Core Memory class implementation
- mem0/memory/graph_memory.py: Graph storage operations
- mem0/configs/prompts.py: All prompt definitions
- mem0/utils/factory.py: Factory classes for LLMs, embeddings, and vector stores