Mem0 Deep Dive: How AI Memory Systems Actually Work Under the Hood
Introduction: The Quest for Persistent AI Memory
In the rapidly evolving landscape of artificial intelligence, one challenge has persisted: how do we give AI systems the ability to remember, learn, and adapt over time? Mem0 (pronounced "mem-zero") emerges as a groundbreaking open-source project that addresses this fundamental question by providing a long-term memory layer for AI applications.
This comprehensive analysis delves into the source code of Mem0, specifically examining how memories are added to the system. Through careful examination of the implementation details, we uncover the sophisticated architecture that enables AI assistants to remember user preferences, adapt to individual needs, and continuously learn from interactions.
The implications extend far beyond simple data storage. Mem0 represents a paradigm shift in how we think about AI memory—moving from ephemeral context windows to persistent, intelligent memory systems that evolve with each interaction.
Understanding Mem0's Core Mission
Mem0 positions itself as a memory infrastructure for AI applications. Its primary use cases include:
- Customer Support Chatbots: Remembering customer history, preferences, and previous issues across sessions
- AI Assistants: Building personalized understanding of user habits, preferences, and working styles
- Autonomous Systems: Maintaining state and learning from past actions to improve future performance
Before diving into the code, several critical questions demand answers:
- How does Mem0 extract valuable information from conversational data?
- What decision-making process determines whether to add new memories, update existing ones, or delete outdated information?
- What distinct roles do vector storage and graph storage play in the overall architecture?
These questions form the foundation of our exploration into Mem0's memory addition mechanism.
Architectural Overview: The Dual-Storage Approach
Mem0's memory addition flow follows a sophisticated architecture centered around the Memory.add() method located in mem0/memory/main.py. The system orchestrates two parallel operations:
- Vector Store Operations: Handled by _add_to_vector_store(), responsible for semantic similarity-based retrieval
- Graph Store Operations: Managed by _add_to_graph(), handling entity relationships and knowledge networks
This dual-storage approach isn't arbitrary—it reflects a deep understanding of different memory access patterns. Vector stores excel at finding similar memories through semantic search, while graph stores capture the rich relationships between entities and concepts.
The Entry Point: Memory.add() Method Deep Dive
The Memory.add() method serves as the gateway for all memory operations. Located at line 281 in mem0/memory/main.py, this method accepts several critical parameters:
Parameter Breakdown
messages: The input content accepts multiple formats:
- Simple strings: "I prefer coffee in the morning"
- Single message dictionaries: {"role": "user", "content": "I prefer coffee"}
- Message lists: Complete conversation histories with multiple turns
user_id/agent_id/run_id: These identifiers create isolated memory spaces for different users, agents, or conversation sessions, ensuring memories don't bleed between contexts.
infer: This boolean flag (defaulting to True) determines the processing mode:
- True: Enables LLM-powered fact extraction and intelligent memory management
- False: Stores raw messages directly without any processing
memory_type: Specifies special memory categories such as "procedural_memory" for skill-based learning.
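The identifier parameters above can be pictured with a simplified sketch of what a helper like _build_filters_and_metadata might do. This is an illustrative reconstruction, not Mem0's actual code:

```python
# Illustrative sketch, not Mem0's implementation: merge session identifiers
# into both the stored metadata and the similarity-search filters.
def build_filters_and_metadata(user_id=None, agent_id=None, run_id=None,
                               input_metadata=None):
    metadata = dict(input_metadata or {})  # copy so the caller's dict is untouched
    filters = {}
    for key, value in (("user_id", user_id), ("agent_id", agent_id),
                       ("run_id", run_id)):
        if value is not None:
            metadata[key] = value  # persisted with every memory record
            filters[key] = value   # scopes retrieval to this session
    return metadata, filters

meta, flt = build_filters_and_metadata(user_id="alice", input_metadata={"app": "demo"})
```

The same identifiers serve double duty: written into metadata so every memory is attributable, and used as filters so searches never leak across users.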
The Core Execution Flow
The method's logic reveals elegant design decisions:
# Step 1: Build metadata and filtering criteria
processed_metadata, effective_filters = _build_filters_and_metadata(
    user_id=user_id, agent_id=agent_id, run_id=run_id, input_metadata=metadata
)

# Step 2: Handle special memory types
if agent_id is not None and memory_type == MemoryType.PROCEDURAL.value:
    return self._create_procedural_memory(messages, metadata=processed_metadata)

# Step 3: Execute vector and graph storage in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, messages, processed_metadata, effective_filters, infer)
    future2 = executor.submit(self._add_to_graph, messages, effective_filters)
    concurrent.futures.wait([future1, future2])
    vector_store_result = future1.result()
    graph_result = future2.result()

# Step 4: Return consolidated results
return {"results": vector_store_result, "relations": graph_result}

The parallel execution design deserves special attention. By running vector store and graph store operations concurrently, Mem0 achieves significant performance improvements without sacrificing functionality.
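A self-contained toy (independent of Mem0) demonstrates why this matters: two I/O-bound tasks that each block for 0.2 seconds finish in roughly 0.2 seconds total when submitted to a thread pool, not 0.4:

```python
import concurrent.futures
import time

def slow_io_task(name, delay):
    time.sleep(delay)  # stands in for a network call to a vector or graph store
    return name

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(slow_io_task, "vector", 0.2)
    future2 = executor.submit(slow_io_task, "graph", 0.2)
    concurrent.futures.wait([future1, future2])
    results = [future1.result(), future2.result()]
elapsed = time.perf_counter() - start  # ~0.2s, not ~0.4s, because the tasks overlap
```

Since both storage backends are dominated by network latency, threads (rather than processes) are the right concurrency primitive here.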
Vector Store Addition: The Intelligent Memory Engine
The _add_to_vector_store() method (line 386) represents the heart of Mem0's intelligence. This method operates in two distinct modes.
Mode One: Direct Storage (infer=False)
When inference is disabled, the system takes a straightforward approach:
if not infer:
    for message_dict in messages:
        # Skip system messages
        if message_dict["role"] == "system":
            continue
        msg_content = message_dict["content"]
        # Generate embeddings
        msg_embeddings = self.embedding_model.embed(msg_content, "add")
        # Create memory directly (per_msg_meta carries role and session metadata)
        mem_id = self._create_memory(msg_content, msg_embeddings, per_msg_meta)
        returned_memories.append({
            "id": mem_id,
            "memory": msg_content,
            "event": "ADD"
        })
    return returned_memories

This mode suits applications requiring complete conversation preservation without intelligent filtering.
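The filtering logic in this mode is easy to isolate. A minimal standalone sketch (function and variable names are hypothetical, not Mem0's) that skips system messages and attaches per-message metadata:

```python
def prepare_direct_storage(messages, base_metadata):
    """Build storable records from raw messages, skipping system prompts."""
    records = []
    for msg in messages:
        if msg["role"] == "system":
            continue  # system instructions are not user memories
        meta = dict(base_metadata)
        meta["role"] = msg["role"]
        records.append({"content": msg["content"], "metadata": meta})
    return records

records = prepare_direct_storage(
    [{"role": "system", "content": "Be helpful"},
     {"role": "user", "content": "I prefer coffee in the morning"}],
    {"user_id": "u1"},
)
```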
Mode Two: Intelligent Inference (infer=True)
The default mode unleashes Mem0's full capabilities through a multi-step process:
Step 1: Fact Extraction via LLM
The system employs carefully crafted prompts to extract meaningful facts from conversations:
# Select appropriate prompt based on memory type
is_agent_memory = self._should_use_agent_memory_extraction(messages, metadata)
system_prompt, user_prompt = get_fact_retrieval_messages(parsed_messages, is_agent_memory)

# Call LLM with structured output
response = self.llm.generate_response(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    response_format={"type": "json_object"}
)

# Parse JSON results
new_retrieved_facts = json.loads(response)["facts"]

The fact retrieval prompt (defined in mem0/configs/prompts.py) provides detailed guidance:
FACT_RETRIEVAL_PROMPT = """You are a Personal Information Organizer...
Types of Information to Remember:
1. Store Personal Preferences
2. Maintain Important Personal Details
3. Track Plans and Intentions
4. Remember Relationships
5. Note Important Events
6. Capture Skills and Knowledge
7. Track Context-Specific Information
Input: Hi, my name is John. I am a software engineer.
Output: {"facts": ["Name is John", "Is a Software engineer"]}
"""

This structured approach ensures consistent fact extraction across diverse conversation types.
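Because the LLM is asked for JSON but can still return malformed output, callers typically parse defensively. A hedged sketch of such a guard (not Mem0's exact error handling):

```python
import json

def parse_facts(llm_response: str) -> list:
    """Extract the "facts" array from an LLM JSON response, tolerating bad output."""
    try:
        facts = json.loads(llm_response).get("facts", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # fall back to "no facts" rather than crashing the add() call
    return [f for f in facts if isinstance(f, str) and f.strip()]
```

Returning an empty list on failure keeps a single bad LLM response from aborting the whole memory-addition flow.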
Step 2: Similarity Search
For each extracted fact, the system searches for similar existing memories:
for new_mem in new_retrieved_facts:
    # Generate embeddings for the new fact
    messages_embeddings = self.embedding_model.embed(new_mem, "add")
    # Search for similar memories (vector similarity)
    existing_memories = self.vector_store.search(
        query=new_mem,
        vectors=messages_embeddings,
        limit=5,
        filters=search_filters
    )
    for mem in existing_memories:
        retrieved_old_memory.append({"id": mem.id, "text": mem.payload.get("data", "")})

This step prevents duplicate memories and enables intelligent updates to existing knowledge.
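Under the hood, "similar" means nearest neighbors by vector distance. A toy in-memory version of this search using cosine similarity (a standalone illustration; production vector stores use approximate-nearest-neighbor indexes instead of a full scan):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vector, memories, limit=5):
    """Return the `limit` memories whose vectors are most similar to the query."""
    scored = sorted(memories,
                    key=lambda m: cosine_similarity(query_vector, m["vector"]),
                    reverse=True)
    return scored[:limit]

memories = [
    {"id": "m1", "data": "Likes cheese pizza", "vector": [1.0, 0.1, 0.0]},
    {"id": "m2", "data": "Dislikes cats", "vector": [0.0, 1.0, 0.9]},
]
top = search([0.9, 0.2, 0.0], memories, limit=1)  # closest to m1's vector
```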
Step 3: Operation Decision Making
Using another LLM call, Mem0 decides what operation to perform for each fact:
# Build decision prompt
function_calling_prompt = get_update_memory_messages(
    retrieved_old_memory, new_retrieved_facts
)

# LLM returns operation decisions
response = self.llm.generate_response(
    messages=[{"role": "user", "content": function_calling_prompt}],
    response_format={"type": "json_object"}
)
new_memories_with_actions = json.loads(response)

The decision prompt defines four possible operations:
DEFAULT_UPDATE_MEMORY_PROMPT = """You are a smart memory manager...
Compare newly retrieved facts with the existing memory. For each new fact, decide whether to:
- ADD: Add it to the memory as a new element
- UPDATE: Update an existing memory element
- DELETE: Delete an existing memory element
- NONE: Make no change
"""

Example decision output:
{
  "memory": [
    {
      "id": "0",
      "text": "Likes cheese and chicken pizza",
      "event": "UPDATE",
      "old_memory": "Likes cheese pizza"
    },
    {
      "id": "1",
      "text": "Name is John",
      "event": "NONE"
    },
    {
      "id": "2",
      "text": "Dislikes cats",
      "event": "ADD"
    }
  ]
}

Step 4: Executing Operations
Based on the decisions, the system performs the appropriate CRUD operations:
for resp in new_memories_with_actions.get("memory", []):
    action_text = resp.get("text")
    event_type = resp.get("event")
    if event_type == "ADD":
        memory_id = self._create_memory(action_text, existing_embeddings, metadata)
    elif event_type == "UPDATE":
        self._update_memory(
            memory_id=temp_uuid_mapping[resp.get("id")],
            data=action_text,
            existing_embeddings=existing_embeddings,
            metadata=metadata
        )
    elif event_type == "DELETE":
        self._delete_memory(memory_id=temp_uuid_mapping[resp.get("id")])
    elif event_type == "NONE":
        # No action required
        pass

Memory Creation Internals
The _create_memory() method (line 1075) handles the actual storage:
def _create_memory(self, data, existing_embeddings, metadata=None):
    # Generate unique ID
    memory_id = str(uuid.uuid4())
    # Build metadata
    metadata = metadata or {}
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(pytz.timezone("US/Pacific")).isoformat()
    # Store in vector database
    self.vector_store.insert(
        vectors=[existing_embeddings],
        ids=[memory_id],
        payloads=[metadata]
    )
    # Record history (SQLite)
    self.db.add_history(memory_id, None, data, "ADD", ...)
    return memory_id

This method ensures each memory has a unique identifier, content hash for deduplication, and timestamp for temporal ordering.
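The hash field enables cheap exact-duplicate detection before any LLM call. A minimal sketch of how such a check could work (the set-based index here is an assumption for illustration, not Mem0's code):

```python
import hashlib

def content_hash(text: str) -> str:
    """MD5 of the memory text, mirroring the "hash" metadata field above."""
    return hashlib.md5(text.encode()).hexdigest()

# Hashes of memories already in the store (illustrative in-memory index).
seen_hashes = {content_hash("Likes cheese pizza")}

def is_exact_duplicate(text: str) -> bool:
    return content_hash(text) in seen_hashes
```

Hashing catches byte-identical repeats instantly; semantic near-duplicates still require the vector search described earlier.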
Graph Store Addition: Building Knowledge Networks
While vector stores handle semantic similarity, the graph store (_add_to_graph()) manages entity relationships. Located at line 599, this method activates when graph storage is enabled.
Entry Point Logic
def _add_to_graph(self, messages, filters):
    if self.enable_graph:
        # Merge message contents
        data = "\n".join([msg["content"] for msg in messages if msg["role"] != "system"])
        # Call graph storage addition
        added_entities = self.graph.add(data, filters)
        return added_entities
    # Graph storage disabled: nothing to add
    return []

Core Graph Operations
The graph storage logic (in mem0/memory/graph_memory.py:76) follows a systematic approach:
def add(self, data, filters):
    # Step 1: Extract entities
    entity_type_map = self._retrieve_nodes_from_data(data, filters)
    # Step 2: Establish entity relationships
    to_be_added = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
    # Step 3: Search for similar nodes in graph database
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)
    # Step 4: Identify entities to delete (conflicting relationships)
    to_be_deleted = self._get_delete_entities_from_search_output(search_output, data, filters)
    # Step 5: Execute deletions and additions
    deleted_entities = self._delete_entities(to_be_deleted, filters)
    added_entities = self._add_entities(to_be_added, filters, entity_type_map)
    return {"deleted_entities": deleted_entities, "added_entities": added_entities}

Entity Extraction Example
Using LLM-powered extraction:
entity_type_map = self._retrieve_nodes_from_data(data, filters)
# Result example:
# {
#     "john": "person",
#     "coffee": "drink",
#     "starbucks": "brand"
# }

Relationship Building Example
entities = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
# Result example:
# [
#     {"source": "john", "relationship": "likes", "destination": "coffee"},
#     {"source": "coffee", "relationship": "brand", "destination": "starbucks"}
# ]
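Before such triples reach Neo4j they must be rendered as Cypher. A hypothetical rendering step is sketched below; Mem0's actual graph writes should be assumed to use parameterized queries, and this string-building version is for illustration only:

```python
def relation_to_cypher(rel: dict) -> str:
    """Render one relation triple as a Cypher MERGE statement.
    Illustration only: production code should pass values as query
    parameters rather than interpolating them into the query string."""
    rel_type = rel["relationship"].upper().replace(" ", "_")
    return (
        f'MERGE (a:Entity {{name: "{rel["source"]}"}}) '
        f'MERGE (b:Entity {{name: "{rel["destination"]}"}}) '
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

query = relation_to_cypher(
    {"source": "john", "relationship": "likes", "destination": "coffee"}
)
```

MERGE (rather than CREATE) makes the write idempotent: re-adding a known entity or relationship leaves the graph unchanged.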
# ]These relationships are ultimately stored in Neo4j:
(john:Person) -[:LIKES]-> (coffee:Drink) -[:BRAND]-> (starbucks:Brand)

Key Component Architecture
LLM Factory Pattern
Mem0 employs a factory pattern supporting multiple LLM providers:
class LlmFactory:
    provider_to_class = {
        "openai": ("mem0.llms.openai.OpenAILLM", OpenAIConfig),
        "anthropic": ("mem0.llms.anthropic.AnthropicLLM", AnthropicConfig),
        "azure_openai": ("mem0.llms.azure_openai.AzureOpenAILLM", AzureOpenAIConfig),
        "gemini": ("mem0.llms.gemini.GeminiLLM", BaseLlmConfig),
        ...
    }

Embedding Generation
class EmbedderFactory:
    provider_to_class = {
        "openai": "mem0.embeddings.openai.OpenAIEmbedding",
        "huggingface": "mem0.embeddings.huggingface.HuggingFaceEmbedding",
        ...
    }

Vector Store Support
class VectorStoreFactory:
    provider_to_class = {
        "qdrant": "mem0.vector_stores.qdrant.Qdrant",
        "chroma": "mem0.vector_stores.chroma.ChromaDB",
        "pinecone": "mem0.vector_stores.pinecone.PineconeDB",
        "milvus": "mem0.vector_stores.milvus.MilvusDB",
        "weaviate": "mem0.vector_stores.weaviate.Weaviate",
        ...
    }

This extensible architecture allows users to choose their preferred providers for each component.
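The pattern behind all three factories is the same: map a provider name to a dotted class path and import it lazily on first use. A standalone sketch using a stdlib class so it runs anywhere (the "decoder" mapping entry is illustrative, not one of Mem0's):

```python
import importlib

class SimpleFactory:
    # Illustrative mapping: provider name -> "module.path.ClassName".
    # Mem0's factories hold entries like "mem0.llms.openai.OpenAILLM".
    provider_to_class = {"decoder": "json.JSONDecoder"}

    @classmethod
    def create(cls, provider, **kwargs):
        class_path = cls.provider_to_class[provider]
        module_name, class_name = class_path.rsplit(".", 1)
        module = importlib.import_module(module_name)  # imported only when requested
        return getattr(module, class_name)(**kwargs)

obj = SimpleFactory.create("decoder")
```

Lazy imports mean users only pay the dependency cost of the providers they actually configure.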
Complete Flow Example
Let's trace a complete memory addition scenario:
from mem0 import Memory

memory = Memory()

# User conversation
messages = [
    {"role": "user", "content": "My name is Zhang San, I like drinking lattes"},
    {"role": "assistant", "content": "Hello Zhang San! Lattes are popular coffee drinks"}
]

# Add memory
result = memory.add(messages, user_id="zhangsan")

Expected result:
{
  "results": [
    {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
    {"id": "def456", "memory": "Likes latte coffee", "event": "ADD"}
  ],
  "relations": {
    "deleted_entities": [],
    "added_entities": [
      {"source": "Zhang San", "relationship": "likes", "target": "latte"}
    ]
  }
}

Flow Breakdown
- Fact Extraction: LLM extracts ["Name is Zhang San", "Likes latte coffee"]
- Similarity Search: Vector database search returns no similar memories (new user)
- Decision: LLM determines both facts need ADD operations
- Execution: Two memory records created with embeddings stored in vector database
- Graph Storage: Entities extracted as {"Zhang San": "person", "latte": "drink"}, with the relationship Zhang San -[:likes]-> latte stored in Neo4j
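Downstream code can consume this result shape directly. A small sketch that tallies events from the expected result above (the literal is copied from that example; nothing here calls Mem0):

```python
# Result structure as returned by memory.add() in the example above.
result = {
    "results": [
        {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
        {"id": "def456", "memory": "Likes latte coffee", "event": "ADD"},
    ],
    "relations": {
        "deleted_entities": [],
        "added_entities": [
            {"source": "Zhang San", "relationship": "likes", "target": "latte"}
        ],
    },
}

# Collect only newly added memories; UPDATE/DELETE/NONE events would be filtered out.
added = [m["memory"] for m in result["results"] if m["event"] == "ADD"]
new_relations = len(result["relations"]["added_entities"])
```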
Design Highlights Worth Emulating
Intelligent Inference vs Direct Storage
Mem0's dual-mode operation serves different needs:
- infer=True: Intelligent extraction, deduplication, and updates—ideal for production environments
- infer=False: Direct storage preserving complete conversations—suitable for compliance or audit requirements
Dual-Storage Architecture
The combination of vector and graph storage provides complementary capabilities:
- Vector Store: Fast semantic similarity search, perfect for retrieval scenarios
- Graph Store: Entity relationship management, ideal for complex knowledge networks
Together, they offer comprehensive memory capabilities that neither could achieve alone.
Parallel Processing
Vector and graph storage operations execute concurrently:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, ...)
    future2 = executor.submit(self._add_to_graph, ...)

This design choice demonstrates thoughtful performance optimization without complicating the API.
Prompt Engineering Excellence
Mem0's prompts represent carefully crafted instruments:
- Fact Extraction Prompt: Defines seven information categories with detailed examples
- Memory Update Prompt: Establishes clear ADD/UPDATE/DELETE/NONE rules
- Custom Prompt Support: Allows custom_fact_extraction_prompt for domain-specific needs
These prompts aren't afterthoughts—they're central to Mem0's intelligence.
Conclusion: Lessons from Mem0's Architecture
Through this source code exploration, we've uncovered that Mem0's memory addition is far more than simple storage. It encompasses:
- Intelligent Fact Extraction: Using LLMs to identify valuable information
- Similarity-Based Retrieval: Preventing duplicates through vector search
- Smart CRUD Decisions: LLM-powered add/update/delete decisions
- Dual-Storage Architecture: Combining vector and graph databases
The core design principles include:
- Using LLMs to intelligently manage the memory lifecycle
- Leveraging vector storage for factual retrieval
- Employing graph storage for entity relationships
- Parallel processing for efficiency
Most importantly, prompt engineering sits at Mem0's heart. The carefully designed prompts for fact extraction and memory updates are what transform Mem0 from a simple database wrapper into an intelligent memory management system.
This architecture offers valuable lessons for anyone building AI systems requiring persistent, evolving knowledge. Mem0 demonstrates that effective AI memory isn't about storing everything—it's about intelligently curating what matters while maintaining the flexibility to adapt as understanding deepens.