Mem0 Deep Dive: How AI Memory Systems Actually Work Under the Hood
Introduction: The Quest for Persistent AI Memory
In the rapidly evolving landscape of artificial intelligence, one challenge has persisted: how do we give AI systems the ability to remember, learn, and adapt over time? Mem0 (pronounced "mem-zero") emerges as a groundbreaking open-source project that addresses this fundamental question by providing a long-term memory layer for AI applications.
This comprehensive analysis delves into the source code of Mem0, specifically examining how memories are added to the system. Through careful examination of the implementation details, we uncover the sophisticated architecture that enables AI assistants to remember user preferences, adapt to individual needs, and continuously learn from interactions.
The implications extend far beyond simple data storage. Mem0 represents a paradigm shift in how we think about AI memory—moving from ephemeral context windows to persistent, intelligent memory systems that evolve with each interaction.
Understanding Mem0's Core Mission
Mem0 positions itself as a memory infrastructure for AI applications. Its primary use cases include:
- Customer Support Chatbots: Remembering customer history, preferences, and previous issues across sessions
- AI Assistants: Building personalized understanding of user habits, preferences, and working styles
- Autonomous Systems: Maintaining state and learning from past actions to improve future performance
Before diving into the code, several critical questions demand answers:
- How does Mem0 extract valuable information from conversational data?
- What decision-making process determines whether to add new memories, update existing ones, or delete outdated information?
- What distinct roles do vector storage and graph storage play in the overall architecture?
These questions form the foundation of our exploration into Mem0's memory addition mechanism.
Architectural Overview: The Dual-Storage Approach
Mem0's memory addition flow follows a sophisticated architecture centered around the Memory.add() method located in mem0/memory/main.py. The system orchestrates two parallel operations:
- Vector Store Operations: Handled by _add_to_vector_store(), responsible for semantic similarity-based retrieval
- Graph Store Operations: Managed by _add_to_graph(), handling entity relationships and knowledge networks
This dual-storage approach isn't arbitrary—it reflects a deep understanding of different memory access patterns. Vector stores excel at finding similar memories through semantic search, while graph stores capture the rich relationships between entities and concepts.
The Entry Point: Memory.add() Method Deep Dive
The Memory.add() method serves as the gateway for all memory operations. Located at line 281 in mem0/memory/main.py, this method accepts several critical parameters:
Parameter Breakdown
messages: The input content accepts multiple formats:
- Simple strings: "I prefer coffee in the morning"
- Single message dictionaries: {"role": "user", "content": "I prefer coffee"}
- Message lists: Complete conversation histories with multiple turns
user_id/agent_id/run_id: These identifiers create isolated memory spaces for different users, agents, or conversation sessions, ensuring memories don't bleed between contexts.
infer: This boolean flag (defaulting to True) determines the processing mode:
- True: Enables LLM-powered fact extraction and intelligent memory management
- False: Stores raw messages directly without any processing
memory_type: Specifies special memory categories such as "procedural_memory" for skill-based learning.
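The identifier parameters above can be pictured with a simplified sketch of what a helper like _build_filters_and_metadata might do. This is an illustrative reconstruction, not Mem0's actual code:

```python
# Illustrative sketch, not Mem0's implementation: merge session identifiers
# into both the stored metadata and the similarity-search filters.
def build_filters_and_metadata(user_id=None, agent_id=None, run_id=None,
                               input_metadata=None):
    metadata = dict(input_metadata or {})  # copy so the caller's dict is untouched
    filters = {}
    for key, value in (("user_id", user_id), ("agent_id", agent_id),
                       ("run_id", run_id)):
        if value is not None:
            metadata[key] = value  # persisted with every memory record
            filters[key] = value   # scopes retrieval to this session
    return metadata, filters

meta, flt = build_filters_and_metadata(user_id="alice", input_metadata={"app": "demo"})
```

The same identifiers serve double duty: written into metadata so every memory is attributable, and used as filters so searches never leak across users.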
The Core Execution Flow
The method's logic reveals elegant design decisions:
# Step 1: Build metadata and filtering criteria
processed_metadata, effective_filters = _build_filters_and_metadata(
    user_id=user_id, agent_id=agent_id, run_id=run_id, input_metadata=metadata
)

# Step 2: Handle special memory types
if agent_id is not None and memory_type == MemoryType.PROCEDURAL.value:
    return self._create_procedural_memory(messages, metadata=processed_metadata)

# Step 3: Execute vector and graph storage in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, messages, processed_metadata, effective_filters, infer)
    future2 = executor.submit(self._add_to_graph, messages, effective_filters)
    concurrent.futures.wait([future1, future2])
    vector_store_result = future1.result()
    graph_result = future2.result()

# Step 4: Return consolidated results
return {"results": vector_store_result, "relations": graph_result}

The parallel execution design deserves special attention. By running vector store and graph store operations concurrently, Mem0 achieves significant performance improvements without sacrificing functionality.
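A self-contained toy (independent of Mem0) demonstrates why this matters: two I/O-bound tasks that each block for 0.2 seconds finish in roughly 0.2 seconds total when submitted to a thread pool, not 0.4:

```python
import concurrent.futures
import time

def slow_io_task(name, delay):
    time.sleep(delay)  # stands in for a network call to a vector or graph store
    return name

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(slow_io_task, "vector", 0.2)
    future2 = executor.submit(slow_io_task, "graph", 0.2)
    concurrent.futures.wait([future1, future2])
    results = [future1.result(), future2.result()]
elapsed = time.perf_counter() - start  # ~0.2s, not ~0.4s, because the tasks overlap
```

Since both storage backends are dominated by network latency, threads (rather than processes) are the right concurrency primitive here.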
Vector Store Addition: The Intelligent Memory Engine
The _add_to_vector_store() method (line 386) represents the heart of Mem0's intelligence. This method operates in two distinct modes.
Mode One: Direct Storage (infer=False)
When inference is disabled, the system takes a straightforward approach:
if not infer:
    for message_dict in messages:
        # Skip system messages
        if message_dict["role"] == "system":
            continue
        msg_content = message_dict["content"]
        # Generate embeddings
        msg_embeddings = self.embedding_model.embed(msg_content, "add")
        # Create memory directly (per_msg_meta carries role and session metadata)
        mem_id = self._create_memory(msg_content, msg_embeddings, per_msg_meta)
        returned_memories.append({
            "id": mem_id,
            "memory": msg_content,
            "event": "ADD"
        })
    return returned_memories

This mode suits applications requiring complete conversation preservation without intelligent filtering.
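The filtering logic in this mode is easy to isolate. A minimal standalone sketch (function and variable names are hypothetical, not Mem0's) that skips system messages and attaches per-message metadata:

```python
def prepare_direct_storage(messages, base_metadata):
    """Build storable records from raw messages, skipping system prompts."""
    records = []
    for msg in messages:
        if msg["role"] == "system":
            continue  # system instructions are not user memories
        meta = dict(base_metadata)
        meta["role"] = msg["role"]
        records.append({"content": msg["content"], "metadata": meta})
    return records

records = prepare_direct_storage(
    [{"role": "system", "content": "Be helpful"},
     {"role": "user", "content": "I prefer coffee in the morning"}],
    {"user_id": "u1"},
)
```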
Mode Two: Intelligent Inference (infer=True)
The default mode unleashes Mem0's full capabilities through a multi-step process:
Step 1: Fact Extraction via LLM
The system employs carefully crafted prompts to extract meaningful facts from conversations:
# Select appropriate prompt based on memory type
is_agent_memory = self._should_use_agent_memory_extraction(messages, metadata)
system_prompt, user_prompt = get_fact_retrieval_messages(parsed_messages, is_agent_memory)

# Call LLM with structured output
response = self.llm.generate_response(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    response_format={"type": "json_object"}
)

# Parse JSON results
new_retrieved_facts = json.loads(response)["facts"]

The fact retrieval prompt (defined in mem0/configs/prompts.py) provides detailed guidance:
FACT_RETRIEVAL_PROMPT = """You are a Personal Information Organizer...
Types of Information to Remember:
1. Store Personal Preferences
2. Maintain Important Personal Details
3. Track Plans and Intentions
4. Remember Relationships
5. Note Important Events
6. Capture Skills and Knowledge
7. Track Context-Specific Information
Input: Hi, my name is John. I am a software engineer.
Output: {"facts": ["Name is John", "Is a Software engineer"]}
"""

This structured approach ensures consistent fact extraction across diverse conversation types.
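Because the LLM is asked for JSON but can still return malformed output, callers typically parse defensively. A hedged sketch of such a guard (not Mem0's exact error handling):

```python
import json

def parse_facts(llm_response: str) -> list:
    """Extract the "facts" array from an LLM JSON response, tolerating bad output."""
    try:
        facts = json.loads(llm_response).get("facts", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # fall back to "no facts" rather than crashing the add() call
    return [f for f in facts if isinstance(f, str) and f.strip()]
```

Returning an empty list on failure keeps a single bad LLM response from aborting the whole memory-addition flow.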
Step 2: Similarity Search
For each extracted fact, the system searches for similar existing memories:
for new_mem in new_retrieved_facts:
    # Generate embeddings for the new fact
    messages_embeddings = self.embedding_model.embed(new_mem, "add")
    # Search for similar memories (vector similarity)
    existing_memories = self.vector_store.search(
        query=new_mem,
        vectors=messages_embeddings,
        limit=5,
        filters=search_filters
    )
    for mem in existing_memories:
        retrieved_old_memory.append({"id": mem.id, "text": mem.payload.get("data", "")})

This step prevents duplicate memories and enables intelligent updates to existing knowledge.
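Under the hood, "similar" means nearest neighbors by vector distance. A toy in-memory version of this search using cosine similarity (a standalone illustration; production vector stores use approximate-nearest-neighbor indexes instead of a full scan):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vector, memories, limit=5):
    """Return the `limit` memories whose vectors are most similar to the query."""
    scored = sorted(memories,
                    key=lambda m: cosine_similarity(query_vector, m["vector"]),
                    reverse=True)
    return scored[:limit]

memories = [
    {"id": "m1", "data": "Likes cheese pizza", "vector": [1.0, 0.1, 0.0]},
    {"id": "m2", "data": "Dislikes cats", "vector": [0.0, 1.0, 0.9]},
]
top = search([0.9, 0.2, 0.0], memories, limit=1)  # closest to m1's vector
```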
Step 3: Operation Decision Making
Using another LLM call, Mem0 decides what operation to perform for each fact:
# Build decision prompt
function_calling_prompt = get_update_memory_messages(
    retrieved_old_memory, new_retrieved_facts
)

# LLM returns operation decisions
response = self.llm.generate_response(
    messages=[{"role": "user", "content": function_calling_prompt}],
    response_format={"type": "json_object"}
)
new_memories_with_actions = json.loads(response)

The decision prompt defines four possible operations:
DEFAULT_UPDATE_MEMORY_PROMPT = """You are a smart memory manager...
Compare newly retrieved facts with the existing memory. For each new fact, decide whether to:
- ADD: Add it to the memory as a new element
- UPDATE: Update an existing memory element
- DELETE: Delete an existing memory element
- NONE: Make no change
"""

Example decision output:
{
  "memory": [
    {
      "id": "0",
      "text": "Likes cheese and chicken pizza",
      "event": "UPDATE",
      "old_memory": "Likes cheese pizza"
    },
    {
      "id": "1",
      "text": "Name is John",
      "event": "NONE"
    },
    {
      "id": "2",
      "text": "Dislikes cats",
      "event": "ADD"
    }
  ]
}

Step 4: Executing Operations
Based on the decisions, the system performs the appropriate CRUD operations:
for resp in new_memories_with_actions.get("memory", []):
    action_text = resp.get("text")
    event_type = resp.get("event")
    if event_type == "ADD":
        memory_id = self._create_memory(action_text, existing_embeddings, metadata)
    elif event_type == "UPDATE":
        self._update_memory(
            memory_id=temp_uuid_mapping[resp.get("id")],
            data=action_text,
            existing_embeddings=existing_embeddings,
            metadata=metadata
        )
    elif event_type == "DELETE":
        self._delete_memory(memory_id=temp_uuid_mapping[resp.get("id")])
    elif event_type == "NONE":
        # No action required
        pass

Memory Creation Internals
The _create_memory() method (line 1075) handles the actual storage:
def _create_memory(self, data, existing_embeddings, metadata=None):
    # Generate unique ID
    memory_id = str(uuid.uuid4())
    # Build metadata
    metadata = metadata or {}
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(pytz.timezone("US/Pacific")).isoformat()
    # Store in vector database
    self.vector_store.insert(
        vectors=[existing_embeddings],
        ids=[memory_id],
        payloads=[metadata]
    )
    # Record history (SQLite)
    self.db.add_history(memory_id, None, data, "ADD", ...)
    return memory_id

This method ensures each memory has a unique identifier, content hash for deduplication, and timestamp for temporal ordering.
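The hash field enables cheap exact-duplicate detection before any LLM call. A minimal sketch of how such a check could work (the set-based index here is an assumption for illustration, not Mem0's code):

```python
import hashlib

def content_hash(text: str) -> str:
    """MD5 of the memory text, mirroring the "hash" metadata field above."""
    return hashlib.md5(text.encode()).hexdigest()

# Hashes of memories already in the store (illustrative in-memory index).
seen_hashes = {content_hash("Likes cheese pizza")}

def is_exact_duplicate(text: str) -> bool:
    return content_hash(text) in seen_hashes
```

Hashing catches byte-identical repeats instantly; semantic near-duplicates still require the vector search described earlier.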
Graph Store Addition: Building Knowledge Networks
While vector stores handle semantic similarity, the graph store (_add_to_graph()) manages entity relationships. Located at line 599, this method activates when graph storage is enabled.
Entry Point Logic
def _add_to_graph(self, messages, filters):
    if self.enable_graph:
        # Merge message contents
        data = "\n".join([msg["content"] for msg in messages if msg["role"] != "system"])
        # Call graph storage addition
        added_entities = self.graph.add(data, filters)
        return added_entities
    # Graph storage disabled: nothing to add
    return []

Core Graph Operations
The graph storage logic (in mem0/memory/graph_memory.py:76) follows a systematic approach:
def add(self, data, filters):
    # Step 1: Extract entities
    entity_type_map = self._retrieve_nodes_from_data(data, filters)
    # Step 2: Establish entity relationships
    to_be_added = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
    # Step 3: Search for similar nodes in graph database
    search_output = self._search_graph_db(node_list=list(entity_type_map.keys()), filters=filters)
    # Step 4: Identify entities to delete (conflicting relationships)
    to_be_deleted = self._get_delete_entities_from_search_output(search_output, data, filters)
    # Step 5: Execute deletions and additions
    deleted_entities = self._delete_entities(to_be_deleted, filters)
    added_entities = self._add_entities(to_be_added, filters, entity_type_map)
    return {"deleted_entities": deleted_entities, "added_entities": added_entities}

Entity Extraction Example
Using LLM-powered extraction:
entity_type_map = self._retrieve_nodes_from_data(data, filters)
# Result example:
# {
#     "john": "person",
#     "coffee": "drink",
#     "starbucks": "brand"
# }

Relationship Building Example
entities = self._establish_nodes_relations_from_data(data, filters, entity_type_map)
# Result example:
# [
#     {"source": "john", "relationship": "likes", "destination": "coffee"},
#     {"source": "coffee", "relationship": "brand", "destination": "starbucks"}
# ]
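Before such triples reach Neo4j they must be rendered as Cypher. A hypothetical rendering step is sketched below; Mem0's actual graph writes should be assumed to use parameterized queries, and this string-building version is for illustration only:

```python
def relation_to_cypher(rel: dict) -> str:
    """Render one relation triple as a Cypher MERGE statement.
    Illustration only: production code should pass values as query
    parameters rather than interpolating them into the query string."""
    rel_type = rel["relationship"].upper().replace(" ", "_")
    return (
        f'MERGE (a:Entity {{name: "{rel["source"]}"}}) '
        f'MERGE (b:Entity {{name: "{rel["destination"]}"}}) '
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

query = relation_to_cypher(
    {"source": "john", "relationship": "likes", "destination": "coffee"}
)
```

MERGE (rather than CREATE) makes the write idempotent: re-adding a known entity or relationship leaves the graph unchanged.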
# ]These relationships are ultimately stored in Neo4j:
(john:Person) -[:LIKES]-> (coffee:Drink) -[:BRAND]-> (starbucks:Brand)

Key Component Architecture
LLM Factory Pattern
Mem0 employs a factory pattern supporting multiple LLM providers:
class LlmFactory:
    provider_to_class = {
        "openai": ("mem0.llms.openai.OpenAILLM", OpenAIConfig),
        "anthropic": ("mem0.llms.anthropic.AnthropicLLM", AnthropicConfig),
        "azure_openai": ("mem0.llms.azure_openai.AzureOpenAILLM", AzureOpenAIConfig),
        "gemini": ("mem0.llms.gemini.GeminiLLM", BaseLlmConfig),
        ...
    }

Embedding Generation
class EmbedderFactory:
    provider_to_class = {
        "openai": "mem0.embeddings.openai.OpenAIEmbedding",
        "huggingface": "mem0.embeddings.huggingface.HuggingFaceEmbedding",
        ...
    }

Vector Store Support
class VectorStoreFactory:
    provider_to_class = {
        "qdrant": "mem0.vector_stores.qdrant.Qdrant",
        "chroma": "mem0.vector_stores.chroma.ChromaDB",
        "pinecone": "mem0.vector_stores.pinecone.PineconeDB",
        "milvus": "mem0.vector_stores.milvus.MilvusDB",
        "weaviate": "mem0.vector_stores.weaviate.Weaviate",
        ...
    }

This extensible architecture allows users to choose their preferred providers for each component.
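The pattern behind all three factories is the same: map a provider name to a dotted class path and import it lazily on first use. A standalone sketch using a stdlib class so it runs anywhere (the "decoder" mapping entry is illustrative, not one of Mem0's):

```python
import importlib

class SimpleFactory:
    # Illustrative mapping: provider name -> "module.path.ClassName".
    # Mem0's factories hold entries like "mem0.llms.openai.OpenAILLM".
    provider_to_class = {"decoder": "json.JSONDecoder"}

    @classmethod
    def create(cls, provider, **kwargs):
        class_path = cls.provider_to_class[provider]
        module_name, class_name = class_path.rsplit(".", 1)
        module = importlib.import_module(module_name)  # imported only when requested
        return getattr(module, class_name)(**kwargs)

obj = SimpleFactory.create("decoder")
```

Lazy imports mean users only pay the dependency cost of the providers they actually configure.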
Complete Flow Example
Let's trace a complete memory addition scenario:
from mem0 import Memory

memory = Memory()

# User conversation
messages = [
    {"role": "user", "content": "My name is Zhang San, I like drinking lattes"},
    {"role": "assistant", "content": "Hello Zhang San! Lattes are popular coffee drinks"}
]

# Add memory
result = memory.add(messages, user_id="zhangsan")

Expected result:
{
  "results": [
    {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
    {"id": "def456", "memory": "Likes latte coffee", "event": "ADD"}
  ],
  "relations": {
    "deleted_entities": [],
    "added_entities": [
      {"source": "Zhang San", "relationship": "likes", "target": "latte"}
    ]
  }
}

Flow Breakdown
- Fact Extraction: LLM extracts ["Name is Zhang San", "Likes latte coffee"]
- Similarity Search: Vector database search returns no similar memories (new user)
- Decision: LLM determines both facts need ADD operations
- Execution: Two memory records created with embeddings stored in vector database
- Graph Storage: Entities extracted as {"Zhang San": "person", "latte": "drink"}, with the relationship Zhang San -[:likes]-> latte stored in Neo4j
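Downstream code can consume this result shape directly. A small sketch that tallies events from the expected result above (the literal is copied from that example; nothing here calls Mem0):

```python
# Result structure as returned by memory.add() in the example above.
result = {
    "results": [
        {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
        {"id": "def456", "memory": "Likes latte coffee", "event": "ADD"},
    ],
    "relations": {
        "deleted_entities": [],
        "added_entities": [
            {"source": "Zhang San", "relationship": "likes", "target": "latte"}
        ],
    },
}

# Collect only newly added memories; UPDATE/DELETE/NONE events would be filtered out.
added = [m["memory"] for m in result["results"] if m["event"] == "ADD"]
new_relations = len(result["relations"]["added_entities"])
```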
Design Highlights Worth Emulating
Intelligent Inference vs Direct Storage
Mem0's dual-mode operation serves different needs:
- infer=True: Intelligent extraction, deduplication, and updates—ideal for production environments
- infer=False: Direct storage preserving complete conversations—suitable for compliance or audit requirements
Dual-Storage Architecture
The combination of vector and graph storage provides complementary capabilities:
- Vector Store: Fast semantic similarity search, perfect for retrieval scenarios
- Graph Store: Entity relationship management, ideal for complex knowledge networks
Together, they offer comprehensive memory capabilities that neither could achieve alone.
Parallel Processing
Vector and graph storage operations execute concurrently:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, ...)
    future2 = executor.submit(self._add_to_graph, ...)

This design choice demonstrates thoughtful performance optimization without complicating the API.
Prompt Engineering Excellence
Mem0's prompts represent carefully crafted instruments:
- Fact Extraction Prompt: Defines seven information categories with detailed examples
- Memory Update Prompt: Establishes clear ADD/UPDATE/DELETE/NONE rules
- Custom Prompt Support: Allows custom_fact_extraction_prompt for domain-specific needs
These prompts aren't afterthoughts—they're central to Mem0's intelligence.
Conclusion: Lessons from Mem0's Architecture
Through this source code exploration, we've uncovered that Mem0's memory addition is far more than simple storage. It encompasses:
- Intelligent Fact Extraction: Using LLMs to identify valuable information
- Similarity-Based Retrieval: Preventing duplicates through vector search
- Smart CRUD Decisions: LLM-powered add/update/delete decisions
- Dual-Storage Architecture: Combining vector and graph databases
The core design principles include:
- Using LLMs to intelligently manage the memory lifecycle
- Leveraging vector storage for factual retrieval
- Employing graph storage for entity relationships
- Parallel processing for efficiency
Most importantly, prompt engineering sits at Mem0's heart. The carefully designed prompts for fact extraction and memory updates are what transform Mem0 from a simple database wrapper into an intelligent memory management system.
This architecture offers valuable lessons for anyone building AI systems requiring persistent, evolving knowledge. Mem0 demonstrates that effective AI memory isn't about storing everything—it's about intelligently curating what matters while maintaining the flexibility to adapt as understanding deepens.