Inside Mem0's Memory Engine: How Dual-Storage Architecture Powers Intelligent AI Recall
This is the first installment of an in-depth source code analysis series focusing on Mem0, an open-source project that provides a long-term memory layer for AI applications. In this comprehensive exploration, we will dissect the core functionality of Mem0—its memory addition mechanism—and uncover the sophisticated design principles and implementation details that make it work.
Mem0 (pronounced "mem-zero") represents a groundbreaking approach to giving AI applications the ability to maintain persistent memory across sessions. It enables AI assistants to remember user preferences, adapt to individualized needs, and engage in continuous learning—making it exceptionally well-suited for customer support chatbots, AI companions, autonomous systems, and any application requiring contextual continuity over time.
Before diving into the source code, several fundamental questions drove this investigation:
- How does Mem0 extract valuable information from conversational exchanges?
- What decision-making process determines whether to add new memories, update existing ones, or delete outdated information?
- What distinct roles do vector storage and graph storage play in the overall architecture?
- How does the system balance performance with accuracy in memory operations?
With these guiding questions in mind, let us embark on this technical journey through Mem0's internal workings.
Architectural Overview
Mem0's memory addition workflow follows a carefully orchestrated architecture that balances efficiency with intelligence. The core implementation resides in mem0/memory/main.py, with three primary components driving the entire process:
- Memory.add(): The main entry point that orchestrates the entire memory addition workflow
- _add_to_vector_store(): Handles all vector storage operations for semantic similarity-based retrieval
- _add_to_graph(): Manages graph storage operations for entity-relationship mapping
The architecture embodies a dual-storage philosophy: vector storage excels at rapid similarity searches and semantic retrieval, while graph storage captures complex entity relationships and knowledge networks. Together, they provide complementary capabilities that neither could achieve alone.
The Entry Point: Memory.add() Method
Let us begin our exploration at the system's entry point, located at line 281 of mem0/memory/main.py. The method signature reveals the flexibility built into the design:
def add(
    self,
    messages,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    infer: bool = True,
    memory_type: Optional[str] = None,
    prompt: Optional[str] = None,
):

Parameter Breakdown and Design Intentions
The messages parameter demonstrates remarkable flexibility in input handling. It accepts:
- Plain strings: Simple text like "I prefer drinking coffee in the morning"
- Single message dictionaries: Structured format such as {"role": "user", "content": "I prefer drinking coffee"}
- Message lists: Complete conversation threads with multiple exchanges between user and assistant
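All three input shapes can be coerced into one canonical list-of-dicts form before processing. The sketch below illustrates that normalization; the helper name is ours, not Mem0's actual internal function:

```python
from typing import Any, Dict, List, Union

def normalize_messages(
    messages: Union[str, Dict[str, Any], List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Coerce the three accepted input shapes into a list of message dicts."""
    if isinstance(messages, str):
        # Plain string: wrap as a single user message
        return [{"role": "user", "content": messages}]
    if isinstance(messages, dict):
        # Single message dict: wrap in a one-element list
        return [messages]
    # Already a list of message dicts: pass through unchanged
    return messages

# A plain string becomes a one-element conversation
normalized = normalize_messages("I prefer drinking coffee in the morning")
```

Normalizing early keeps the downstream extraction pipeline free of input-shape branching.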
The identifier parameters (user_id, agent_id, run_id) serve a critical isolation function. They ensure that memories remain properly segmented across different users, agents, or execution contexts—preventing cross-contamination of personal data and maintaining privacy boundaries.
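Conceptually, _build_filters_and_metadata stamps these identifiers onto both the stored metadata (so each memory records its owner) and the search filters (so retrieval stays scoped to that owner). A simplified sketch of that behavior, not the exact implementation:

```python
from typing import Any, Dict, Optional, Tuple

def build_filters_and_metadata(
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    input_metadata: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Stamp session identifiers onto metadata and derive matching search filters."""
    metadata = dict(input_metadata or {})
    filters: Dict[str, Any] = {}
    for key, value in (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id)):
        if value is not None:
            metadata[key] = value   # persisted alongside every memory
            filters[key] = value    # scopes every similarity search
    return metadata, filters

meta, flt = build_filters_and_metadata(user_id="alice", input_metadata={"topic": "food"})
# meta carries both the topic and the owner; flt restricts searches to alice's memories
```

Because the same identifiers drive both writes and reads, a memory stored for one user can never surface in another user's search results.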
The infer parameter represents a crucial design decision point:
- When True (default): The system engages LLM-powered reasoning to extract facts, perform similarity matching, and make intelligent decisions about memory operations
- When False: Messages are stored directly without any intelligent processing, suitable for scenarios requiring complete conversation preservation
The memory_type parameter enables specialized handling for different memory categories, such as "procedural_memory" for skill-based knowledge that differs from declarative facts.
Core Execution Flow
The simplified core logic reveals elegant design patterns:
# Step 1: Build metadata and filtering criteria
processed_metadata, effective_filters = _build_filters_and_metadata(
    user_id=user_id,
    agent_id=agent_id,
    run_id=run_id,
    input_metadata=metadata,
)

# Step 2: Handle special memory types (e.g., procedural memory)
if agent_id is not None and memory_type == MemoryType.PROCEDURAL.value:
    return self._create_procedural_memory(messages, metadata=processed_metadata)

# Step 3: Execute vector store and graph store additions in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(
        self._add_to_vector_store,
        messages,
        processed_metadata,
        effective_filters,
        infer,
    )
    future2 = executor.submit(self._add_to_graph, messages, effective_filters)
    concurrent.futures.wait([future1, future2])
    vector_store_result = future1.result()
    graph_result = future2.result()

# Step 4: Return consolidated results
return {"results": vector_store_result, "relations": graph_result}

The parallel execution design deserves special attention. By running vector storage and graph storage operations concurrently, Mem0 overlaps the latency of the two pipelines without sacrificing the integrity of either storage system. The gain comes primarily from concurrency on I/O-bound work—both paths spend most of their time waiting on LLM and database calls, which threads can overlap even under Python's global interpreter lock.
Vector Storage Operations: _add_to_vector_store()
This method, located at line 386 of mem0/memory/main.py, represents the heart of Mem0's intelligent memory management. It implements two distinct operational modes.
Mode One: Direct Storage (infer=False)
When intelligent inference is disabled, the system takes a straightforward approach:
if not infer:
    returned_memories = []
    for message_dict in messages:
        # Skip system messages
        if message_dict["role"] == "system":
            continue
        msg_content = message_dict["content"]
        # Generate embeddings for the message content
        msg_embeddings = self.embedding_model.embed(msg_content, "add")
        # Create memory entry directly (per_msg_meta carries per-message metadata)
        mem_id = self._create_memory(msg_content, msg_embeddings, per_msg_meta)
        returned_memories.append({
            "id": mem_id,
            "memory": msg_content,
            "event": "ADD",
        })
    return returned_memories

This mode prioritizes simplicity and completeness over intelligence—every message becomes a memory without filtering or deduplication.
Mode Two: Intelligent Inference (infer=True)
The default mode implements a sophisticated multi-stage pipeline that transforms raw conversation into structured, deduplicated knowledge.
Stage 1: Fact Extraction via LLM
The system begins by employing a large language model to distill valuable facts from the conversation:
# Select appropriate prompt based on memory type
is_agent_memory = self._should_use_agent_memory_extraction(messages, metadata)
system_prompt, user_prompt = get_fact_retrieval_messages(
    parsed_messages,
    is_agent_memory,
)

# Invoke LLM with structured output requirement
response = self.llm.generate_response(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format={"type": "json_object"},
)

# Parse JSON results to extract facts
new_retrieved_facts = json.loads(response)["facts"]

The fact extraction prompt, defined in mem0/configs/prompts.py, provides detailed guidance on what constitutes memorable information:
FACT_RETRIEVAL_PROMPT = """You are a Personal Information Organizer...
Types of Information to Remember:
1. Store Personal Preferences (food, activities, entertainment choices)
2. Maintain Important Personal Details (name, occupation, location)
3. Track Plans and Intentions (upcoming events, goals, projects)
4. Record Relationship Information (family, friends, colleagues)
5. Capture Skills and Knowledge Areas (expertise, certifications)
6. Note Communication Preferences (tone, language, formality)
7. Document Accessibility Requirements (special needs, accommodations)
Input: Hi, my name is John. I am a software engineer who loves hiking.
Output: {"facts": ["Name is John", "Is a Software engineer", "Enjoys hiking activities"]}
"""This carefully crafted prompt ensures consistent, structured extraction across diverse conversation types.
Stage 2: Similarity Search Against Existing Memories
For each newly extracted fact, the system performs a similarity search to identify potentially overlapping memories:
for new_mem in new_retrieved_facts:
    # Generate embedding vector for the new fact
    messages_embeddings = self.embedding_model.embed(new_mem, "add")
    # Search for similar existing memories using vector similarity
    existing_memories = self.vector_store.search(
        query=new_mem,
        vectors=messages_embeddings,
        limit=5,
        filters=search_filters,
    )
    # Collect matching memories for comparison
    for mem in existing_memories:
        retrieved_old_memory.append({
            "id": mem.id,
            "text": mem.payload.get("data", ""),
        })

This step prevents memory bloat by identifying duplicates and near-duplicates before they enter the system.
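The ranking behind vector_store.search happens inside the configured vector database, typically using cosine similarity between embeddings. A self-contained illustration of the metric itself:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Near-duplicate facts produce near-parallel embeddings, hence scores close to 1.0;
# unrelated facts tend toward 0.0
score = cosine_similarity([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])
```

Because cosine similarity compares direction rather than magnitude, two phrasings of the same fact ("likes coffee" vs. "enjoys drinking coffee") land close together even if their raw embedding norms differ.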
Stage 3: Operation Decision via LLM Analysis
The system then employs another LLM call to decide the appropriate action for each fact:
# Construct decision prompt with old and new information
function_calling_prompt = get_update_memory_messages(
    retrieved_old_memory,
    new_retrieved_facts,
)

# LLM returns structured operation decisions
response = self.llm.generate_response(
    messages=[{"role": "user", "content": function_calling_prompt}],
    response_format={"type": "json_object"},
)
new_memories_with_actions = json.loads(response)

The decision prompt defines four possible operations:
DEFAULT_UPDATE_MEMORY_PROMPT = """You are a smart memory manager...
Compare newly retrieved facts with the existing memory. For each new fact, decide whether to:
- ADD: Add it to the memory as a completely new element
- UPDATE: Modify an existing memory element with new information
- DELETE: Remove an existing memory element that is now obsolete
- NONE: Make no change because the information already exists
"""An example decision output illustrates the system's reasoning:
{
    "memory": [
        {
            "id": "0",
            "text": "Likes cheese and chicken pizza",
            "event": "UPDATE",
            "old_memory": "Likes cheese pizza"
        },
        {
            "id": "1",
            "text": "Name is John",
            "event": "NONE"
        },
        {
            "id": "2",
            "text": "Dislikes cats",
            "event": "ADD"
        }
    ]
}

Stage 4: Executing Decided Operations
Based on the LLM's decisions, the system executes the appropriate CRUD operations:
for resp in new_memories_with_actions.get("memory", []):
    action_text = resp.get("text")
    event_type = resp.get("event")
    if event_type == "ADD":
        memory_id = self._create_memory(action_text, existing_embeddings, metadata)
    elif event_type == "UPDATE":
        # temp_uuid_mapping translates the LLM's temporary ids ("0", "1", ...)
        # back to the real memory UUIDs collected during similarity search
        self._update_memory(
            memory_id=temp_uuid_mapping[resp.get("id")],
            data=action_text,
            existing_embeddings=existing_embeddings,
            metadata=metadata,
        )
    elif event_type == "DELETE":
        self._delete_memory(memory_id=temp_uuid_mapping[resp.get("id")])
    elif event_type == "NONE":
        # No action required—information already exists
        pass

Memory Creation Internals
The _create_memory() method (line 1075) handles the actual persistence:
def _create_memory(self, data, existing_embeddings, metadata=None):
    # Generate universally unique identifier
    memory_id = str(uuid.uuid4())

    # Build comprehensive metadata package
    metadata["data"] = data
    metadata["hash"] = hashlib.md5(data.encode()).hexdigest()
    metadata["created_at"] = datetime.now(
        pytz.timezone("US/Pacific")
    ).isoformat()

    # Insert into vector database with embedding and payload
    self.vector_store.insert(
        vectors=[existing_embeddings],
        ids=[memory_id],
        payloads=[metadata],
    )

    # Record operation history in SQLite for audit trail
    self.db.add_history(
        memory_id, None, data, "ADD", ...
    )
    return memory_id

This method ensures each memory has a unique identity, temporal context, and content hash for deduplication verification.
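The content hash gives a cheap exact-duplicate check that complements the semantic similarity search. Reproducing the hashing step in isolation:

```python
import hashlib

def content_hash(data: str) -> str:
    """MD5 hex digest of the memory text, as used for exact-duplicate detection."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()

h1 = content_hash("Enjoys hiking activities")
h2 = content_hash("Enjoys hiking activities")
# Identical text always yields an identical 32-character digest,
# so comparing hashes detects verbatim duplicates without an embedding call
```

MD5 is fine here because the hash is used for deduplication, not security; any fast, deterministic digest would serve the same purpose.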
Graph Storage Operations: _add_to_graph()
While vector storage excels at similarity search, graph storage captures the rich relationships between entities—creating a knowledge network rather than isolated facts.
Entry Method Overview
Located at line 599 of mem0/memory/main.py:
def _add_to_graph(self, messages, filters):
    added_entities = []
    if self.enable_graph:
        # Consolidate message content into single text
        data = "\n".join([
            msg["content"] for msg in messages
            if msg["role"] != "system"
        ])
        # Invoke graph storage addition
        added_entities = self.graph.add(data, filters)
    return added_entities

Graph Storage Core Logic
The implementation in mem0/memory/graph_memory.py (line 76) follows a systematic five-step process:
def add(self, data, filters):
    # Step 1: Extract entities and their types from text
    entity_type_map = self._retrieve_nodes_from_data(data, filters)

    # Step 2: Establish relationships between identified entities
    to_be_added = self._establish_nodes_relations_from_data(
        data, filters, entity_type_map
    )

    # Step 3: Search graph database for similar existing nodes
    search_output = self._search_graph_db(
        node_list=list(entity_type_map.keys()),
        filters=filters,
    )

    # Step 4: Identify entities to delete (contradictory relationships)
    to_be_deleted = self._get_delete_entities_from_search_output(
        search_output, data, filters
    )

    # Step 5: Execute deletion and addition operations
    deleted_entities = self._delete_entities(to_be_deleted, filters)
    added_entities = self._add_entities(to_be_added, filters, entity_type_map)

    return {
        "deleted_entities": deleted_entities,
        "added_entities": added_entities,
    }

Entity Extraction Example
The system uses LLM-powered extraction to identify entities and classify them:
entity_type_map = self._retrieve_nodes_from_data(data, filters)
# Example result:
# {
# "john": "person",
# "coffee": "drink",
# "starbucks": "brand"
# }

Relationship Establishment Example
Another LLM call determines how entities relate to one another:
entities = self._establish_nodes_relations_from_data(
    data, filters, entity_type_map
)
# Example result:
# [
#     {"source": "john", "relationship": "likes", "destination": "coffee"},
#     {"source": "coffee", "relationship": "brand", "destination": "starbucks"}
# ]

These relationships are then persisted in Neo4j, creating a traversable knowledge graph:

(john:Person) -[:LIKES]-> (coffee:Drink) -[:BRAND]-> (starbucks:Brand)

Key Component Architecture
LLM Factory Pattern
Mem0 employs a factory pattern to support multiple LLM providers seamlessly:
# mem0/utils/factory.py
class LlmFactory:
    provider_to_class = {
        "openai": ("mem0.llms.openai.OpenAILLM", OpenAIConfig),
        "anthropic": ("mem0.llms.anthropic.AnthropicLLM", AnthropicConfig),
        "azure_openai": ("mem0.llms.azure_openai.AzureOpenAILLM", AzureOpenAILLMConfig),
        "gemini": ("mem0.llms.gemini.GeminiLLM", BaseLlmConfig),
        # Additional providers...
    }

This design enables users to switch between providers without modifying application code.
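The essence of the pattern is a string-keyed registry resolved at runtime. A self-contained miniature with toy stub classes (not Mem0's real providers) shows the mechanics:

```python
from typing import Any, Dict, Type

class OpenAILLMStub:
    """Toy stand-in for a real provider class."""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
    def generate_response(self, prompt: str) -> str:
        return f"openai:{prompt}"

class AnthropicLLMStub:
    """Toy stand-in for a second provider class."""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
    def generate_response(self, prompt: str) -> str:
        return f"anthropic:{prompt}"

class MiniLlmFactory:
    # Registry maps provider names to classes; the real factory maps
    # names to dotted import paths resolved lazily via importlib
    provider_to_class: Dict[str, Type] = {
        "openai": OpenAILLMStub,
        "anthropic": AnthropicLLMStub,
    }

    @classmethod
    def create(cls, provider: str, config: Dict[str, Any]):
        try:
            return cls.provider_to_class[provider](config)
        except KeyError:
            raise ValueError(f"Unsupported LLM provider: {provider}")

llm = MiniLlmFactory.create("anthropic", {"model": "claude-3"})
# llm.generate_response("hi") returns "anthropic:hi"
```

Since every provider class exposes the same generate_response interface, switching providers is a one-line configuration change rather than a code change.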
Embedding Generation
Similarly, embedding models are abstracted through a factory:
class EmbedderFactory:
    provider_to_class = {
        "openai": "mem0.embeddings.openai.OpenAIEmbedding",
        "huggingface": "mem0.embeddings.huggingface.HuggingFaceEmbedding",
        # Additional providers...
    }

Vector Store Support
The system supports numerous vector databases:
class VectorStoreFactory:
    provider_to_class = {
        "qdrant": "mem0.vector_stores.qdrant.Qdrant",
        "chroma": "mem0.vector_stores.chroma.ChromaDB",
        "pinecone": "mem0.vector_stores.pinecone.PineconeDB",
        "milvus": "mem0.vector_stores.milvus.MilvusDB",
        "weaviate": "mem0.vector_stores.weaviate.Weaviate",
        # Additional providers...
    }

This flexibility allows deployment in diverse infrastructure environments.
Complete Workflow Example
Let us trace a complete example from input to stored memory:
from mem0 import Memory
# Initialize memory system
memory = Memory()
# User conversation
messages = [
    {"role": "user", "content": "My name is Zhang San. I enjoy drinking lattes."},
    {"role": "assistant", "content": "Hello Zhang San! Lattes are a popular coffee choice."}
]

# Add to memory
result = memory.add(messages, user_id="zhangsan")

The resulting output demonstrates the dual-storage approach:
{
    "results": [
        {"id": "abc123", "memory": "Name is Zhang San", "event": "ADD"},
        {"id": "def456", "memory": "Enjoys drinking lattes", "event": "ADD"}
    ],
    "relations": {
        "deleted_entities": [],
        "added_entities": [
            {"source": "Zhang San", "relationship": "likes", "target": "latte"}
        ]
    }
}

Step-by-Step Breakdown
1. Fact Extraction: The LLM processes the conversation and extracts: ["Name is Zhang San", "Enjoys drinking lattes"]
2. Similarity Search: The system queries the vector database for similar existing memories. In this case (assuming a new user), no matches are found.
3. Operation Decision: The LLM determines both facts require ADD operations since they represent new information.
4. Operation Execution:
   - Two memory records are created
   - Embeddings are generated and stored in the vector database
   - Metadata including timestamps and hashes is recorded
5. Graph Storage Processing:
   - Entities are extracted: {"Zhang San": "person", "latte": "drink"}
   - Relationships are established: Zhang San -[:likes]-> latte
   - The relationship is persisted in Neo4j
Design Highlights and Innovations
Intelligent Inference vs Direct Storage
Mem0's dual-mode operation provides flexibility for different use cases:
- infer=True: Intelligent extraction, deduplication, and update decisions—ideal for production environments where memory quality matters
- infer=False: Direct storage preserving complete conversation history—suitable for compliance, auditing, or complete transcript requirements
Dual-Storage Architecture Philosophy
The combination of vector and graph storage addresses complementary needs:
- Vector Storage: Excels at semantic similarity search, enabling retrieval of related memories even when wording differs
- Graph Storage: Captures explicit entity relationships, enabling complex queries like "What does Zhang San like?" or "Find all people who enjoy coffee"
Together, they provide retrieval capabilities neither could achieve independently.
Parallel Processing for Performance
The concurrent execution of vector and graph operations demonstrates performance-conscious design:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future1 = executor.submit(self._add_to_vector_store, ...)
    future2 = executor.submit(self._add_to_graph, ...)

Because both paths are dominated by network-bound LLM and database calls, running them in threads overlaps their latency and reduces total operation time—an effective strategy even under Python's global interpreter lock, which constrains only CPU-bound work.
Prompt Engineering Excellence
Mem0's intelligence fundamentally relies on carefully crafted prompts:
- Fact Extraction Prompt: Defines seven comprehensive information categories with examples
- Memory Update Prompt: Establishes clear ADD/UPDATE/DELETE/NONE decision criteria
- Customization Support: Allows users to provide custom prompts for domain-specific needs
These prompts represent significant engineering effort and are central to Mem0's effectiveness.
Summary and Key Takeaways
Through this comprehensive source code analysis, we have discovered that Mem0's memory addition is far more sophisticated than simple storage. The system implements:
- Intelligent Fact Extraction: Using LLMs to distill valuable information from conversations
- Similarity-Based Retrieval: Preventing duplication through vector similarity matching
- Decision-Making Pipeline: LLM-powered ADD/UPDATE/DELETE/NONE decisions
- Dual-Storage Architecture: Combining vector and graph storage for complementary capabilities
The core design philosophy centers on using LLMs to intelligently manage the entire memory lifecycle, rather than treating memory as passive storage. Vector storage handles fact retrieval efficiently, while graph storage manages entity relationships for complex knowledge networks. Parallel processing ensures performance remains acceptable despite the computational overhead of LLM calls.
Most critically, Mem0 demonstrates that prompt engineering is not an afterthought but a core architectural component. The carefully designed prompts for fact extraction and memory updates are what transform raw LLM capabilities into structured, reliable memory management.
Series Preview
This article has explored Mem0's memory addition workflow, including intelligent fact extraction, similarity retrieval, update decision-making, and dual-storage architecture. The next installment will dive deeper into the prompt engineering techniques that power these capabilities.
Coming in Part Two: A deep dive into prompt engineering strategies, including how Mem0 crafts effective prompts for different scenarios, handles edge cases, and enables customization for domain-specific applications.
Referenced Source Files
- mem0/memory/main.py: Core Memory class implementation
- mem0/memory/graph_memory.py: Graph storage operations
- mem0/configs/prompts.py: All prompt definitions
- mem0/utils/factory.py: Factory classes for LLMs, embeddings, and vector stores