
AI Agent Memory Systems Explained (2026)

Complete tutorial on building AI agent memory systems in Python. Learn the four memory types — in-context, external vector DB, episodic, and semantic memory — with full implementations using LangChain, Chroma, and Pinecone. Includes production-ready code examples.

By AI Agents Guide Team • March 1, 2026

Table of Contents

  1. The Four Types of Agent Memory
  2. Setup
  3. Memory Type 1: In-Context Memory (Conversation History)
  4. Managing Context Window Size
  5. Memory Type 2: External Memory with Vector Database
  6. Setting Up Chroma Vector Store
  7. Agent with External Memory
  8. Memory Type 3: Episodic Memory
  9. Memory Type 4: Semantic Memory (Knowledge Base)
  10. Complete Memory System Integration
  11. Production with Pinecone
  12. Key Takeaways

Building AI Agent Memory Systems: In-Context, External, Episodic, and Semantic Memory

Memory is what makes AI agents genuinely useful over time. An agent that forgets your preferences after every conversation, that cannot recall how it solved a similar problem last week, or that must re-read an entire knowledge base on every query is severely limited. Memory transforms an agent from a stateless responder into a system that learns and improves from experience.

This tutorial builds a complete agent memory system from scratch, covering all four memory types and showing how they work together. We use LangChain, Chroma (local vector store), and optionally Pinecone (production vector store).

Prerequisites: Python 3.11+, LangChain familiarity, OpenAI API key, basic understanding of embeddings.

Related: AI Agent Memory Glossary | Agent State Management | Agentic RAG Tutorial


The Four Types of Agent Memory

Before writing code, understand what each memory type does and when to use it:

┌─────────────────────────────────────────────────────────┐
│                  AI Agent Memory System                   │
│                                                           │
│  ┌──────────────────┐     ┌──────────────────────────┐   │
│  │  IN-CONTEXT      │     │  EXTERNAL MEMORY          │   │
│  │  Memory          │     │  (Vector DB)              │   │
│  │                  │     │                           │   │
│  │  - Current conv  │     │  - Past conversations     │   │
│  │  - Recent msgs   │     │  - Knowledge base         │   │
│  │  - Active task   │     │  - Retrieved on demand    │   │
│  │  - Fast, limited │     │  - Unlimited, slower      │   │
│  └──────────────────┘     └──────────────────────────┘   │
│                                                           │
│  ┌──────────────────┐     ┌──────────────────────────┐   │
│  │  EPISODIC        │     │  SEMANTIC                 │   │
│  │  Memory          │     │  Memory                   │   │
│  │                  │     │                           │   │
│  │  - Past events   │     │  - General knowledge      │   │
│  │  - What happened │     │  - Domain facts           │   │
│  │  - When/context  │     │  - Relationships          │   │
│  └──────────────────┘     └──────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Setup

pip install langchain langchain-openai langchain-community langchain-pinecone chromadb pinecone python-dotenv tiktoken
import os
from datetime import datetime
from typing import List, Optional, Dict, Any
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_community.vectorstores import Chroma
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(model="gpt-4o", temperature=0.1)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Memory Type 1: In-Context Memory (Conversation History)

In-context memory is the simplest form — the conversation history within the LLM's active context window.

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Store conversation histories in memory (use Redis/PostgreSQL in production)
session_store: Dict[str, ChatMessageHistory] = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

# Build chain with conversation history
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="You are a helpful AI assistant with excellent memory."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# Wrap with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

# Usage
session_id = "user_sarah_001"
config = {"configurable": {"session_id": session_id}}

response1 = chain_with_history.invoke(
    {"input": "My name is Sarah and I'm working on a Python data pipeline."},
    config=config
)
print(response1.content)

response2 = chain_with_history.invoke(
    {"input": "What are best practices for my project?"},
    config=config
)
# Agent remembers Sarah is working on a Python data pipeline
print(response2.content)

Managing Context Window Size

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, trim_messages

class TrimmedChatHistory(BaseChatMessageHistory):
    """Chat history that automatically trims old messages."""

    def __init__(self, max_tokens: int = 4000):
        self.messages: List[BaseMessage] = []
        self.max_tokens = max_tokens

    def add_messages(self, messages: List[BaseMessage]) -> None:
        self.messages.extend(messages)
        # Trim to keep within token budget
        self.messages = trim_messages(
            self.messages,
            max_tokens=self.max_tokens,
            token_counter=llm,
            strategy="last",           # Keep the most recent messages
            start_on="human",         # Start trim on human message
            include_system=True        # Always keep system messages
        )

    def clear(self) -> None:
        self.messages = []
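If you want to see what the "last" strategy actually does without wiring up LangChain, the same keep-the-most-recent-messages-within-a-budget logic can be sketched in plain Python. The whitespace-based token counter below is a stand-in for a real tokenizer such as tiktoken, and the message tuples are a simplification of LangChain's message objects:

```python
# Minimal sketch of "last"-strategy trimming: keep the newest messages that
# fit a token budget, always preserving the system message.
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real token counter

def trim_last(messages: List[Message], max_tokens: int) -> List[Message]:
    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    budget = max_tokens - sum(count_tokens(c) for _, c in system)

    kept: List[Message] = []
    for role, content in reversed(rest):  # walk newest to oldest
        cost = count_tokens(content)
        if cost > budget:
            break
        kept.append((role, content))
        budget -= cost
    return system + list(reversed(kept))

history = [
    ("system", "You are a helpful assistant."),
    ("human", "My name is Sarah and I build data pipelines in Python."),
    ("ai", "Nice to meet you, Sarah!"),
    ("human", "What database should I use?"),
]
trimmed = trim_last(history, max_tokens=20)
print([role for role, _ in trimmed])  # → ['system', 'ai', 'human']
```

Note that the kept window here may open on an AI message; `trim_messages` with `start_on="human"` additionally drops leading AI turns so the window starts on a human message, a refinement this sketch omits.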

Memory Type 2: External Memory with Vector Database

External memory allows agents to store and retrieve information beyond the context window.

Setting Up Chroma Vector Store

from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Initialize persistent Chroma vector store
vectorstore = Chroma(
    persist_directory="./agent_memory",
    embedding_function=embeddings,
    collection_name="agent_long_term_memory"
)

class ExternalMemory:
    """Long-term memory using a vector database."""

    def __init__(self, vectorstore: Chroma, user_id: str):
        self.vectorstore = vectorstore
        self.user_id = user_id

    def store(self, content: str, metadata: Optional[Dict] = None) -> str:
        """Store a memory with metadata for later retrieval."""
        if metadata is None:
            metadata = {}

        metadata.update({
            "user_id": self.user_id,
            "timestamp": datetime.now().isoformat(),
            "type": metadata.get("type", "general")
        })

        doc = Document(page_content=content, metadata=metadata)
        ids = self.vectorstore.add_documents([doc])
        return ids[0]

    def retrieve(self, query: str, k: int = 5, memory_type: Optional[str] = None) -> List[Document]:
        """Retrieve relevant memories using semantic search."""
        # Chroma requires an explicit $and operator when filtering on more than one field
        if memory_type:
            filter_dict = {"$and": [{"user_id": self.user_id}, {"type": memory_type}]}
        else:
            filter_dict = {"user_id": self.user_id}

        return self.vectorstore.similarity_search(
            query,
            k=k,
            filter=filter_dict
        )

    def retrieve_with_scores(self, query: str, k: int = 5) -> List[tuple]:
        """Retrieve memories with relevance scores."""
        filter_dict = {"user_id": self.user_id}
        return self.vectorstore.similarity_search_with_relevance_scores(
            query,
            k=k,
            filter=filter_dict
        )

    def format_memories_for_context(self, memories: List[Document]) -> str:
        """Format retrieved memories for inclusion in LLM context."""
        if not memories:
            return "No relevant past memories found."

        formatted = ["Relevant memories from past interactions:"]
        for i, mem in enumerate(memories, 1):
            timestamp = mem.metadata.get("timestamp", "Unknown time")
            mem_type = mem.metadata.get("type", "general")
            formatted.append(f"\n[Memory {i} - {mem_type} - {timestamp[:10]}]")
            formatted.append(mem.page_content)

        return "\n".join(formatted)
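In practice you rarely want every retrieved memory in the prompt: `retrieve_with_scores` returns (document, score) pairs, and weak matches are better dropped before formatting. A minimal, LangChain-free sketch of that thresholding step (the 0.75 cutoff is an illustrative value, not a recommendation):

```python
# Filter (memory, relevance_score) pairs by a minimum score before they are
# formatted into the prompt. Scores follow the "higher is better, max 1.0"
# convention of similarity_search_with_relevance_scores.
from typing import List, Tuple

def filter_by_relevance(
    scored: List[Tuple[str, float]], min_score: float = 0.75
) -> List[str]:
    return [content for content, score in scored if score >= min_score]

scored_memories = [
    ("Sarah is building a Python data pipeline.", 0.91),
    ("Sarah prefers PostgreSQL.", 0.82),
    ("User asked about the weather once.", 0.41),
]
print(filter_by_relevance(scored_memories))
# Only the two strong matches survive the cutoff
```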

Agent with External Memory

class MemoryAwareAgent:
    """An agent that uses external memory for long-term recall."""

    def __init__(self, user_id: str, vectorstore: Chroma):
        self.user_id = user_id
        self.memory = ExternalMemory(vectorstore, user_id)
        self.session_messages = []  # In-context memory

    def chat(self, user_input: str) -> str:
        """Process user input with memory retrieval and storage."""

        # 1. Retrieve relevant memories
        relevant_memories = self.memory.retrieve(user_input, k=3)
        memory_context = self.memory.format_memories_for_context(relevant_memories)

        # 2. Build prompt with memory context
        system_prompt = f"""You are a helpful AI assistant with long-term memory.

{memory_context}

Use the above memories to personalize your responses when relevant.
If the memories are not relevant to the current question, ignore them."""

        # 3. Add current message to session history
        self.session_messages.append(HumanMessage(content=user_input))

        # 4. Generate response
        messages = [SystemMessage(content=system_prompt)] + self.session_messages[-10:]
        response = llm.invoke(messages)

        # 5. Add response to session history
        self.session_messages.append(AIMessage(content=response.content))

        # 6. Extract and store important information from this exchange
        self._extract_and_store_memories(user_input, response.content)

        return response.content

    def _extract_and_store_memories(self, user_input: str, assistant_response: str) -> None:
        """Extract key facts from the interaction and store them."""
        extraction_prompt = f"""Analyze this conversation exchange and extract important facts worth remembering.
        Focus on: preferences, personal information, project details, decisions made, problems solved.
        Only extract genuinely useful, specific facts. Skip pleasantries and generic content.

        User: {user_input}
        Assistant: {assistant_response}

        List each fact on a separate line, or respond with "NO_FACTS" if nothing important to remember."""

        extraction_response = llm.invoke([HumanMessage(content=extraction_prompt)])
        content = extraction_response.content

        if content.strip() == "NO_FACTS" or not content.strip():
            return

        # Store each extracted fact
        facts = [f.strip() for f in content.strip().split("\n") if f.strip() and f.strip() != "NO_FACTS"]
        for fact in facts:
            self.memory.store(
                content=fact,
                metadata={
                    "type": "extracted_fact",
                    "source_input": user_input[:100]
                }
            )
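The fact-extraction step above trusts the LLM to return either `NO_FACTS` or one clean fact per line. Real model output is often messier (bullets, numbering, stray blank lines), so a slightly more defensive parser is worth having; a sketch, assuming the same `NO_FACTS` contract:

```python
import re
from typing import List

def parse_extracted_facts(raw: str) -> List[str]:
    """Parse an LLM fact list: strip bullets/numbering, drop NO_FACTS and blanks."""
    if "NO_FACTS" in raw:
        return []
    facts = []
    for line in raw.splitlines():
        # Remove leading "-", "*", or "1." / "1)" style markers
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            facts.append(cleaned)
    return facts

sample = """1. Sarah is building a Python data pipeline.
- The pipeline processes customer transactions.

"""
print(parse_extracted_facts(sample))
```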

# Usage
agent = MemoryAwareAgent(user_id="sarah_001", vectorstore=vectorstore)

response = agent.chat("I'm building a data pipeline in Python for processing customer transactions")
print(response)

# Later session (agent retrieves stored memories)
agent2 = MemoryAwareAgent(user_id="sarah_001", vectorstore=vectorstore)
response = agent2.chat("What database should I use for my project?")
# Agent recalls Sarah is working on Python + customer transaction processing
print(response)

Memory Type 3: Episodic Memory

Episodic memory stores specific past interactions as structured records — what happened, when, and in what context.

from dataclasses import dataclass, asdict
import json

@dataclass
class Episode:
    """A single past interaction episode."""
    episode_id: str
    user_id: str
    timestamp: str
    task_description: str
    user_input_summary: str
    outcome_summary: str
    tools_used: List[str]
    success: bool
    key_learnings: str
    tags: List[str]

class EpisodicMemory:
    """Stores and retrieves specific past episodes of agent interactions."""

    def __init__(self, vectorstore: Chroma, user_id: str):
        self.vectorstore = vectorstore
        self.user_id = user_id

    def store_episode(self, episode: Episode) -> None:
        """Store a complete interaction episode."""
        # Create rich text representation for embedding
        episode_text = f"""Task: {episode.task_description}
User summary: {episode.user_input_summary}
Outcome: {episode.outcome_summary}
Tools used: {', '.join(episode.tools_used)}
Success: {episode.success}
Key learnings: {episode.key_learnings}
Tags: {', '.join(episode.tags)}"""

        metadata = {
            "user_id": self.user_id,
            "type": "episode",
            "episode_id": episode.episode_id,
            "timestamp": episode.timestamp,
            "success": episode.success,
            **{f"tag_{tag}": True for tag in episode.tags}
        }

        doc = Document(page_content=episode_text, metadata=metadata)
        self.vectorstore.add_documents([doc])

    def recall_similar_episodes(self, current_task: str, k: int = 3) -> List[Document]:
        """Find past episodes similar to the current task."""
        # Chroma requires $and for multi-field filters
        return self.vectorstore.similarity_search(
            current_task,
            k=k,
            filter={"$and": [{"user_id": self.user_id}, {"type": "episode"}]}
        )

    def get_successful_patterns(self, task_type: str) -> List[Document]:
        """Retrieve past episodes where similar tasks succeeded."""
        return self.vectorstore.similarity_search(
            task_type,
            k=5,
            filter={"$and": [{"user_id": self.user_id}, {"type": "episode"}, {"success": True}]}
        )

def create_episode_from_interaction(
    user_id: str,
    task: str,
    conversation: List[Dict],
    tools_used: List[str],
    success: bool
) -> Episode:
    """Use an LLM to create a structured episode from a raw interaction."""

    conv_text = "\n".join([f"{m['role']}: {m['content']}" for m in conversation])

    summary_prompt = f"""Summarize this agent interaction into structured memory:
    Task: {task}
    Conversation: {conv_text[:2000]}...

    Provide:
    1. USER_SUMMARY: One sentence summary of what the user needed
    2. OUTCOME: One sentence summary of what happened / what was produced
    3. LEARNINGS: Key insights from this interaction (1-2 sentences)
    4. TAGS: 3-5 relevant tags (comma-separated)"""

    response = llm.invoke([HumanMessage(content=summary_prompt)])
    content = response.content

    # Parse response (simplified)
    user_summary = _extract_field(content, "USER_SUMMARY")
    outcome = _extract_field(content, "OUTCOME")
    learnings = _extract_field(content, "LEARNINGS")
    tags_str = _extract_field(content, "TAGS")
    tags = [t.strip() for t in tags_str.split(",") if t.strip()]

    return Episode(
        episode_id=f"ep_{user_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
        user_id=user_id,
        timestamp=datetime.now().isoformat(),
        task_description=task,
        user_input_summary=user_summary,
        outcome_summary=outcome,
        tools_used=tools_used,
        success=success,
        key_learnings=learnings,
        tags=tags
    )

def _extract_field(text: str, field_name: str) -> str:
    """Extract a labeled field from LLM output."""
    if f"{field_name}:" in text:
        return text.split(f"{field_name}:")[1].split("\n")[0].strip()
    return ""

Memory Type 4: Semantic Memory (Knowledge Base)

Semantic memory represents structured knowledge about a domain — facts, concepts, and their relationships.

from langchain.text_splitter import RecursiveCharacterTextSplitter

class SemanticMemory:
    """Domain knowledge base using a vector store."""

    def __init__(self, vectorstore: Chroma):
        self.vectorstore = vectorstore
        self.splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )

    def ingest_document(self, text: str, source: str, category: str) -> int:
        """Add a document to the knowledge base."""
        chunks = self.splitter.split_text(text)
        docs = [
            Document(
                page_content=chunk,
                metadata={"source": source, "category": category, "type": "semantic"}
            )
            for chunk in chunks
        ]
        self.vectorstore.add_documents(docs)
        return len(docs)

    def query(self, question: str, category: Optional[str] = None, k: int = 5) -> List[Document]:
        """Query the knowledge base."""
        # Chroma requires $and when filtering on more than one field
        if category:
            filter_dict = {"$and": [{"type": "semantic"}, {"category": category}]}
        else:
            filter_dict = {"type": "semantic"}

        return self.vectorstore.similarity_search(
            question, k=k, filter=filter_dict
        )
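The `chunk_size`/`chunk_overlap` pair controls how documents are cut before embedding. A fixed-window sketch of what overlap chunking does (character-based and deliberately simple; `RecursiveCharacterTextSplitter` is smarter about preferring paragraph and sentence boundaries):

```python
# Fixed-size character chunking with overlap: each chunk starts
# (chunk_size - overlap) characters after the previous one, so neighbouring
# chunks share `overlap` characters of context.
from typing import List

def chunk_text(text: str, chunk_size: int = 10, overlap: int = 4) -> List[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghijklmnop", chunk_size=10, overlap=4)
print(chunks)  # → ['abcdefghij', 'ghijklmnop', 'mnop']
```

The overlap keeps a retrieved chunk's opening sentences in context even when the relevant passage straddles a chunk boundary.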

# Initialize specialized vector stores for different memory types
episodic_store = Chroma(
    persist_directory="./memory/episodic",
    embedding_function=embeddings,
    collection_name="episodic_memory"
)

semantic_store = Chroma(
    persist_directory="./memory/semantic",
    embedding_function=embeddings,
    collection_name="semantic_memory"
)

working_store = Chroma(
    persist_directory="./memory/working",
    embedding_function=embeddings,
    collection_name="working_memory"
)

Complete Memory System Integration

Combining all four memory types into a unified agent:

class FullMemoryAgent:
    """Agent with all four memory types integrated."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.session_messages = []  # In-context memory

        # External memories
        self.external_memory = ExternalMemory(working_store, user_id)
        self.episodic_memory = EpisodicMemory(episodic_store, user_id)
        self.semantic_memory = SemanticMemory(semantic_store)

        # Session tracking
        self.session_start = datetime.now().isoformat()
        self.tools_used_this_session = []
        self.session_successful = True

    def chat(self, user_input: str) -> str:
        """Process message with full memory context."""

        # 1. Retrieve from all memory types
        external_memories = self.external_memory.retrieve(user_input, k=3)
        similar_episodes = self.episodic_memory.recall_similar_episodes(user_input, k=2)
        knowledge = self.semantic_memory.query(user_input, k=3)

        # 2. Build rich context
        context_parts = []

        if external_memories:
            context_parts.append("=== Past Preferences & Facts ===")
            context_parts.append(self.external_memory.format_memories_for_context(external_memories))

        if similar_episodes:
            context_parts.append("\n=== Similar Past Interactions ===")
            for ep in similar_episodes:
                context_parts.append(ep.page_content)

        if knowledge:
            context_parts.append("\n=== Relevant Knowledge ===")
            for doc in knowledge:
                context_parts.append(f"[{doc.metadata.get('source', 'Knowledge Base')}]: {doc.page_content}")

        memory_context = "\n".join(context_parts) if context_parts else "No relevant memories found."

        # 3. Generate response
        system_prompt = f"""You are a helpful AI assistant with comprehensive memory.

{memory_context}

Use relevant context from memory when it improves your responses.
Today's date: {datetime.now().strftime('%Y-%m-%d')}"""

        self.session_messages.append(HumanMessage(content=user_input))
        messages = [SystemMessage(content=system_prompt)] + self.session_messages[-15:]
        response = llm.invoke(messages)
        self.session_messages.append(AIMessage(content=response.content))

        # 4. Update memories
        self._store_interaction_memory(user_input, response.content)

        return response.content

    def _store_interaction_memory(self, user_input: str, response: str) -> None:
        """Store key facts from the interaction."""
        extract_prompt = f"""Extract important facts from this exchange worth long-term remembering.
        Only extract specific, useful facts about the user's needs, preferences, or context.

        User: {user_input}
        Assistant: {response}

        Return facts as a simple list, or "NO_FACTS":"""

        result = llm.invoke([HumanMessage(content=extract_prompt)])
        if "NO_FACTS" not in result.content:
            facts = [f.strip() for f in result.content.strip().split("\n") if f.strip()]
            for fact in facts:
                self.external_memory.store(fact, {"type": "extracted_fact"})

    def end_session(self, task: str, success: bool = True) -> None:
        """Save episode memory at end of session."""
        if len(self.session_messages) < 2:
            return  # Nothing meaningful to save

        conv = [
            {"role": "human" if isinstance(m, HumanMessage) else "assistant", "content": m.content}
            for m in self.session_messages
        ]

        episode = create_episode_from_interaction(
            user_id=self.user_id,
            task=task,
            conversation=conv,
            tools_used=self.tools_used_this_session,
            success=success
        )

        self.episodic_memory.store_episode(episode)
        print(f"Session episode saved: {episode.episode_id}")


# Example usage
agent = FullMemoryAgent(user_id="sarah_001")

# Populate semantic memory (do this once during setup)
agent.semantic_memory.ingest_document(
    text="""LangGraph is a library for building stateful, multi-actor applications with LLMs.
    Key features: State management with TypedDict, graph-based workflow definition,
    built-in checkpointing, human-in-the-loop support, streaming...""",
    source="LangGraph Documentation",
    category="technical"
)

# Chat with full memory
r1 = agent.chat("I'm trying to build a research agent that remembers past searches")
print(r1)

r2 = agent.chat("What framework should I use for this?")
print(r2)  # Agent recalls user's goal and retrieves LangGraph knowledge

# End session and save episode
agent.end_session("Building a research agent with persistent memory", success=True)

Production with Pinecone

Replace Chroma with Pinecone for production scale:

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index if it doesn't exist
index_name = "agent-memory"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # text-embedding-3-small dimension
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index(index_name)

# Use Pinecone as drop-in replacement for Chroma
pinecone_vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings,
    namespace="production"
)

# All the same ExternalMemory, EpisodicMemory, SemanticMemory classes work unchanged
production_memory = ExternalMemory(pinecone_vectorstore, user_id="prod_user_001")

Key Takeaways

Memory is the biggest capability gap between today's stateless AI assistants and genuinely useful AI agents:

  1. In-context memory is fast but limited — always manage conversation history size
  2. External memory (vector DB) enables recall beyond context windows — use Chroma locally, Pinecone in production
  3. Episodic memory stores what happened, enabling the agent to learn from past interactions
  4. Semantic memory provides domain knowledge grounding for more accurate, specialized responses
  5. Always extract facts at the end of interactions — what seems obvious in the moment is easily forgotten
  6. Keeping separate stores for each memory type prevents interference and enables targeted retrieval

For the next step, learn how to evaluate your memory system's effectiveness in our AI Agent Evaluation Metrics tutorial.
