MemoryAgentBench (ICLR 2026): Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. Also covered: the open-source Letta Evals framework and a comparison of 8 memory frameworks.
Every AI agent framework claims to have "memory", but how do you actually measure whether it works? Evaluation benchmarks are emerging to answer that question rigorously.
"Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. Open source code for the ICLR 2026 paper on systematically measuring whether LLM agents actually use their memory." — HUST-AI-HYZ/MemoryAgentBench on GitHub, 2 days ago
"Introducing Letta Evals: an open-source evaluation framework for systematically testing stateful agents. Benchmarking AI agent memory: Is a filesystem all you need?" — Letta Blog, August 12, 2025
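Letta's question, whether a filesystem is all you need, is easiest to picture with a tiny example. The sketch below assumes the simplest possible design: memory is an append-only text file and recall is a case-insensitive `grep`. It is a hypothetical illustration, not Letta's implementation; `MEM_FILE`, `remember`, and `recall` are names invented here.

```bash
#!/usr/bin/env bash
# Hypothetical filesystem-backed memory: one observation per line in a text file.
# Illustrative only; this is not how Letta implements its filesystem tools.
set -euo pipefail

# Throwaway file for the demo; a real agent would use a persistent path.
MEM_FILE="$(mktemp)"
trap 'rm -f "$MEM_FILE"' EXIT

# remember TEXT: append one observation to the memory file.
remember() {
  printf '%s\n' "$1" >> "$MEM_FILE"
}

# recall QUERY: print every stored line containing QUERY
# (case-insensitive, fixed-string match; prints nothing if there is no match).
recall() {
  grep -iF -- "$1" "$MEM_FILE" || true
}

remember "User prefers dark mode"
remember "User's project is called Atlas"
recall "project"   # prints: User's project is called Atlas
```

The appeal of this design is that the agent needs no dedicated memory service; its weakness is that recall is only as good as the search query the agent chooses, which is exactly the behavior an evaluation has to probe.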
"The fastest-growing surface area in AI agent memory is not the core pipeline — it is the integration layer. As of early 2026, Mem0's managed service handles graph database infrastructure, compliance, and scaling automatically." — Mem0.ai Blog, 4 days ago
"We evaluated 8 frameworks on memory architecture, persistence model, multi-agent coordination, self-hosting support, enterprise authentication, and beyond." — Atlan.com, 2 days ago
"Need team memory? → Mem0 or Zep. Both are designed for shared memory. Need LLM to manage memory logic? → Letta." — DEV Community, 3 weeks ago
"TL;DR: Pick Mem0 for the broadest standalone memory layer, Zep for temporal-aware production pipelines, Letta for long-running agents that need explicit memory management." — DEV Community, 2 weeks ago
| Framework | Best For | Memory Type | License | GitHub Stars |
|---|---|---|---|---|
| ★ agent-memory | MCP agents, encryption | JSON/SQLite/Redis | MIT | — |
| Mem0 | Standalone memory layer | Graph + vector | Apache 2.0 | 48K+ |
| Zep | Team memory, temporal | Vector + graph | — | — |
| Letta | Long-running agents | Stateful, explicit | MIT | — |
| MemoryAgentBench | Research evaluation | Benchmark | Open | — |
MemoryAgentBench evaluates memory on:
```bash
# Install agent-memory
pip install agent-memory

# Start the agent-memory MCP server with Redis-backed storage
python -m agent_memory.mcp_server \
  --storage redis \
  --path ./agent-memory-eval

# With the server running, point your agent at it and evaluate:
# does the agent actually use its memory?
# agent-memory: MIT license, fully self-hostable, AES-256 encrypted
```
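The MemoryAgentBench quote describes evaluating memory through incremental multi-turn interactions: information is fed to the agent over several turns, and only afterwards is the agent asked about it. The toy harness below is a hypothetical sketch of that idea, not MemoryAgentBench's code or agent-memory's API. The "agent" is just a `grep` over its own memory file, and the names `ingest_turn`, `answer`, `evaluate`, and `USE_MEMORY` are invented for illustration. Scoring it with and without memory gives the basic ablation: if the score does not drop when memory is disabled, the agent is not really using its memory.

```bash
#!/usr/bin/env bash
# Toy incremental multi-turn memory evaluation (hypothetical; not MemoryAgentBench's code).
set -euo pipefail

MEM_FILE="$(mktemp)"
trap 'rm -f "$MEM_FILE"' EXIT
USE_MEMORY=1   # set to 0 to score a memoryless baseline

# ingest_turn TEXT: the agent sees one chunk; it is stored only if memory is on.
ingest_turn() {
  if [ "$USE_MEMORY" -eq 1 ]; then
    printf '%s\n' "$1" >> "$MEM_FILE"
  fi
}

# answer CUE: the toy agent can only return what it stored (case-insensitive grep).
answer() {
  grep -iF -- "$1" "$MEM_FILE" || true
}

# evaluate: feed chunks turn by turn, then probe facts from earlier turns.
# Prints the score as "correct/total".
evaluate() {
  : > "$MEM_FILE"   # every run starts from empty memory
  local turns=(
    "Turn 1: my dog is named Biscuit"
    "Turn 2: I moved to Lisbon last spring"
    "Turn 3: my favourite editor is Helix"
  )
  local probes=("dog|Biscuit" "moved|Lisbon" "editor|Helix")   # "cue|expected"
  local t p cue expected recalled correct=0
  for t in "${turns[@]}"; do
    ingest_turn "$t"
  done
  for p in "${probes[@]}"; do
    cue="${p%%|*}"
    expected="${p#*|}"
    recalled="$(answer "$cue")"
    if [[ "$recalled" == *"$expected"* ]]; then
      correct=$((correct + 1))
    fi
  done
  echo "${correct}/${#probes[@]}"
}

evaluate        # with memory: prints 3/3
USE_MEMORY=0
evaluate        # memoryless baseline: prints 0/3
```

A real harness would put an LLM agent where `answer` is and use natural-language questions instead of exact-match cues; the with/without-memory comparison is the part this sketch is meant to show.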