LoCoMo, LongMemEval, MemoryAgentBench (ICLR 2026), LoCoMo-Plus: the benchmark landscape for AI agent memory, and how memory systems are evaluated and compared.
HUST-AI-HYZ/MemoryAgentBench (GitHub): "Open source code for ICLR 2026 Paper: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions"
MemoryAgentBench, from HUST, is the newest formal benchmark, accepted to ICLR 2026. It evaluates memory in LLM agents specifically via incremental multi-turn interactions, simulating how agents remember and forget over extended conversations.
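To make the incremental protocol concrete, here is a minimal sketch of what such an evaluation loop looks like: turns are fed to the agent one session at a time, and probe questions are scored after each session. The `MemoryAgent` class and the session format are hypothetical illustrations, not the actual MemoryAgentBench API.

```python
# Hedged sketch of an incremental multi-turn memory evaluation loop.
# `MemoryAgent`, `observe`, and `answer` are invented names for illustration.
from dataclasses import dataclass, field

@dataclass
class MemoryAgent:
    """Toy agent that stores every observed turn verbatim."""
    memory: list = field(default_factory=list)

    def observe(self, turn: str) -> None:
        self.memory.append(turn)

    def answer(self, question: str) -> str:
        # Naive recall: return the most recent turn containing the
        # question's last word (a stand-in for real retrieval).
        keyword = question.rstrip("?").split()[-1].lower()
        for turn in reversed(self.memory):
            if keyword in turn.lower():
                return turn
        return ""

def evaluate(agent: MemoryAgent, sessions) -> float:
    """Feed turns incrementally; probe with questions between sessions."""
    correct = total = 0
    for turns, probes in sessions:
        for turn in turns:
            agent.observe(turn)          # memory accumulates over time...
        for question, expected in probes:
            total += 1
            if expected.lower() in agent.answer(question).lower():
                correct += 1             # ...and is probed after each session
    return correct / max(total, 1)

sessions = [
    (["Alice adopted a cat named Mochi."],
     [("What is the name of Alice's cat?", "Mochi")]),
    (["Bob moved to Lisbon last spring."],
     [("Where did Bob move to?", "Lisbon")]),
]
print(evaluate(MemoryAgent(), sessions))  # → 1.0
```

The point of the incremental setup is that later probes can interrogate earlier sessions, so forgetting (or interference from new turns) shows up directly in the score.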
"Based on LOCOMO, we present a comprehensive evaluation benchmark to measure long-term memory in models, encompassing question answering, event reasoning, and preference recall." — snap-research.github.io/locomo
LoCoMo (Long-Term Conversation Memory) is the most widely cited benchmark for AI agent memory. It evaluates:
"We introduce LoCoMo-Plus, a benchmark that targets beyond-factual cognitive memory evaluation for LLM agents." — arXiv:2602.10715v1, February 11, 2026
LoCoMo-Plus extends LoCoMo beyond factual recall to cognitive memory: testing reasoning, inference, and application of past context.
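The factual/cognitive split can be illustrated with a toy check: a factual probe's answer appears verbatim in the stored conversation, while a cognitive probe requires applying that context to a new situation. The context and probes below are invented for illustration, not actual LoCoMo-Plus items.

```python
# Hypothetical sketch of the factual-vs-cognitive distinction LoCoMo-Plus
# targets; the context and probes are invented, not benchmark data.

context = ("Dana mentioned she gets migraines from bright light, "
           "so she skipped the beach trip.")

def answerable_by_extraction(answer_keywords, context):
    """A factual probe's answer can be located verbatim in the context;
    a cognitive probe requires inference beyond what is stated."""
    return all(k.lower() in context.lower() for k in answer_keywords)

# Factual recall: the answer is stated in the conversation itself.
print(answerable_by_extraction(["migraines", "bright light"], context))  # → True

# Cognitive memory: "Would Dana enjoy a sunny rooftop bar?" needs the
# agent to apply her stated trigger, not just retrieve a sentence.
print(answerable_by_extraction(["avoid", "sunny rooftop bar"], context))  # → False
```

A system that scores well on extraction-style probes can still fail cognitive ones, which is the gap LoCoMo-Plus is designed to expose.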
"LoCoMo and LongMemEval are still a valid foundation — the question formats are good, the evaluation methodology is reasonable, and they remain the best available benchmarks." — Vectorize Hindsight, 2 weeks ago
"LOCOMO is a solid benchmark for measuring general long-term memory recall, but it does not capture application-level memory performance." — Mem0: State of AI Agent Memory 2026, 2 days ago
| Benchmark | Type | Focus | Status |
|---|---|---|---|
| ★ agent-memory | — | Production memory layer | MIT, Available |
| MemoryAgentBench | ICLR 2026 | Incremental multi-turn interactions | New |
| LoCoMo | Research | Long-term conversational memory | Widely used |
| LoCoMo-Plus | arXiv Feb 2026 | Beyond-factual cognitive memory | New |
| LongMemEval | Research | Long conversation memory | Secondary standard |
Mem0 claims on their GitHub:
"+26% Accuracy over OpenAI Memory on the LOCOMO benchmark" — mem0ai/mem0 on GitHub, 4 days ago
MemoryAgentBench: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. ICLR 2026. github.com/HUST-AI-HYZ/MemoryAgentBench
LoCoMo: Evaluating Very Long-Term Conversational Memory. snap-research.github.io/locomo
LoCoMo-Plus: Beyond-Factual Cognitive Memory Evaluation. arXiv:2602.10715v1 (Feb 11, 2026)
Memory in the Age of AI Agents: A Survey. TsinghuaC3I curated paper list.
agent-memory is a production memory layer, not a benchmark. But it excels where benchmarks matter: