AI Agent Memory Benchmark

LoCoMo, LongMemEval, MemoryAgentBench (ICLR 2026), Locomo-Plus — the benchmark landscape for AI agent memory. How memory systems are evaluated and compared.

MemoryAgentBench — ICLR 2026 (New!)

HUST-AI-HYZ/MemoryAgentBench — GitHub

"Open source code for ICLR 2026 Paper: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions" GitHub

The newest formal benchmark is MemoryAgentBench (HUST-AI-HYZ/MemoryAgentBench on GitHub), accepted to ICLR 2026. It evaluates memory in LLM agents through incremental multi-turn interactions, where information arrives piece by piece over an extended conversation, simulating how agents remember and forget as that conversation grows.
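The incremental setting can be illustrated with a small harness. This is a minimal sketch, not MemoryAgentBench's code or API: `MemoryAgent`, `KeywordMemoryAgent`, and `run_incremental_eval` are hypothetical names, and the toy agent simply returns the stored turn with the most word overlap with the question. It shows the shape of the protocol described above: turns are ingested one at a time and probe questions are asked at intermediate points, so a fact that is correct early (Alice lives in Paris) can be superseded by a later turn.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


class MemoryAgent(Protocol):
    """Hypothetical interface: anything that can ingest turns and answer questions."""

    def ingest(self, chunk: str) -> None: ...
    def answer(self, question: str) -> str: ...


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class KeywordMemoryAgent:
    """Toy agent (hypothetical): stores every turn verbatim and answers with
    the stored turn that shares the most words with the question."""

    chunks: list[str] = field(default_factory=list)

    def ingest(self, chunk: str) -> None:
        self.chunks.append(chunk)

    def answer(self, question: str) -> str:
        q = tokens(question)
        best, best_overlap = "", 0
        # Scan newest-first so a later turn wins ties (e.g. updated facts).
        for chunk in reversed(self.chunks):
            overlap = len(q & tokens(chunk))
            if overlap > best_overlap:
                best, best_overlap = chunk, overlap
        return best


def run_incremental_eval(agent: MemoryAgent, turns, probes) -> float:
    """Feed turns one at a time; after turn i, ask every probe scheduled at i.

    probes maps a turn index to a list of (question, expected_substring) pairs.
    Returns the fraction of probes whose answer contains the expected substring.
    """
    hits, total = 0, 0
    for i, turn in enumerate(turns):
        agent.ingest(turn)
        for question, expected in probes.get(i, []):
            total += 1
            hits += int(expected.lower() in agent.answer(question).lower())
    return hits / total if total else 0.0


turns = ["Alice lives in Paris.", "Bob likes chess.", "Alice moved to Berlin."]
probes = {
    0: [("Where does Alice live?", "Paris")],
    2: [("Where does Alice live now?", "Berlin")],
}
print(run_incremental_eval(KeywordMemoryAgent(), turns, probes))  # 1.0
```

Scheduling probes after specific turns, rather than only after the whole conversation, is what separates this style of test from a one-shot long-context test: the agent is scored on what it holds at each point in the stream.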

LoCoMo — The Standard Benchmark

"Based on LoCoMo, we present a comprehensive evaluation benchmark to measure long-term memory in models, encompassing question answering, event summarization, and multi-modal dialogue generation tasks." snap-research.github.io/locomo

LoCoMo (Long-Term Conversation Memory) is the most widely cited benchmark for AI agent memory. It evaluates how well models recall and reason over very long, multi-session conversations.
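Question-answering results on LoCoMo are typically reported as a partial-match, token-level F1. The sketch below implements the familiar SQuAD-style version (lowercase, strip punctuation and English articles, then compare bags of tokens). Treat it as an assumption about the scoring: LoCoMo's official evaluation scripts may normalize answers differently, and some later work reports an LLM-judge score instead. `normalize_answer` and `token_f1` are illustrative names.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and the articles a/an/the, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred, ref = normalize_answer(prediction), normalize_answer(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("She moved to Berlin in May", "Berlin"), 3))  # 0.286
print(token_f1("the Berlin", "Berlin"))  # 1.0
```

The first answer contains the right city but scores only 0.286, because the five extra tokens drag precision down; verbose but correct answers are penalized under this metric.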

Locomo-Plus — Beyond Factual Memory (Feb 11, 2026)

"We introduce LoCoMo-Plus, a benchmark that targets beyond-factual cognitive memory evaluation for LLM agents." arXiv:2602.10715v1, February 11, 2026

Locomo-Plus extends LoCoMo beyond factual recall to cognitive memory — testing reasoning, inference, and application of past context.

Vectorize: Agent Memory Benchmark Manifesto

"LoCoMo and LongMemEval are still a valid foundation — the question formats are good, the evaluation methodology is reasonable, and they remain the best available benchmarks." Vectorize Hindsight

Mem0 on LoCoMo: State of AI Agent Memory 2026

"LOCOMO is a solid benchmark for measuring general long-term memory recall, but it does not capture application-level memory performance." Mem0: State of AI Agent Memory 2026

Benchmark Comparison Table

Benchmark         Type / venue             Focus                                Status
★ agent-memory    Production memory layer  -                                    MIT, Available
MemoryAgentBench  ICLR 2026                Incremental multi-turn interactions  New
LoCoMo            Research                 Long-term conversational memory      Widely used
Locomo-Plus       arXiv Feb 2026           Beyond-factual cognitive memory      New
LongMemEval       Research                 Long conversation memory             Secondary standard

Mem0 LoCoMo Performance Claim

Mem0 claims on its GitHub page:

"+26% Accuracy over OpenAI Memory on the LOCOMO benchmark" mem0ai/mem0 on GitHub
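The quoted figure does not say whether "+26%" is a relative improvement or an absolute gain in percentage points, and the two readings differ. The accuracies below are hypothetical, chosen only to show the arithmetic; they are not Mem0's or OpenAI's reported numbers.

```python
def absolute_gain(new: float, baseline: float) -> float:
    """Gain in percentage points, with both scores given as percentages."""
    return new - baseline


def relative_gain(new: float, baseline: float) -> float:
    """Gain expressed as a percentage of the baseline score."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100


# Hypothetical accuracies (%), not reported results.
baseline_acc, new_acc = 50.0, 63.0
print(absolute_gain(new_acc, baseline_acc))  # 13.0
print(round(relative_gain(new_acc, baseline_acc), 1))  # 26.0
```

Under these made-up numbers, a 26% relative gain is only 13 points of absolute accuracy, so it is worth checking which reading a claim uses before comparing systems across reports.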

Paper Resources

MemoryAgentBench

ICLR 2026. Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. HUST-AI-HYZ.

LoCoMo

Evaluating Very Long-Term Conversational Memory of LLM Agents. ACL 2024. snap-research.github.io/locomo

Locomo-Plus

Beyond-Factual Cognitive Memory Evaluation. arXiv:2602.10715v1 (Feb 11, 2026)

Awesome-Memory-for-Agents

TsinghuaC3I curated paper list. Memory in the Age of AI Agents: A Survey.

How agent-memory Relates to These Benchmarks

agent-memory is a production memory layer, not a benchmark; it is the kind of system that benchmarks such as LoCoMo and MemoryAgentBench are designed to evaluate.
