AI Agent Evaluation Benchmark

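An overview of MemoryAgentBench (ICLR 2026), "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions"; Letta Evals, an open-source evaluation framework for stateful agents; and recent comparisons of agent memory frameworks, including one that covers 8 frameworks.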


The AI Agent Memory Evaluation Problem

Every AI agent framework claims to have "memory" — but how do you actually measure if it works? Evaluation benchmarks are emerging to answer this question rigorously.
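One way to make "does the memory work?" measurable is to feed an agent information one turn at a time and only ask questions afterwards, so that every answer has to come from whatever the agent retained. The sketch below illustrates that idea with a toy agent. It is not the MemoryAgentBench harness or any framework's API: `KeywordAgent`, `observe`, `answer`, and `evaluate_incremental` are hypothetical names chosen for this example, and scoring by substring match is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordAgent:
    """Toy agent (hypothetical): stores every chunk and answers by word overlap."""
    memory: list = field(default_factory=list)

    def observe(self, chunk: str) -> None:
        # One call per turn: the agent only ever sees one chunk at a time.
        self.memory.append(chunk)

    def answer(self, question: str) -> str:
        # Return the stored chunk that shares the most words with the question.
        if not self.memory:
            return ""
        q_words = set(question.lower().split())
        return max(self.memory, key=lambda c: len(q_words & set(c.lower().split())))


def evaluate_incremental(agent, chunks, qa_pairs):
    """Feed chunks one turn at a time, then score answers by substring match."""
    for chunk in chunks:
        agent.observe(chunk)
    correct = sum(
        expected.lower() in agent.answer(question).lower()
        for question, expected in qa_pairs
    )
    return correct / len(qa_pairs)


# Illustrative data, invented for this example.
chunks = [
    "The deploy key rotates every 30 days.",
    "Alice owns the billing service.",
    "The staging database lives in eu-west-1.",
]
qa_pairs = [
    ("Who owns the billing service?", "Alice"),
    ("Where does the staging database live?", "eu-west-1"),
]

accuracy = evaluate_incremental(KeywordAgent(), chunks, qa_pairs)
print(f"accuracy = {accuracy:.2f}")
```

Swapping `KeywordAgent` for a wrapper around a real agent keeps the scoring loop unchanged, which is the point of separating the feeding phase from the questioning phase.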

MemoryAgentBench — ICLR 2026 (2 days ago)

"Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. Open source code for the ICLR 2026 paper on systematically measuring whether LLM agents actually use their memory." — HUST-AI-HYZ/MemoryAgentBench on GitHub, 2 days ago

Letta Evals — Open-Source Stateful Agent Evaluation (August 2025)

"Introducing Letta Evals: an open-source evaluation framework for systematically testing stateful agents. Benchmarking AI agent memory: Is a filesystem all you need?" — Letta Blog, August 12, 2025

State of AI Agent Memory 2026 (4 days ago)

"The fastest-growing surface area in AI agent memory is not the core pipeline — it is the integration layer. As of early 2026, Mem0's managed service handles graph database infrastructure, compliance, and scaling automatically." — Mem0.ai Blog, 4 days ago

Best AI Agent Memory Frameworks 2026 (2 days ago)

"We evaluated 8 frameworks on memory architecture, persistence model, multi-agent coordination, self-hosting support, enterprise authentication, and beyond." — Atlan.com, 2 days ago

5 AI Agent Memory Systems Compared (3 weeks ago)

"Need team memory? → Mem0 or Zep. Both are designed for shared memory. Need LLM to manage memory logic? → Letta." — DEV Community, 3 weeks ago

Top 6 AI Agent Memory Frameworks for Devs (2 weeks ago)

"TL;DR: Pick Mem0 for the broadest standalone memory layer, Zep for temporal-aware production pipelines, Letta for long-running agents that need explicit memory management." — DEV Community, 2 weeks ago

AI Agent Memory Framework Comparison

Framework	Best For	Memory Type	License	Stars
★ agent-memory	MCP agents, encryption	JSON/SQLite/Redis	MIT	—
Mem0	Standalone memory layer	Graph + vector	Apache 2.0	48K+
Zep	Team memory, temporal	Vector + graph	—	—
Letta	Long-running agents	Stateful, explicit	MIT	—
MemoryAgentBench	Research evaluation	Benchmark	Open	—

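How agent-memory Addresses the Evaluation Criteria

MemoryAgentBench measures memory through incremental multi-turn interactions. The framework comparisons quoted above add criteria such as multi-agent coordination, self-hosting support, and enterprise authentication. agent-memory addresses these criteria as follows:

Incremental multi-turn retention: memory persists across turns, with TTL-based expiry
Memory retrieval accuracy: semantic search plus exact match over the JSON and SQLite backends
Multi-agent coordination: the Redis backend enables shared memory across agents
Self-hosting support: runs fully offline, with no cloud dependency
AES-256 encryption: relevant to enterprise compliance requirements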
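To make the first criterion concrete, here is a minimal sketch of what "persists across turns with TTL" can mean in practice: some entries stay available on every turn, while others expire after a set lifetime. This is an illustration only and does not use agent-memory's API. `TTLMemory`, `put`, and `get` are hypothetical names, and the injectable clock is an assumption made so that the example can simulate turns without waiting.

```python
import time
from typing import Optional


class TTLMemory:
    """Toy in-process key-value memory with per-entry expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so a test can control time
        self._entries = {}     # key -> (value, expires_at or None)

    def put(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        # ttl=None means the entry never expires.
        expires_at = None if ttl is None else self._clock() + ttl
        self._entries[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[key]   # lazily evict expired entries on read
            return None
        return value


# Fake clock: each simulated turn advances time by one unit.
now = [0.0]
mem = TTLMemory(clock=lambda: now[0])

mem.put("user_name", "Alice")          # no TTL: kept indefinitely
mem.put("otp_code", "493021", ttl=2)   # expires two turns later

retained = []
for turn in range(4):
    retained.append((mem.get("user_name"), mem.get("otp_code")))
    now[0] += 1.0

for turn, (name, code) in enumerate(retained):
    print(turn, name, code)
```

A retention check for a real backend would follow the same pattern: write entries with known lifetimes, advance through turns, and compare what is still retrievable with what should be.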

# Install agent-memory
pip install agent-memory

# Start the agent-memory MCP server with the Redis storage backend
python -m agent_memory.mcp_server \
  --storage redis \
  --path ./agent-memory-eval

# Run your evaluation against the running server: does your agent actually use memory?
# agent-memory: MIT license, fully self-hostable, AES-256 encrypted

agent-memory on GitHub · MemoryAgentBench on GitHub

Sources:

• HUST-AI-HYZ/MemoryAgentBench on GitHub — "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions" (ICLR 2026, 2 days ago)

• Letta Blog: Benchmarking AI Agent Memory — "Letta Evals: open-source evaluation framework" (August 12, 2025)

• Mem0.ai: State of AI Agent Memory 2026 — 4 days ago

• Atlan.com: Best AI Agent Memory Frameworks 2026 — 2 days ago

• DEV Community: 5 AI Agent Memory Systems Compared — 3 weeks ago

• DEV Community: Top 6 AI Agent Memory Frameworks for Devs — 2 weeks ago