AI Agent Evaluation Benchmark

MemoryAgentBench (ICLR 2026): Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. Letta Evals, an open-source evaluation framework. Eight memory frameworks compared.

The AI Agent Memory Evaluation Problem

Every AI agent framework claims to have "memory" — but how do you actually measure if it works? Evaluation benchmarks are emerging to answer this question rigorously.

MemoryAgentBench — ICLR 2026 (2 days ago)

"Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. Open source code for the ICLR 2026 paper on systematically measuring whether LLM agents actually use their memory." HUST-AI-HYZ/MemoryAgentBench on GitHub, 2 days ago
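The phrase "incremental multi-turn interactions" can be illustrated with a toy harness. Information reaches the agent one chunk per turn, and questions are asked only after every chunk has been delivered, so answers have to come from memory rather than from the current context. This is a minimal sketch under stated assumptions, not MemoryAgentBench's code or scoring. `KeywordMemory`, `evaluate_incremental`, and the substring-match metric are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class KeywordMemory:
    """Hypothetical toy memory: stores raw chunks, retrieves by word overlap."""
    chunks: list[str] = field(default_factory=list)

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = tokens(query)
        # Rank chunks by word overlap with the query; on ties, prefer the more recent chunk
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: (len(q & tokens(self.chunks[i])), i),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def evaluate_incremental(
    memory: KeywordMemory, chunks: list[str], questions: list[tuple[str, str]]
) -> float:
    """Feed chunks one turn at a time, then ask every question; return accuracy."""
    for chunk in chunks:
        memory.add(chunk)  # one turn: the agent sees only this chunk
    correct = 0
    for question, answer in questions:
        # Answers must come from retrieved memory, not from the chunk list directly
        retrieved = " ".join(memory.retrieve(question))
        correct += answer.lower() in retrieved.lower()
    return correct / len(questions)


chunks = ["Alice lives in Paris.", "Bob works at a bakery.", "The meeting is on Tuesday."]
questions = [
    ("Where does Alice live?", "Paris"),
    ("When is the meeting?", "Tuesday"),
    ("What does Bob eat?", "bread"),
]
score = evaluate_incremental(KeywordMemory(), chunks, questions)
print(f"accuracy: {score:.2f}")  # accuracy: 0.67
```

The third question fails because nothing in the stored chunks answers it. Word-overlap retrieval is a deliberately weak baseline, chosen only to keep the sketch dependency-free.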

Letta Evals — Open-Source Stateful Agent Evaluation (August 2025)

"Introducing Letta Evals: an open-source evaluation framework for systematically testing stateful agents. Benchmarking AI agent memory: Is a filesystem all you need?" Letta Blog, August 12, 2025
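The question in the blog title, whether a filesystem is all you need, suggests a baseline in which an agent's memory is just files. The sketch below shows that idea under stated assumptions: one text file per note, and retrieval by case-insensitive keyword search. `FileMemory` is a hypothetical class, not Letta's implementation or API.

```python
import re
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical file-backed memory: one .txt file per note, found by keyword search."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, text: str) -> Path:
        # Replace anything but letters, digits, "_" and "-" so the name is a safe filename
        safe = re.sub(r"[^A-Za-z0-9_-]", "_", name)
        path = self.root / f"{safe}.txt"
        path.write_text(text, encoding="utf-8")
        return path

    def search(self, keyword: str) -> list[str]:
        """Return names of notes whose text contains keyword, case-insensitively."""
        needle = keyword.lower()
        return [
            path.stem
            for path in sorted(self.root.glob("*.txt"))
            if needle in path.read_text(encoding="utf-8").lower()
        ]


with tempfile.TemporaryDirectory() as tmp:
    memory = FileMemory(Path(tmp))
    memory.write("user prefs", "Prefers dark mode and metric units.")
    memory.write("project", "Deadline moved to Friday.")
    hits = memory.search("friday")
    pref_hits = memory.search("MODE")
print(hits, pref_hits)  # ['project'] ['user_prefs']
```

Plain files are easy to inspect and version. The cost is that retrieval quality depends entirely on how well the agent chooses search terms.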

State of AI Agent Memory 2026 (4 days ago)

"The fastest-growing surface area in AI agent memory is not the core pipeline — it is the integration layer. As of early 2026, Mem0's managed service handles graph database infrastructure, compliance, and scaling automatically." Mem0.ai Blog, 4 days ago

Best AI Agent Memory Frameworks 2026 (2 days ago)

"We evaluated 8 frameworks on memory architecture, persistence model, multi-agent coordination, self-hosting support, enterprise authentication, and beyond." Atlan.com, 2 days ago

5 AI Agent Memory Systems Compared (3 weeks ago)

"Need team memory? → Mem0 or Zep. Both are designed for shared memory. Need LLM to manage memory logic? → Letta." DEV Community, 3 weeks ago

Top 6 AI Agent Memory Frameworks for Devs (2 weeks ago)

"TL;DR: Pick Mem0 for the broadest standalone memory layer, Zep for temporal-aware production pipelines, Letta for long-running agents that need explicit memory management." DEV Community, 2 weeks ago

AI Agent Memory Framework Comparison

Framework        | Best for                | Memory type        | License     | Stars
★ agent-memory   | MCP agents, encryption  | JSON/SQLite/Redis  | MIT         | -
Mem0             | Standalone memory layer | Graph + vector     | Apache 2.0  | 48K+
Zep              | Team memory, temporal   | Vector + graph     | -           | -
Letta            | Long-running agents     | Stateful, explicit | Apache 2.0  | -
MemoryAgentBench | Research evaluation     | Benchmark          | Open source | -
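The memory type column lists interchangeable storage backends. A common way to make backends swappable is a small shared interface. The sketch below assumes a hypothetical `MemoryStore` protocol with `put` and `get` methods, and implements it with a JSON file and with SQLite from the Python standard library. It is not agent-memory's API, and Redis is left out to keep the example dependency-free.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class MemoryStore(Protocol):
    """Hypothetical interface that every storage backend implements."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class JsonStore:
    """All memories in one JSON file; easy to inspect, but every put rewrites the file."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def put(self, key: str, value: str) -> None:
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def get(self, key: str) -> Optional[str]:
        return self._load().get(key)


class SqliteStore:
    """Memories as rows in a SQLite table; a put touches one row, not the whole store."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, key: str, value: str) -> None:
        # INSERT OR REPLACE overwrites any existing row with the same primary key
        self.conn.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT value FROM memory WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def overwrite_roundtrip(store: MemoryStore) -> Optional[str]:
    """Write a key twice and read it back; a correct backend returns the later value."""
    store.put("city", "Paris")
    store.put("city", "Lyon")
    return store.get("city")


with tempfile.TemporaryDirectory() as tmp:
    results = [
        overwrite_roundtrip(JsonStore(Path(tmp) / "memory.json")),
        overwrite_roundtrip(SqliteStore()),
    ]
print(results)  # ['Lyon', 'Lyon']
```

The JSON backend is the simplest to read by hand, but it reloads and rewrites the whole file on every write. SQLite updates a single row, which scales better as the store grows.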

Why agent-memory Passes the Benchmark

MemoryAgentBench evaluates memory on:

    # Install agent-memory
    pip install agent-memory

    # Start the agent-memory MCP server with Redis storage
    python -m agent_memory.mcp_server \
      --storage redis \
      --path ./agent-memory-eval

    # Run the evaluation: does your agent actually use memory?
    # agent-memory: MIT license, fully self-hostable, AES-256 encrypted
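One way to make "does your agent actually use memory?" concrete is an update probe. The probe stores a fact, changes it later in the conversation, and checks that retrieval returns the new value. The sketch below assumes an in-memory list of (subject, value) facts and uses hypothetical function names rather than agent-memory's API. It is one narrow check, not a substitute for running MemoryAgentBench itself.

```python
from typing import Callable, List, Tuple

Fact = Tuple[str, str]  # (subject, value), stored in conversation order


def first_match(history: List[Fact], subject: str) -> str:
    """Naive retrieval: the earliest value stored for the subject."""
    for stored_subject, value in history:
        if stored_subject == subject:
            return value
    return ""


def latest_match(history: List[Fact], subject: str) -> str:
    """Recency-aware retrieval: the most recent value stored for the subject."""
    for stored_subject, value in reversed(history):
        if stored_subject == subject:
            return value
    return ""


def update_probe(retrieve: Callable[[List[Fact], str], str]) -> bool:
    """Pass if retrieval returns the updated fact rather than the outdated one."""
    history: List[Fact] = [
        ("user_city", "Berlin"),     # turn 1
        ("user_pet", "cat"),         # turn 2: unrelated fact
        ("user_city", "Amsterdam"),  # turn 3: the user has moved
    ]
    return retrieve(history, "user_city") == "Amsterdam"


print(update_probe(first_match))   # False
print(update_probe(latest_match))  # True
```

Both retrieval functions find a stored value, so a check that only tests recall would pass either one. The probe separates them by asking which value wins after an update.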