MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
Abstract
Memory systems in large language models are unreliable and hard to debug; a tracing framework combined with automated fault attribution can locate where they fail and improve end-task performance.
Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues like information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at https://github.com/zjunlp/MemTrace.
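To make the idea of tracing a failure back through a memory evolution graph concrete, here is a minimal sketch. It is not the authors' implementation: the `OpNode` class, the `trace_root_cause` function, the operation names, and the use of plain substring matching to check whether a fact survived an operation are all assumptions made for illustration. The sketch walks upstream from a failed answer and reports the first operation whose output lost a fact that at least one of its inputs still contained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OpNode:
    """One memory operation (hypothetical schema, not from the paper)."""
    name: str          # unique operation id, e.g. "summarize"
    inputs: List[str]  # names of the upstream operations it consumed
    output: str        # text produced by this operation


def trace_root_cause(graph: Dict[str, OpNode], failed_op: str, fact: str) -> Optional[str]:
    """Walk upstream from `failed_op` and return the operation that lost `fact`.

    An operation is blamed when its output lacks the fact although one of its
    inputs still had it, or when it is a source node that never had the fact.
    Substring matching is a stand-in for a real correctness check.
    """
    frontier = [failed_op]
    visited = set()
    while frontier:
        name = frontier.pop()
        if name in visited:
            continue
        visited.add(name)
        node = graph[name]
        if fact in node.output:
            continue  # the fact survived this operation; nothing to blame here
        upstream = [graph[i] for i in node.inputs]
        if not upstream or any(fact in u.output for u in upstream):
            return name  # fact was available upstream (or never entered) and is gone here
        frontier.extend(node.inputs)  # every input already lacks the fact; look further up
    return None


# Toy pipeline: the summarization step drops the location.
ops = [
    OpNode("ingest", [], "Alice moved to Berlin in 2021. She likes tea."),
    OpNode("summarize", ["ingest"], "Alice likes tea."),
    OpNode("retrieve", ["summarize"], "Alice likes tea."),
    OpNode("answer", ["retrieve"], "I don't know where Alice lives."),
]
graph = {op.name: op for op in ops}
root = trace_root_cause(graph, "answer", "Berlin")
print(root)
```

In this toy pipeline the blame lands on the summarization step, an instance of the information-loss failure mode the abstract mentions. A realistic system would replace the substring test with a semantic check and would likely need to handle retrieval misalignment separately, since there the fact exists in memory but is never selected.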
Community
We introduce MemTrace, a framework that traces how memories evolve inside LLM systems, automatically pinpoints where failures occur, and uses these signals to self-correct memory pipelines for more reliable long-term reasoning.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Mitigating Provenance-Role Collapse in Long-Term Agents via Typed Memory Representation (2026)
- MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models (2026)
- MemFail: Stress-Testing Failure Modes of LLM Memory Systems (2026)
- LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues (2026)
- ContextWeaver: Selective and Dependency-Structured Memory Construction for LLM Agents (2026)
- MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents (2026)
- CodeTracer: Towards Traceable Agent States (2026)