MRAgent Cuts Long-Memory Agent Queries To 118k Tokens In Benchmark Tests
National University of Singapore researchers built MRAgent to reconstruct memory through a Cue-Tag-Content graph, with VentureBeat citing LongMemEval prompt use of 118k tokens per sample versus 632k for A-Mem and 3.26 million for LangMem.

MRAgent Rebuilds Memory During Reasoning
Researchers at the National University of Singapore developed MRAgent, a memory framework for AI agents that replaces static retrieve-then-reason pipelines with dynamic memory reconstruction.
The framework lets an agent develop its memory path as it gathers evidence, rather than loading broad retrieval results into the model context at the start.
Classic vector-search and graph-traversal retrieval can fail on long-horizon tasks because the system cannot revise its search strategy during reasoning.
If the agent uncovers a cue that the original query lacked, such as a date, person or place, a passive retrieval pipeline has no way to issue a new query based on that discovery.
Fixed similarity scores can also return surface-level matches that fill the context window with irrelevant material.

Cue-Tag-Content Narrows The Search Path
MRAgent treats memory as an interactive environment.
The backbone model explores candidate retrieval paths across a structured memory graph, evaluates intermediate evidence, infers new constraints and prunes branches that do not help answer the query.
The framework organizes memory through a Cue-Tag-Content mechanism.
Cues are fine-grained keywords or contextual attributes, Content stores the memory units, and Tags summarize relationships between cues and content.
The model can judge short relational summaries before spending tokens on heavier memory contents.
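A minimal sketch of how such a structure might be laid out, assuming an in-memory index in place of a real graph store. The class names, field names and methods (`MemoryGraph`, `add_memory`, `tags_for`, `load_content`) are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    """Short relational summary linking cues to one content unit."""
    tag_id: str
    summary: str
    content_id: str


@dataclass
class MemoryGraph:
    """Hypothetical Cue-Tag-Content store: cue -> tag ids -> content."""
    contents: dict = field(default_factory=dict)   # content_id -> full text
    tags: dict = field(default_factory=dict)       # tag_id -> Tag
    cue_index: dict = field(default_factory=dict)  # cue -> set of tag_ids

    def add_memory(self, content_id, text, summary, cues):
        tag = Tag(tag_id=f"tag:{content_id}", summary=summary, content_id=content_id)
        self.contents[content_id] = text
        self.tags[tag.tag_id] = tag
        for cue in cues:
            self.cue_index.setdefault(cue.lower(), set()).add(tag.tag_id)
        return tag

    def tags_for(self, cues):
        """Return the cheap tag summaries reachable from a set of cues."""
        tag_ids = set()
        for cue in cues:
            tag_ids |= self.cue_index.get(cue.lower(), set())
        return sorted((self.tags[t] for t in tag_ids), key=lambda t: t.tag_id)

    def load_content(self, tag):
        """Fetch the heavier memory unit only after its tag is judged useful."""
        return self.contents[tag.content_id]
```

The point of the split is that `tags_for` returns only short summaries, so a model can inspect many candidates before `load_content` pulls any full memory unit into the context.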
The authors illustrate the retrieval flow with a prompt about how Nate used prize money after winning a video game tournament.
The query starts with cues such as Nate, tournament and win.
MRAgent follows the victory-related tag, discards less relevant tournament memories, adds tournament earnings as a new cue and keeps searching until it has enough evidence to answer.
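The Nate example can be approximated with a toy loop. This is an illustrative sketch, not the authors' algorithm: the memory entries are invented placeholders, `toy_judge` and `toy_sufficient` are keyword rules standing in for the backbone model's decisions, and the `new_cues` field hard-codes what a real system would have the model infer.

```python
# Toy memory invented for this sketch; contents are placeholders, not paper data.
TAGS = [
    {"id": "t1", "cues": {"nate", "tournament", "win"},
     "summary": "Nate wins a video game tournament",
     "content": "<memory: tournament victory and prize>",
     "new_cues": {"tournament earnings"}},
    {"id": "t2", "cues": {"nate", "tournament"},
     "summary": "Nate signs up for a tournament",
     "content": "<memory: tournament registration>",
     "new_cues": set()},
    {"id": "t3", "cues": {"tournament earnings"},
     "summary": "Nate spends tournament earnings",
     "content": "<memory: how the earnings were used>",
     "new_cues": set()},
]


def toy_judge(tag):
    """Stand-in for the model deciding whether a tag is worth opening."""
    return any(word in tag["summary"].lower() for word in ("wins", "spends"))


def toy_sufficient(evidence_ids):
    """Stand-in for the model deciding it can answer; here t3 must be found."""
    return "t3" in evidence_ids


def reconstruct(query_cues, judge=toy_judge, sufficient=toy_sufficient, max_steps=5):
    cues = set(query_cues)
    seen, evidence_ids, pruned = set(), [], []
    for _ in range(max_steps):
        # Only tags reachable from the current cue set are candidates.
        candidates = [t for t in TAGS if t["id"] not in seen and t["cues"] & cues]
        if not candidates:
            break
        for tag in candidates:
            seen.add(tag["id"])
            if not judge(tag):
                pruned.append(tag["id"])  # summary rejected; content never loaded
                continue
            evidence_ids.append(tag["id"])
            cues |= tag["new_cues"]       # evidence adds cues for the next pass
        if sufficient(evidence_ids):
            break
    return evidence_ids, pruned, cues


evidence_ids, pruned, cues = reconstruct({"nate", "tournament", "win"})
```

In this toy run the spending memory is unreachable from the starting cues. It only becomes a candidate on the second pass, after the victory memory contributes "tournament earnings" as a new cue, which is the behavior a one-shot retrieval step cannot reproduce.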

LongMemEval Shows 118k Token Prompt Use
The researchers tested MRAgent on LoCoMo and LongMemEval against standard RAG, A-Mem, MemoryOS, LangMem and Mem0.
According to the paper's benchmarks as cited by VentureBeat, MRAgent outperformed every baseline across both models tested and all question types.
In the LongMemEval tests cited by VentureBeat, MRAgent used 118k prompt tokens per sample.
A-Mem consumed 632k tokens per sample, while LangMem used 3.26 million.
MRAgent's runtime was 586 seconds, compared with 1,122 seconds for A-Mem.
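For readers who want the relative sizes, the reported figures can be turned into ratios with simple arithmetic. The inputs below are the rounded numbers quoted above, so the results are approximate; the variable names are illustrative.

```python
# Rounded figures as reported above (prompt tokens per sample, seconds).
prompt_tokens = {"MRAgent": 118_000, "A-Mem": 632_000, "LangMem": 3_260_000}
runtime_seconds = {"MRAgent": 586, "A-Mem": 1_122}

base = prompt_tokens["MRAgent"]
# How many times more prompt tokens each baseline used than MRAgent.
token_ratio = {
    name: tokens / base
    for name, tokens in prompt_tokens.items()
    if name != "MRAgent"
}

# Share of A-Mem's runtime that MRAgent saved.
runtime_saving = 1 - runtime_seconds["MRAgent"] / runtime_seconds["A-Mem"]

for name, ratio in token_ratio.items():
    print(f"{name}: {ratio:.1f}x MRAgent's prompt tokens")
print(f"Runtime saving vs A-Mem: {runtime_saving:.1%}")
```

On these numbers, A-Mem used roughly 5.4 times and LangMem roughly 27.6 times as many prompt tokens as MRAgent, and MRAgent's runtime was a little under half of A-Mem's.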

Memory Construction Remains The Deployment Work
The framework still requires the Cue-Tag-Content structure to be prepared before query time.
Developers must build an ingestion pipeline that processes raw interaction histories, extracts metadata and stores the result in a graph database.
The authors designed that construction phase to use LLMs for automated distillation rather than manual labeling.
Implementation work still includes background jobs, prompt templates and graph storage, all of which must be in place before the first query.
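The shape of such an ingestion job might look like the sketch below. It is not the authors' released code: `toy_distill` is a keyword heuristic standing in for an LLM distillation prompt, the nested dictionary stands in for a graph database, and every name here is hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "and", "with", "that", "this", "from"}


def toy_distill(turn):
    """Placeholder for an LLM call: returns (cues, tag summary) for one turn."""
    words = re.findall(r"[a-z]+", turn.lower())
    cues = sorted({w for w in words if len(w) > 3 and w not in STOPWORDS})
    summary = " ".join(turn.split()[:6])  # first few words as a crude summary
    return cues, summary


def ingest(history, distill=toy_distill):
    """Turn raw interaction turns into a Cue-Tag-Content style index."""
    graph = {"contents": {}, "tags": {}, "cue_index": defaultdict(set)}
    for i, turn in enumerate(history):
        content_id, tag_id = f"c{i}", f"t{i}"
        cues, summary = distill(turn)
        graph["contents"][content_id] = turn
        graph["tags"][tag_id] = {"summary": summary, "content": content_id}
        for cue in cues:
            graph["cue_index"][cue].add(tag_id)
    return graph


graph = ingest(["Nate won the tournament", "Nate bought groceries"])
```

Passing the distillation step in as a parameter keeps the storage logic testable without a model call. In a real deployment that function would wrap a prompt template and would likely run as a background job over new interaction logs.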
The authors released code on GitHub.
Named production deployments, maintenance costs and customer validation remain undisclosed.