Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Bowen Jin; Chengwei Qin; Haodong Yue; Haozhen Zhang; Jianzhu Bao; Jiaxuan You; Quanyu Long; Tao Feng; Weizhi Zhang; Wenya Wang

Review: 1 major objection · 2 minor · cited by 8

BudgetMem uses a query-aware RL router to assign low, mid, or high budget tiers to memory modules for explicit performance-cost control in LLM agents.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.


T0 review · grok-4.3

2026-05-21 13:41 UTC pith:AGHHOCHF

Load-bearing objection: BudgetMem sets up a runtime router that uses RL to assign query-aware memory budget tiers in agents. This is a sensible practical step, but the abstract does not show the actual gains or the overhead accounting.

arxiv 2602.06025 v3 pith:AGHHOCHF submitted 2026-02-05 cs.CL cs.AI cs.LG


Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang


classification cs.CL cs.AI cs.LG

keywords LLM agents, runtime memory, budget-tier routing, reinforcement learning, performance-cost trade-off, query-aware, memory modules

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BudgetMem to manage memory for LLM agents that need information beyond a single context window. It organizes memory processing into modules each available in low, mid, or high budget tiers and trains a compact neural policy with reinforcement learning to route each query to the right tier. This gives direct control over the accuracy versus memory-construction-cost trade-off instead of relying on fixed offline memory that wastes resources or drops query-critical details. Experiments across LoCoMo, LongMemEval, and HotpotQA show the routed system beats strong baselines when high performance is allowed and traces better accuracy-cost frontiers when budgets are tight. The analysis also separates the strengths of creating tiers through method complexity, inference behavior, or model capacity.

Core claim

By structuring memory as a set of modules each offered in Low/Mid/High budget tiers and using a lightweight router implemented as a compact neural policy trained with reinforcement learning to perform query-aware budget-tier routing, BudgetMem achieves explicit control over the performance-memory cost trade-off and surpasses strong baselines in high-budget settings while delivering better accuracy-cost frontiers under tighter budgets.

What carries the argument

The lightweight router, a compact neural policy trained with reinforcement learning, that assigns budget tiers to memory modules based on the input query.
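A minimal sketch of what such a router could look like, assuming a linear softmax policy over a fixed-size query feature vector. The paper describes the router only as a compact neural policy trained with reinforcement learning, so the feature vector, the weight layout, the greedy decoding, and the names `softmax` and `route` are illustrative assumptions. The module names follow the pipeline in the Figure 1 caption.

```python
import math
from typing import Dict, List, Sequence

TIERS = ("low", "mid", "high")
# Module names taken from the Figure 1 caption; everything else is hypothetical.
MODULES = ("filter", "entity", "temporal", "topic", "summary")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(query_features: Sequence[float],
          weights: Dict[str, List[List[float]]]) -> Dict[str, str]:
    """Pick one tier per module from query features (greedy argmax).

    weights[module] holds one row of coefficients per tier (assumed layout).
    """
    decision = {}
    for module, rows in weights.items():
        logits = [sum(w * x for w, x in zip(row, query_features)) for row in rows]
        probs = softmax(logits)
        best = max(range(len(TIERS)), key=lambda i: probs[i])
        decision[module] = TIERS[best]
    return decision


# Toy weights: feature 0 favors LOW, feature 1 favors HIGH for every module.
toy_weights = {m: [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]] for m in MODULES}
easy_query = [1.0, 0.0]
hard_query = [0.0, 1.0]
```

With these toy weights, `route(easy_query, toy_weights)` assigns the low tier to every module and `route(hard_query, toy_weights)` assigns the high tier. A trained policy would instead learn the weights from a reward signal and would typically sample from the tier probabilities during training.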

Load-bearing premise

The compact neural policy trained with reinforcement learning can learn budget-tier assignments that improve the performance-cost trade-off without adding substantial overhead.

What would settle it

If the routed BudgetMem system produces accuracy-cost curves that fail to dominate fixed-tier or non-routed baselines on LoCoMo, LongMemEval, or HotpotQA, the benefit of query-aware routing would be falsified.
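One way to make this falsifier operational is a coverage test between two accuracy-cost curves. The sketch below assumes each curve is a list of (cost, accuracy) points and treats the routed curve as dominating when every baseline point is matched by a routed point that is no more expensive and no less accurate. The function names and all numbers are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, accuracy)


def covers(point_a: Point, point_b: Point) -> bool:
    """True if point_a is at least as cheap and at least as accurate as point_b."""
    return point_a[0] <= point_b[0] and point_a[1] >= point_b[1]


def frontier_dominates(routed: List[Point], baseline: List[Point]) -> bool:
    """True if every baseline point is covered by some routed point."""
    return all(any(covers(r, b) for r in routed) for b in baseline)


# Made-up numbers for illustration only; not results from the paper.
routed_curve = [(1.0, 0.50), (2.0, 0.60), (4.0, 0.70)]
fixed_tier_curve = [(1.5, 0.48), (2.5, 0.58), (4.0, 0.69)]
```

Here `frontier_dominates(routed_curve, fixed_tier_curve)` holds while the reverse does not. On real benchmark data, a single uncovered baseline point would be enough to trigger the falsifier as stated.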


If this is right

Surpasses strong baselines when performance is prioritized in high-budget settings.
Delivers better accuracy-cost frontiers under tighter budgets.
Disentangles the strengths and weaknesses of implementation, reasoning, and capacity strategies for realizing budget tiers under varying budget regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The routing mechanism could extend to dynamic allocation of other resources such as compute or tool calls in agent systems.
Hybrid tiering that combines multiple realization axes might produce further improved trade-offs.
Deployment on agent tasks with highly variable query complexity would test whether the learned policy generalizes beyond the evaluated benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces BudgetMem, a runtime agent memory framework that decomposes memory processing into modules each available in Low/Mid/High budget tiers. A compact neural policy router, trained via reinforcement learning, performs query-aware tier assignment to explicitly trade off task performance against memory-construction cost. The framework is used as a testbed to compare three complementary tier-realization strategies (implementation complexity, inference-time reasoning behavior, and module model capacity). Experiments on LoCoMo, LongMemEval, and HotpotQA report that BudgetMem outperforms strong baselines in high-budget regimes and yields improved accuracy-cost frontiers under tighter budgets.

Significance. If the reported frontiers remain superior after router overhead is properly internalized, the work supplies a practical, controllable mechanism for runtime memory budgeting in long-context LLM agents and a reusable testbed for dissecting tiering axes. The disentanglement of implementation, reasoning, and capacity strategies under varying budget regimes could inform future system design.

major comments (1)

[Experimental Setup and Results] The central accuracy-cost frontier claims rest on the router producing net-positive tier assignments, yet the evaluation does not isolate or subtract the per-query inference cost of the lightweight router (nor the amortized RL training cost) from the reported memory-construction budgets. Under tight budgets this overhead may constitute a non-negligible fraction of the allowed cost; if the metrics treat router cost as external, the attributed gains cannot be unambiguously credited to query-aware routing rather than to the underlying tier implementations themselves.

minor comments (2)

[Abstract] The abstract states improved frontiers and benchmark results but supplies no quantitative numbers, error bars, or ablation details; a short table of headline deltas would improve readability.
[Method] Notation for the three tiering strategies (implementation, reasoning, capacity) is introduced without an explicit mapping to the concrete module variants used in each experiment; a small summary table would clarify the correspondence.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below, providing clarifications and indicating revisions where appropriate to strengthen the evaluation of the accuracy-cost frontiers.


Referee: [Experimental Setup and Results] The central accuracy-cost frontier claims rest on the router producing net-positive tier assignments, yet the evaluation does not isolate or subtract the per-query inference cost of the lightweight router (nor the amortized RL training cost) from the reported memory-construction budgets. Under tight budgets this overhead may constitute a non-negligible fraction of the allowed cost; if the metrics treat router cost as external, the attributed gains cannot be unambiguously credited to query-aware routing rather than to the underlying tier implementations themselves.

Authors: We agree that the router's per-query inference cost should be explicitly internalized for a rigorous assessment of net gains from query-aware routing. In the original experiments, the reported budgets centered on memory module construction costs, with the router (a compact neural policy) treated as a fixed lightweight overhead incurred uniformly per query. However, we acknowledge the referee's point that this could affect tight-budget regimes. To address it, we have added new measurements of the router's inference cost (approximately 0.5-2% of total budget depending on the setting, based on FLOPs and latency benchmarks) and recomputed the accuracy-cost frontiers with this overhead subtracted from the allowed budgets. The revised results confirm that BudgetMem retains superior frontiers over baselines. We have also clarified in the text that amortized RL training costs are offline and not part of runtime per-query budgets. These updates appear in a new paragraph in Section 4.3 and updated Figures 3-5. revision: yes
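The accounting described in this response amounts to simple arithmetic. A minimal sketch, assuming the router overhead is expressed as a fraction of the total per-query budget: the function name and the budget of 100 cost units are hypothetical, and the 0.5% and 2% fractions are the endpoints quoted in the simulated rebuttal, not independently verified measurements.

```python
def memory_budget_after_router(total_budget: float, router_fraction: float) -> float:
    """Budget left for memory construction once router overhead is charged."""
    if not 0.0 <= router_fraction < 1.0:
        raise ValueError("router_fraction must be in [0, 1)")
    return total_budget * (1.0 - router_fraction)


# Hypothetical budget of 100 cost units per query.
best_case = memory_budget_after_router(100.0, 0.005)   # 0.5% overhead
worst_case = memory_budget_after_router(100.0, 0.02)   # 2% overhead
```

Under these assumptions the memory modules keep between roughly 98 and 99.5 of every 100 budget units, which is the sense in which the overhead would be small relative to the tier costs.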

Circularity Check

0 steps flagged

No circularity: empirical testbed with RL router validated on external benchmarks


The paper presents BudgetMem as an empirical framework for runtime memory with a compact neural policy router trained via reinforcement learning to assign budget tiers. Claims of superior accuracy-cost frontiers rest on experimental comparisons against baselines across LoCoMo, LongMemEval, and HotpotQA rather than any derivation chain. No equations, self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The router training and tier strategies are positioned as independent design choices whose effectiveness is measured externally, keeping the contribution self-contained without reducing outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the RL policy and three tiering strategies are described at a high level without implementation specifics or independent evidence.

pith-pipeline@v0.9.0 · 5813 in / 1172 out tokens · 58011 ms · 2026-05-21T13:41:17.616570+00:00 · methodology


Original abstract

Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost, which is implemented as a compact neural policy trained with reinforcement learning. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
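The balance between task performance and memory construction cost can be illustrated with a cost-weighted objective. The sketch below assumes a linear reward of the form score minus λ times cost; the Figure 2 caption states only that a cost weight λ is varied, so the exact reward form, the function names, and the per-tier numbers are assumptions made for illustration.

```python
from typing import Dict, Tuple


def reward(task_score: float, cost: float, cost_weight: float) -> float:
    """Assumed linear trade-off: task score minus weighted construction cost."""
    return task_score - cost_weight * cost


def best_tier(tiers: Dict[str, Tuple[float, float]], cost_weight: float) -> str:
    """Return the tier name with the highest reward under the given cost weight."""
    return max(tiers, key=lambda name: reward(*tiers[name], cost_weight))


# Hypothetical (task_score, cost) per tier; not measurements from the paper.
toy_tiers = {"low": (0.50, 1.0), "mid": (0.60, 2.0), "high": (0.65, 4.0)}
```

With these toy values, a cost weight of 0 selects the high tier, 0.05 selects mid, and 0.2 selects low. Sweeping the weight in this way is what lets a single trained system trace out a range of accuracy-cost operating points.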

Figures

Figures reproduced from arXiv: 2602.06025 by Bowen Jin, Chengwei Qin, Haodong Yue, Haozhen Zhang, Jianzhu Bao, Jiaxuan You, Quanyu Long, Tao Feng, Weizhi Zhang, Wenya Wang, Xiao Li.

**Figure 1.** BudgetMem overview. Given a user query q, we retrieve raw chunks Cq from a chunked history (without offline memory preprocessing) and process them with a modular pipeline (filter → entity/temporal/topic → summary). Each module exposes LOW/MID/HIGH budget tiers instantiated by one of three strategies (implementation, reasoning, capacity). A shared lightweight router selects tiers module-wise based on the q…

**Figure 2.** Performance–cost trade-offs across tiering strategies on LoCoMo. By varying the cost weight λ, BudgetMem traces smooth, controllable frontiers that shift toward higher performance as budget increases, and envelop baselines in both low- and high-cost regimes.

**Figure 3.** Ablation of reward-scale alignment under capacity tiering strategy on LoCoMo.

**Figure 4.** Budget-tier selection ratios. Module-wise LOW/MID/HIGH routing ratios on LongMemEval under varying cost weights λ using the capacity tiering strategy.

**Figure 5.** Retrieval-size sensitivity on LoCoMo. Cost and Judge versus the number of retrieved raw chunks, evaluated under all three tiering strategies.
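The module-wise routing ratios described in the Figure 4 caption can be computed from a log of routing decisions. A minimal sketch, assuming each decision is a mapping from module name to chosen tier; the log format, the function name `tier_ratios`, and the toy log are hypothetical.

```python
from collections import Counter
from typing import Dict, List

TIERS = ("low", "mid", "high")


def tier_ratios(decisions: List[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Fraction of queries routed to each tier, computed per module."""
    counts: Dict[str, Counter] = {}
    for decision in decisions:
        for module, tier in decision.items():
            counts.setdefault(module, Counter())[tier] += 1
    ratios = {}
    for module, counter in counts.items():
        total = sum(counter.values())
        ratios[module] = {tier: counter[tier] / total for tier in TIERS}
    return ratios


# Hypothetical routing log for four queries and two modules.
toy_log = [
    {"filter": "low", "summary": "high"},
    {"filter": "low", "summary": "mid"},
    {"filter": "mid", "summary": "high"},
    {"filter": "low", "summary": "high"},
]
```

For this toy log the filter module is routed to the low tier in three of four queries and the summary module to the high tier in three of four. Repeating the computation for each cost weight would give the kind of per-module breakdown the caption describes.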


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via State Proprioception
cs.CL · 2026-06 · conditional · novelty 7.0
Exposing per-block token, recency, and access metadata with lossless archive/recovery elicits latent context management in untrained LLM agents and roughly doubles LOCA-Bench success under pressure.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents
cs.AI · 2026-06 · unverdicted · novelty 7.0
OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.

LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via State Proprioception
cs.CL · 2026-06 · unverdicted · novelty 6.0
VISTA supplies LLM agents with a visible proprioceptive dashboard of typed context blocks, enabling untrained self-management that lifts performance on long-horizon tool-use benchmarks across multiple model scales.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents
cs.AI · 2026-06 · unverdicted · novelty 6.0
OSL-MR applies constrained stochastic optimization and learning to memory retention in long-horizon agents, outperforming recency and heuristic baselines on LoCoMo and LongMemEval under tight budgets.

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents
cs.CL · 2026-06 · conditional · novelty 5.0
EMBER learns to retain budgeted, source-backed evidence capsules so long-horizon agents recover answer-relevant facts without rereading the full history.

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents
cs.CL · 2026-06 · unverdicted · novelty 5.0
EMBER learns to retain source-backed evidence capsules under a fixed token budget, improving F1, Retain-Recall, and Read-Recall on LongMemEval-RR over budgeted baselines.

Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
cs.AI · 2026-06 · unverdicted · novelty 5.0
An agentic harness letting the LLM self-manage flat text-file storage via tool calls outperforms eight prior memory systems on cross-scenario generality across QA, chat, trajectory, stress-test, and long-horizon tasks.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE · 2026-04 · accept · novelty 5.0
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.