
arxiv: 2606.25115 · v1 · pith:SQ5GF5ABnew · submitted 2026-06-23 · 💻 cs.LG · cs.NI

Forget to Improve: On-Device LLM-Agent Continual Learning via Budget-Curated Memory

Pith reviewed 2026-06-25 23:58 UTC · model grok-4.3

classification 💻 cs.LG cs.NI
keywords on-device LLM agents · continual learning · memory curation · budget management · poisoning defense · net-value scoring · experience memory · task drift

The pith

A net-value-per-byte score lets on-device LLM agents forget low-value memory entries to cut footprint, energy, uplink and poisoning while preserving or raising accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that on-device LLM agents improve through retrieved experience memory rather than weight updates, yet that memory is strictly bounded, power-hungry, uplink-limited and writable by the same inputs the agent reads. Existing approaches either grow memory without a budget, retain entries by success alone, or study poisoning mainly as an attack rather than as a memory-governance problem. The authors introduce a single net-value-per-byte score (value minus harm, per byte) that acts as the sole curator for three decisions: evict low-value bytes to meet RAM and energy limits, transmit an insight only when its value exceeds its uplink cost, and gate peer entries by provenance. On task-drift benchmarks and a heterogeneous Jetson robot-arm testbed the method reduces memory by 2.7× and uplink by 2.4×, drives injection success from 0.75 to zero, and improves accuracy on cases corrupted by poison or stale memory.

Core claim

Curating an agent's experience memory by a single net-value-per-byte score (value minus harm, per byte) governs the KEEP, SHARE and TRUST decisions together. This budget-driven curation reduces memory footprint by 2.7×, uplink traffic by 2.4× and injection success from 0.75 to zero while raising accuracy on poisoned or stale cases, which the authors take as evidence that deliberate forgetting by net value improves rather than weakens the agent.

What carries the argument

The net-value-per-byte score that computes value minus harm for each memory entry and serves as the single ruler for keep, share and trust decisions under explicit RAM, energy and uplink budgets.
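
As a minimal sketch of how such a score might drive the KEEP decision, the Python below scores each entry as (value minus λ times harm) divided by bytes, following the form ρ = (V̂ − λĤ)/b quoted in the Figure 8 caption, and greedily retains the highest-scoring entries that fit a byte budget. The `Entry` fields, the greedy policy, the rule of evicting non-positive scores, and every number are illustrative assumptions rather than the authors' implementation; the value and harm estimates are simply taken as given.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One experience-memory entry (hypothetical fields for illustration)."""
    name: str
    value: float  # estimated benefit, V-hat (assumed to be supplied)
    harm: float   # estimated harm, H-hat (assumed to be supplied)
    size: int     # footprint in bytes, b


def net_value_per_byte(e: Entry, lam: float = 1.0) -> float:
    """rho = (V-hat - lam * H-hat) / b."""
    return (e.value - lam * e.harm) / e.size


def keep(entries: list[Entry], budget_bytes: int, lam: float = 1.0) -> list[Entry]:
    """Greedy KEEP: retain the highest-rho entries that fit the budget.

    Entries with non-positive rho are always evicted (an assumption of
    this sketch), so net-harmful entries never occupy budget.
    """
    kept: list[Entry] = []
    used = 0
    ranked = sorted(entries, key=lambda x: net_value_per_byte(x, lam), reverse=True)
    for e in ranked:
        if net_value_per_byte(e, lam) <= 0:
            break  # remaining entries score no higher
        if used + e.size <= budget_bytes:
            kept.append(e)
            used += e.size
    return kept


# Made-up entries; names and numbers are placeholders.
memory = [
    Entry("grasp-tip", value=0.9, harm=0.0, size=200),
    Entry("stale-route", value=0.2, harm=0.1, size=400),
    Entry("injected-note", value=0.5, harm=0.8, size=100),
    Entry("calibration", value=0.6, harm=0.0, size=300),
]
kept_names = [e.name for e in keep(memory, budget_bytes=500)]
print(kept_names)
```

Greedy selection by value density is a standard heuristic for knapsack-style budgets and is used here only for brevity; the summary does not say which selection rule the paper uses.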

If this is right

  • Memory footprint shrinks by 2.7× while accuracy on clean and corrupted tasks stays the same or rises.
  • Uplink volume shrinks by 2.4× because only net-positive insights are transmitted.
  • Injection success drops from 0.75 to zero because low-value or harmful entries are never retained or shared.
  • Energy consumption on the device decreases because fewer bytes are stored and retrieved.
  • Forgetting entries by net value improves continual-learning performance rather than degrading it.
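
The uplink bullet above can be read as a simple gate: an insight is transmitted only when its net value exceeds what the transmission costs. The sketch below assumes a hypothetical per-byte uplink price that puts bytes and value on one scale; the paper's actual cost model is not described in this summary, and the function name and numbers are placeholders.

```python
def should_share(value: float, harm: float, size_bytes: int,
                 uplink_cost_per_byte: float, lam: float = 1.0) -> bool:
    """SHARE gate: send only if net value exceeds the uplink cost.

    uplink_cost_per_byte is a hypothetical price that converts bytes
    into the same units as value; it is an assumption of this sketch.
    """
    net_value = value - lam * harm
    return net_value > uplink_cost_per_byte * size_bytes


# A compact, useful insight clears the bar; a bulky, marginal one does not.
compact = should_share(value=0.9, harm=0.0, size_bytes=200, uplink_cost_per_byte=0.001)
bulky = should_share(value=0.3, harm=0.1, size_bytes=400, uplink_cost_per_byte=0.001)
print(compact, bulky)
```

Dividing both sides by the entry size shows that this is the same test as comparing net value per byte against the per-byte uplink price, which is one way a single score could serve both KEEP and SHARE.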

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same score could be used to decide which entries to compress or quantize on even tighter devices.
  • If provenance metadata itself consumes bytes, the net-value calculation would need to include that overhead to remain consistent.
  • The approach may generalize to any continual-learning system whose memory is both capacity-constrained and security-exposed.
  • Heterogeneous device fleets could share the same scoring rule without per-device retuning if the value and harm estimates remain device-agnostic.

Load-bearing premise

That a single net-value-per-byte score can be computed accurately enough to serve as the sole curator for keep, share and trust decisions across changing tasks and heterogeneous devices.

What would settle it

On the same Jetson testbed with the same task-drift benchmarks, run the agent with and without the net-value curator; if accuracy on poisoned or stale cases falls or injection success rises above zero, the central claim is false.
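
The decision rule in this test can be written down directly. The sketch below is one hedged reading of it: `RunMetrics` and `claim_survives` are hypothetical names, "falls" is interpreted as falling below the uncurated run, and the accuracy figures are placeholders rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Metrics from one testbed run (hypothetical container)."""
    corrupted_accuracy: float  # accuracy on poisoned or stale cases
    injection_success: float   # fraction of injection attempts that succeed


def claim_survives(with_curator: RunMetrics, without_curator: RunMetrics) -> bool:
    """Apply the falsification rule stated above.

    The claim fails if accuracy on corrupted cases falls below the
    uncurated run, or if any injection still succeeds under curation.
    """
    accuracy_held = with_curator.corrupted_accuracy >= without_curator.corrupted_accuracy
    injection_blocked = with_curator.injection_success == 0.0
    return accuracy_held and injection_blocked


# Accuracy values are placeholders; 0.75 is the uncurated injection
# success quoted in the summary.
baseline = RunMetrics(corrupted_accuracy=0.60, injection_success=0.75)
curated = RunMetrics(corrupted_accuracy=0.70, injection_success=0.0)
leaky = RunMetrics(corrupted_accuracy=0.70, injection_success=0.05)

print(claim_survives(curated, baseline))
print(claim_survives(leaky, baseline))
```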

Figures

Figures reproduced from arXiv: 2606.25115 by Beining Wu, Jun Huang, Yanxiao Zhao, Zihao Ding.

Figure 1: The on-device agent-memory design space: …
Figure 3: Overview of CURATOR: one net-value-per-byte score ρ governs KEEP, SHARE, and TRUST.
Surrounding text: … harm it can cause, per byte it occupies [45]–[47]. C. Insight: One Ruler for Keep, Share, and Trust. Insight 2: Keeping, sharing, and trusting answer the same question: whether an experience is worth its footprint. A single net-value-per-byte score can therefore govern all three. Each edge constraint forces a decision about …
Figure 4: Real heterogeneous testbed: two NVIDIA Jetson AGX …
Figure 5: Backbone robustness across three on-device planners …
Figure 6: Per-budget Pareto fronts on three on-device planners …
Figure 8: Sweeping the harm weight λ in $\rho = (\hat{V} - \lambda\hat{H})/b$ (5 seeds; solid: Qwen2.5-3B, dashed: Qwen2.5-7B). (a) Accuracy traces an inverted-U peaking near λ≈1; (b) injection ASR falls to zero once λ≥1. Both backbones share the same safe band.
Surrounding text: The gate does not act on origin alone. A poisoned entry written to issue an instruction scores high on instruction-likeness and is demoted below the trust threshold whatever ut…
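
To make the role of λ concrete, the sketch below scores one hypothetical poisoned entry, whose harm estimate stands in for instruction-likeness, and sweeps λ; the entry is demoted below the trust threshold once the harm term outweighs its apparent utility. The threshold of zero, the entries' numbers, and the use of instruction-likeness directly as Ĥ are assumptions for illustration and do not reproduce the figure's data.

```python
def rho(value: float, harm: float, size_bytes: int, lam: float) -> float:
    """Net value per byte: (V-hat - lam * H-hat) / b."""
    return (value - lam * harm) / size_bytes


def trusted(value: float, harm: float, size_bytes: int, lam: float,
            threshold: float = 0.0) -> bool:
    """TRUST gate: admit a peer entry only if rho clears the threshold."""
    return rho(value, harm, size_bytes, lam) > threshold


# Hypothetical poisoned entry: looks useful but reads like an instruction.
poisoned = dict(value=0.7, harm=0.8, size_bytes=120)
# Hypothetical benign entry: equally useful, little instruction-likeness.
benign = dict(value=0.7, harm=0.05, size_bytes=120)

sweep = [0.0, 0.5, 1.0, 2.0]
poisoned_admitted = [trusted(lam=lam, **poisoned) for lam in sweep]
benign_admitted = [trusted(lam=lam, **benign) for lam in sweep]
print(poisoned_admitted)
print(benign_admitted)
```

In this toy setting a much larger λ would eventually reject the benign entry as well, which is one plausible reading of why accuracy in panel (a) is described as an inverted-U rather than increasing without limit.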
Figure 9: Per-decision cost on real hardware across two Jetson …
Original abstract

On-device language-model agents improve by accumulating experience in retrieved memory rather than by updating weights. This memory is hard-bounded and exposed: it consumes RAM and energy, reaches peers through a thin uplink, and becomes an attack surface because it is writable by what the agent reads. Existing systems each cover one part of this problem: agentic memories grow without a budget, on-device methods keep entries by success alone, and poisoning is studied mainly as an attack rather than as a memory-governance problem. We propose CURATOR, a single net-value-per-byte score that governs an agent's experience-memory lifecycle. The main idea is to let the budget act as the curator: each entry is scored as value minus harm, per byte, so one ruler decides what to keep, share, and trust. CURATOR makes three decisions: (1) KEEP evicts low-value bytes under the RAM and energy budget; (2) SHARE sends an insight only when its value exceeds its uplink cost; and (3) TRUST gates a peer entry by provenance. On language-model-agent task-drift benchmarks and a real heterogeneous Jetson testbed with two robot-arm nodes and a hub, CURATOR reduces memory by 2.7× and uplink by 2.4×, drives injection success from 0.75 to zero, and raises accuracy on cases corrupted by poison or stale memory. Curating by net value reduces footprint, energy, uplink, and injection success together without reducing accuracy. In this setting, forgetting by net value improves the agent rather than weakening it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes CURATOR, a system for on-device LLM-agent continual learning that curates experience memory using a single net-value-per-byte score (value minus harm) to govern KEEP (evict under RAM/energy budget), SHARE (send only if value exceeds uplink cost), and TRUST (gate peer entries by provenance) decisions. On task-drift benchmarks and a heterogeneous Jetson testbed with robot-arm nodes, it claims 2.7× memory reduction, 2.4× uplink reduction, injection success driven to zero, and preserved or improved accuracy compared to baselines that grow memory without budget or keep by success alone.

Significance. If the net-value estimation procedure can be shown to be reliable, device-independent, and non-circular under task drift, the approach would provide a unified, budget-driven mechanism for simultaneously addressing memory footprint, energy, communication cost, and poisoning resistance in on-device agents without sacrificing task performance; this would be a notable contribution to continual learning and secure on-device AI.

major comments (2)
  1. [Abstract] Abstract: the central claim that a single net-value-per-byte score governs keep/share/trust decisions and produces the reported gains (2.7× memory, 2.4× uplink, injection success → 0, accuracy preserved) rests on the ability to compute value and harm accurately on-device, yet the abstract (and by extension the method) provides no description of the estimation procedure, no equations, no baselines, no error bars, and no experimental protocol for how value/harm are obtained or how accuracy is measured under poison/stale memory.
  2. [Abstract] The weakest assumption (single scalar sufficient across changing tasks and heterogeneous devices) is load-bearing for all empirical claims; without evidence that the estimators remain accurate under drift or device variation, the reported improvements cannot be attributed to the net-value curator rather than to unstated heuristics or oracles.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater clarity in the abstract regarding the net-value estimation. We address each point below and will revise the manuscript to strengthen the presentation of our method and evidence.

  1. Referee: [Abstract] Abstract: the central claim that a single net-value-per-byte score governs keep/share/trust decisions and produces the reported gains (2.7× memory, 2.4× uplink, injection success → 0, accuracy preserved) rests on the ability to compute value and harm accurately on-device, yet the abstract (and by extension the method) provides no description of the estimation procedure, no equations, no baselines, no error bars, and no experimental protocol for how value/harm are obtained or how accuracy is measured under poison/stale memory.

    Authors: We agree the abstract is too condensed and omits key details on estimation. The full manuscript (Section 3) provides the net-value-per-byte equations, the on-device LLM-based estimation procedure for value and harm, the baselines compared, error bars from repeated runs, and the protocol for accuracy measurement under poison and stale memory on the task-drift benchmarks. We will revise the abstract to include a concise description of the estimation approach along with a pointer to the detailed sections and experimental protocol. revision: yes

  2. Referee: [Abstract] The weakest assumption (single scalar sufficient across changing tasks and heterogeneous devices) is load-bearing for all empirical claims; without evidence that the estimators remain accurate under drift or device variation, the reported improvements cannot be attributed to the net-value curator rather than to unstated heuristics or oracles.

    Authors: The reported results are obtained under precisely these conditions: task-drift benchmarks and a heterogeneous Jetson testbed with robot-arm nodes, where the net-value curator produces the stated gains while preserving accuracy. This constitutes direct empirical support that the single scalar functions under the tested drift and device conditions. We will add a short analysis subsection quantifying estimator stability (value/harm correlation with ground-truth task performance) across drift episodes and device types to make this attribution explicit. revision: partial
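
One way the promised stability analysis could be computed is a per-episode Pearson correlation between the estimated net value of each entry and a ground-truth measure of its contribution to task performance. The sketch below is an assumption about the form of that analysis, with hypothetical names and made-up data; the rebuttal does not specify which statistic would be used.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def stability_by_episode(episodes: dict) -> dict:
    """Map each episode name to the estimate/ground-truth correlation."""
    return {name: pearson(est, truth) for name, (est, truth) in episodes.items()}


# Made-up data: (estimated net value, measured contribution) per entry.
episodes = {
    "pre-drift": ([0.1, 0.4, 0.7, 0.9], [0.0, 0.3, 0.8, 1.0]),
    "post-drift": ([0.1, 0.4, 0.7, 0.9], [0.9, 0.2, 0.6, 0.1]),
}
report = stability_by_episode(episodes)
for name, r in report.items():
    print(f"{name}: r = {r:.2f}")
```

A correlation that stays high before and after drift would support the single-scalar premise; a drop or sign flip after drift, as in the fabricated second episode, would indicate that the estimates had gone stale.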

Circularity Check

0 steps flagged

No circularity; conceptual proposal with no derivations or self-referential reductions shown.


The manuscript introduces a net-value-per-byte score as the governing mechanism for KEEP/SHARE/TRUST decisions but supplies no equations, formal derivations, or parameter-fitting procedures. All claims rest on empirical benchmark outcomes rather than any chain that reduces a prediction or uniqueness result to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the score is presented as a design choice rather than a derived quantity. This is the normal case of a self-contained empirical proposal with no detectable circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central mechanism rests on an unstated method for computing value and harm per byte; these quantities function as free parameters or domain assumptions whose accuracy is not evidenced in the provided text.

free parameters (1)
  • value and harm estimators
    The net-value score requires functions that assign value and harm to each memory entry; these are not specified and must be fitted or assumed to make the curator work.

pith-pipeline@v0.9.1-grok · 5840 in / 1220 out tokens · 18758 ms · 2026-06-25T23:58:09.687752+00:00 · methodology


Reference graph

Works this paper leans on

67 extracted references · 14 linked inside Pith

  [1] Apple, “Apple Intelligence Foundation Language Models: Tech Report 2025,” arXiv preprint arXiv:2507.13575, 2025

  [2] Gemini Team, Google, “Gemini: A Family of Highly Capable Multimodal Models,” arXiv preprint arXiv:2312.11805, 2023

  [3] Microsoft, “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone,” arXiv preprint arXiv:2404.14219, 2024

  [4] Llama Team, Meta AI, “The Llama 3 Herd of Models,” arXiv preprint arXiv:2407.21783, 2024

  [5] Qwen Team, Alibaba, “Qwen2.5 Technical Report,” arXiv preprint arXiv:2412.15115, 2024

  [6] X. Wang, Q. Li, and W. Jia, “Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment,” arXiv preprint arXiv:2501.03265, 2025

  [7] J. Xu, Z. Li, W. Chen, Q. Wang, X. Gao, Q. Cai, and Z. Ling, “On-Device Language Models: A Comprehensive Review,” arXiv preprint arXiv:2409.00088, 2024

  [8] B. Wu, Z. Ding, L. Ostigaard, and J. Huang, “Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture,” in 2025 ACM Research on Adaptive and Convergent Systems (RACS). ACM, 2025, pp. 1–8

  [9] Z. Fang, Z. Liu, J. Wang, S. Hu, Y. Guo, Y. Deng, and Y. Fang, “Task-Oriented Communications for Visual Navigation with Edge-Aerial Collaboration in Low Altitude Economy,” in Proc. IEEE Global Communications Conference (GLOBECOM), 2026

  [10] Z. Fang, J. Wang, Y. Ma, Y. Tao, Y. Deng, X. Chen, and Y. Fang, “R-ACP: Real-Time Adaptive Collaborative Perception Leveraging Robust Task-Oriented Communications,” IEEE Journal on Selected Areas in Communications, 2025

  [11] Q. Hu, Q. Long, and W. Wang, “When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents,” arXiv preprint arXiv:2604.27003, 2026

  [12] W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models,” in Proceedings of the USENIX Security Symposium, 2025

  [13] S. S. Srivastava and H. He, “MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval,” arXiv preprint arXiv:2512.16962, 2025

  [14] S. Kim and J. Kim, “SPRInG: Continual LLM Personalization via Selective Parametric Adaptation and Retrieval-Interpolated Generation,” arXiv preprint arXiv:2601.09974, 2026

  [15] W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang, “A-MEM: Agentic Memory for LLM Agents,” in Advances in Neural Information Processing Systems, 2025

  [16] Y. Yu, L. Yao, Y. Xie, Q. Tan, J. Feng, Y. Li, and L. Wu, “Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents,” arXiv preprint arXiv:2601.01885, 2026

  [17] S. Alqithami, “Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents,” arXiv preprint arXiv:2512.12856, 2025

  [18] Z. Fang, S. F. Hu, Z. Chang, Y. Guo, Y. Tao, H. Liu, M. Ruan, J. Huang, and Y. Fang, “Inference-Time Budget Control for LLM Search Agents,” arXiv preprint arXiv:2605.05701, 2026

  [19] C. Lam, J. Li, L. Zhang, and K. Zhao, “Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework,” arXiv preprint arXiv:2603.11768, 2026

  [20] X. Yang, Y. He, S. Ji, B. Hooi, and J. S. Dong, “Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections,” arXiv preprint arXiv:2602.15654, 2026

  [21] H. Liu, C. Kim, B. Liu, M. Liu, and Y. Zhu, “Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning,” arXiv preprint arXiv:2603.03818, 2026

  [22] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative Agents: Interactive Simulacra of Human Behavior,” in Proceedings of the ACM Symposium on User Interface Software and Technology, 2023

  [23] Y. He, J. Liu, Y. Liu, Y. Li, T. Cao, Z. Hu, X. Xu, and B. Hooi, “EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems,” in International Conference on Learning Representations, 2026

  [24] B. Wu, Z. Ding, and J. Huang, “A Review of Continual Learning in Edge AI,” IEEE Transactions on Network Science and Engineering, vol. 13, pp. 6571–6588, 2026

  [25] B. Wu and J. Huang, “Lifecycle-Aware Federated Continual Learning in Mobile Autonomous Systems,” arXiv preprint arXiv:2604.20745, 2026

  [26] B. Wu, J. Huang, and Y. Zhao, “From Alpha to Omega: Lifecycle-Aware Forgetting Defense in Federated Continual Learning for Planetary Exploration,” in Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2026

  [27] B. Wu, Z. Ding, and J. Huang, “PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning,” arXiv preprint arXiv:2605.01061, 2026

  [28] Z. Yu, S. Liang, T. Ma, Y. Cai, Z. Nan, D. Huang, X. Song, Y. Hao, J. Zhang, T. Zhi, Y. Zhao, Z. Du, X. Hu, Q. Guo, and T. Chen, “Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM,” in Proceedings of the IEEE/ACM International Symposium on Microarchitecture, 2024, pp. 1474–1488

  [29] R. Yi, L. Guo, S. Wei, A. Zhou, S. Wang, and M. Xu, “EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices,” IEEE Transactions on Mobile Computing, vol. 24, no. 8, pp. 7059–7073, 2025

  [30] B. Wu, Z. Ding, and J. Huang, “RELIEF: Turning Missing Modalities into Training Acceleration for Federated Learning on Heterogeneous IoT Edge,” arXiv preprint arXiv:2604.04243, 2026

  [31] Z. Ding, B. Wu, J. Huang, and S. Mao, “Application-Aware Twin-in-the-Loop Planning for Federated Split Learning over Wireless Edge Networks,” arXiv preprint arXiv:2604.26105, 2026

  [32] C.-C. Xing, Z. Ding, and J. Huang, “A Stochastic Geometry-Based Analysis of SWIPT-Assisted Underlaid Device-to-Device Energy Harvesting,” SIGAPP Appl. Comput. Rev., vol. 25, no. 4, pp. 18–34, 2026

  [33] D. Pan, B.-N. Wu, Y.-L. Sun, and Y.-P. Xu, “A Fault-Tolerant and Energy-Efficient Design of a Network Switch Based on a Quantum-Based Nano-Communication Technique,” Sustainable Computing: Informatics and Systems, vol. 37, p. 100827, 2023

  [34] K. Alizadeh, S. I. Mirzadeh, D. Belenko, S. K. Khatamifard, M. Cho, C. C. Del Mundo, M. Rastegari, and M. Farajtabar, “LLM in a Flash: Efficient Large Language Model Inference with Limited Memory,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024, pp. 12562–12584

  [35] Y. P. Shkolnikov, “Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices,” arXiv preprint arXiv:2603.04428, 2026

  [36] B. Wu, J. Huang, Q. Duan, L. Dong, and Z. Cai, “Enhancing Vehicular Platooning With Wireless Federated Learning: A Resource-Aware Control Framework,” IEEE/ACM Transactions on Networking, pp. 1–1, 2025

  [37] J. Huang, B. Wu, Q. Duan, L. Dong, and S. Yu, “A Fast UAV Trajectory Planning Framework in RIS-Assisted Communication Systems With Accelerated Learning via Multithreading and Federating,” IEEE Transactions on Mobile Computing, pp. 1–16, 2025

  [38] Z. Fang, S. Hu, J. Wang, Y. Deng, X. Chen, and Y. Fang, “Prioritized Information Bottleneck Theoretic Framework With Distributed Online Learning for Edge Video Analytics,” IEEE Transactions on Networking, pp. 1–17, 2025

  [39] Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases,” in Advances in Neural Information Processing Systems, 2024

  [40] Z. Ding, B. Wu, and J. Huang, “SCALE: Sensitivity-Aware Federated Unlearning with Information Freshness Optimization for Mobile Edge Computing,” in Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2026

  [41] ——, “EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure,” arXiv preprint arXiv:2605.00733, 2026

  [42] U. Pudasaini, Z. Ding, and J. Huang, “Securing Smart Agriculture with Communication-Efficient Federated Unlearning,” in 2026 IEEE International Conference on High Performance Switching and Routing (HPSR). IEEE, 2026, pp. 1–8

  [43] S. B. Devarangadi, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, and S. M. Mishra, “Memory Poisoning Attack and Defense on Memory Based LLM-Agents,” arXiv preprint arXiv:2601.05504, 2026

  [44] Z. Ding, J. Huang, and J. Qi, “Learning to Defend: A Multi-Agent Reinforcement Learning Framework for Stackelberg Security Game in Mobile Edge Computing,” in International Conference on Computing, Networking and Communications (ICNC). IEEE, 2026

  [45] B. Wu, J. Huang, and S. Yu, ““X of Information” Continuum: A Survey on AI-Driven Multi-Dimensional Metrics for Next-Generation Networked Systems,” IEEE Communications Surveys & Tutorials, vol. 28, pp. 5307–5344, 2026

  [46] B. Wu, Z. Cai, W. Wu, and X. Yin, “AoI-Aware Resource Management for Smart Health via Deep Reinforcement Learning,” IEEE Access, 2023

  [47] B. Wu, J. Huang, and Q. Duan, “Real-Time Intelligent Healthcare Enabled by Federated Digital Twins With AoI Optimization,” IEEE Network, vol. 40, no. 2, pp. 184–191, 2026

  [48] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Springer, 2004

  [49] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010

  [50] B. Wu and W. Wu, “Model-Free Cooperative Optimal Output Regulation for Linear Discrete-Time Multi-Agent Systems Using Reinforcement Learning,” Mathematical Problems in Engineering, vol. 2023, no. 1, p. 6350647, 2023

  [51] B. Wu, J. Huang, and Q. Duan, “FedTD3: An Accelerated Learning Approach for UAV Trajectory Planning,” in International Conference on Wireless Artificial Intelligent Computing Systems and Applications (WASA). Springer, 2025, pp. 13–24

  [52] Z. Ding, J. Huang, Q. Duan, C. Zhang, Y. Zhao, and S. Gu, “A Dual-Level Game-Theoretic Approach for Collaborative Learning in UAV-Assisted Heterogeneous Vehicle Networks,” in 2025 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 2025, pp. 1–8

  [53] M. Shridhar, X. Yuan, M.-A. Côté, Y. Bisk, A. Trischler, and M. Hausknecht, “ALFWorld: Aligning Text and Embodied Environments for Interactive Learning,” in International Conference on Learning Representations, 2021

  [54] M. Chevalier-Boisvert, D. Bahdanau, S. Lahlou, L. Willems, C. Saharia, T. H. Nguyen, and Y. Bengio, “BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning,” in International Conference on Learning Representations, 2019

  [55] Y. Hu, Y. Wang, and J. McAuley, “Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions,” in International Conference on Learning Representations, 2026

  [56] S. Dong, S. Xu, P. He, Y. Li, J. Tang, T. Liu, H. Liu, and Z. Xiang, “Memory Injection Attacks on LLM Agents via Query-Only Interaction,” arXiv preprint arXiv:2503.03704, 2025

  [57] G. Zhang, W. Jiang, X. Wang, A. Behr, K. Zhao, J. Friedman, X. Chu, and A. Anoun, “Adaptive Memory Admission Control for LLM Agents,” arXiv preprint arXiv:2603.04549, 2026

  [58] C. Packer, S. Wooders, K. Lin, V. Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “MemGPT: Towards LLMs as Operating Systems,” arXiv preprint arXiv:2310.08560, 2023

  [59] W. Zhong, L. Guo, Q. Gao, H. Ye, and Y. Wang, “MemoryBank: Enhancing Large Language Models with Long-Term Memory,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 19724–19731

  [60] Z. Fang, Y. Guo, J. Wang, Y. Zhang, H. An, Y. Wang, and Y. Fang, “Shared Spatial Memory Through Predictive Coding,” arXiv preprint arXiv:2511.04235, 2025

  [61] J. Zeng, Y. Qi, H. Li, C. Li, Z. Lyu, L. Cui, and L. Bai, “RAIE: Region-Aware Incremental Preference Editing with LoRA for LLM-based Recommendation,” in Proceedings of the ACM Web Conference, 2026

  [62] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017

  [63] L. Wang, J. Xie, X. Zhang, H. Su, and J. Zhu, “HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6687–6702, 2025

  [64] G. Zhao, H. Wu, X. Zhang, and A. V. Vasilakos, “ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs,” IEEE Transactions on Information Forensics and Security, vol. 21, pp. 4625–4639, 2026

  [65] B. Zhang, H. Xin, M. Fang, Z. Liu, B. Yi, T. Li, and Z. Liu, “Traceback of Poisoning Attacks to Retrieval-Augmented Generation,” in Proceedings of the ACM Web Conference, 2025, pp. 2085–2097

  [66] J. Hwang, J. Park, H. Park, D.-W. Kim, S. Park, and J. Ok, “Retrieval-Augmented Generation with Estimation of Source Reliability,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2025, pp. 34267–34291

  [67] C. A. Pham, P. Vepakomma, and S. Horváth, “Learning in the Null Space: Small Singular Values for Continual Learning,” in Proceedings of the Conference on Parsimony and Learning, 2026