{"total":17,"items":[{"citing_arxiv_id":"2606.28876","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory","primary_cat":"cs.CL","submitted_at":"2026-06-27T11:38:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A hybrid attention mechanism with editable request-local memory slots and sparse fallback achieves high accuracy on synthetic overwrite, version, and anti-pollution tasks where pure fixed-state or sparse methods fail, while identifying open-domain selection as the remaining bottleneck.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26797","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior","primary_cat":"cs.LG","submitted_at":"2026-05-26T10:10:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Latent Recurrent Transformer augments autoregressive transformers with a cross-layer recurrent latent pathway from prior hidden states and uses interleaved parallel training to improve loss and in-context learning at ~0.3% extra parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24930","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"H$^{2}$MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer","primary_cat":"cs.CL","submitted_at":"2026-05-24T08:22:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"H²MT uses offline semantic hierarchy construction, bottom-up memory aggregation, and 
coarse-to-fine query routing to achieve competitive QA quality with lower memory and latency than flat or retrieval baselines on LongBench tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22884","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tensor Cache: Eviction-conditioned Associative Memory for Transformers","primary_cat":"cs.LG","submitted_at":"2026-05-21T00:21:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16893","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NGM: A Plug-and-Play Training-Free Memory Module for LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-16T09:12:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13370","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory","primary_cat":"cs.LG","submitted_at":"2026-05-13T11:28:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PMNet uses unitary phasor dynamics and hierarchical 
anchors to make explicit memory stable for long sequences, matching a 3x larger Mamba model on long-context robustness with a 119M parameter network.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10993","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ECHO: Continuous Hierarchical Memory for Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-09T13:06:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ECHO organizes VLA experiences into a hierarchical memory tree in hyperbolic space via autoencoder and entailment constraints, delivering a 12.8% success-rate gain on LIBERO-Long over the pi0 baseline.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"short-term history by stacking recent observations within the Transformer context window [19, 20], while retrieval-augmented approaches retrieve relevant past experiences or expert trajectories from external databases [21-23]. However, these memories are commonly stored as linear sequences or flat vector banks. As interaction data grows, global similarity search over a flat memory bank becomes increasingly expensive [24]. More importantly, flat retrieval ignores the hierarchical structure of manipulation, where high-level tasks naturally decompose into sub-goals and low-level control primitives [25, 26]. ECHO addresses this limitation by organizing manipulation experiences as a continuous hierarchy rather than merely increasing memory capacity. 
Hyperbolic Embeddings for Hierarchical Representation."},{"citing_arxiv_id":"2605.06225","ref_index":18,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-07T13:19:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05066","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Impossibility Triangle of Long-Context Modeling","primary_cat":"cs.CL","submitted_at":"2026-05-06T16:01:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The re-scan costs O(T·d) per query, which grows linearly with T. Therefore E is violated. A.4 Proof of Theorem 23 (Trade-off Inequality). Proof. By Definition 22, the state size satisfies |s_T|_bits ≤ c·T·log_2 V·b, and the number of recallable pairs is n* = r·T. Substituting into the bound (6) from Theorem 10, r·T ≤ (c·T·log_2 V·b) / ((1−ε)·log_2 V − 1). (32) Dividing both sides by T > 0, r ≤ (c·log_2 V·b) / ((1−ε)·log_2 V − 1). (33) Factoring log_2 V from the denominator, r ≤ (c·b) / ((1−ε) − 1/log_2 V), (34) which is (22). 
A.5 Proof of Proposition 17. Proof. Consider a Transformer with n_layers layers, n_heads attention heads per layer, head dimension d_h = d/n_heads, and b-bit precision. State size. At step t, each layer stores a key vector k_i ∈ R^d and a value vector v_i ∈ R^d"},{"citing_arxiv_id":"2512.12602","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics","primary_cat":"cs.LG","submitted_at":"2025-12-14T08:51:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.26692","ref_index":109,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Kimi Linear: An Expressive, Efficient Attention Architecture","primary_cat":"cs.CL","submitted_at":"2025-10-30T16:59:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Kimi Linear: An Expressive, Efficient Attention ArchitectureTECHNICALREPORT [107] Guangxuan Xiao et al. \"Efficient streaming language models with attention sinks\". In:arXiv preprint arXiv:2309.17453(2023). [108] Wenhan Xiong et al.Effective Long-Context Scaling of Foundation Models. 2023. arXiv:2309.16039 [cs.CL]. URL:https://arxiv.org/abs/2309.16039. [109] Ruyi Xu et al. 
\"Xattention: Block sparse attention with antidiagonal scoring\". In:arXiv preprint arXiv:2503.16428(2025). [110] Bowen Yang et al.Rope to Nope and Back Again: A New Hybrid Attention Strategy. 2025. arXiv: 2501.18795 [cs.CL].URL:https://arxiv.org/abs/2501.18795. [111] Songlin Yang, Jan Kautz, and Ali Hatamizadeh. \"Gated Delta Networks: Improving Mamba2 with Delta Rule\"."},{"citing_arxiv_id":"2506.13674","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention","primary_cat":"cs.CL","submitted_at":"2025-06-16T16:30:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.15965","ref_index":145,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs","primary_cat":"cs.IR","submitted_at":"2025-04-22T15:05:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.","context_count":1,"top_context_role":"method","top_context_polarity":"background","context_text":"Retrieval [84], CacheGen [129], ChunkAttention [130], RAGCache [131], SGLang [132], Ada-KV [133], HCache [134], Cake [135], EPIC [136], RelayAttention [137], Marconi [138], IKS [139], FastCache [140], Cache-Craft [141], KVLink 
[142], RAGServe [143], BumbleBee [144] VIII System Parametric Long-Term Parametric Memory Structures Memorizing Transformer [145], Focused Transformer [146], MAC [147], MemoryLLM [148], WISE [149], LongMem [150], LM2 [151], Titans [152] Table 3: System Memory 12 4.1 Contextual System Memory From a temporal perspective, non-parametric short-term system memory refers to a series of reasoning and action results generated by large language models during task execution. This form of"},{"citing_arxiv_id":"2502.13189","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MoBA: Mixture of Block Attention for Long-Context LLMs","primary_cat":"cs.LG","submitted_at":"2025-02-18T14:06:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.10813","ref_index":97,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory","primary_cat":"cs.CL","submitted_at":"2024-10-14T17:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.07143","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Leave No Context Behind: Efficient Infinite Context Transformers with 
Infini-attention","primary_cat":"cs.CL","submitted_at":"2024-04-10T16:18:42+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Infini-attention combines compressive memory with masked local attention and long-term linear attention inside each Transformer block to support infinite context length with bounded resources.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2206.07682","ref_index":97,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Emergent Abilities of Large Language Models","primary_cat":"cs.CL","submitted_at":"2022-06-15T17:32:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}