{"total":24,"items":[{"citing_arxiv_id":"2605.27044","ref_index":50,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting","primary_cat":"cs.AI","submitted_at":"2026-05-26T13:59:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BatteryMFormer is a multi-level Transformer that adds an aging-condition-aware decoder, meta degradation pattern memory, and dual-view encoder to forecast battery state-of-health trajectories from early operational data and outperforms baselines on four domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18601","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models","primary_cat":"cs.CV","submitted_at":"2026-05-18T16:12:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Incantation is the first video world model to use per-frame natural language conditioning for simultaneous multi-entity control and concept-level cross-entity transfer in interactive video generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17270","ref_index":135,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Detection: A Structure-Aware Framework for Scene Text Tracking","primary_cat":"cs.CV","submitted_at":"2026-05-17T05:40:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SymTrack is the first systematic detection-free framework for scene text tracking that constructs benchmarks from video text spotting datasets and reports up to 11.97% 
AUC gains over prior trackers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10537","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mela: Test-Time Memory Consolidation based on Transformation Hypothesis","primary_cat":"cs.CL","submitted_at":"2026-05-11T13:20:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08060","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-05-08T17:47:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Expanded recall in LLM agents erodes cooperative intent in multi-agent social dilemmas, observed in 18 of 28 model-game settings.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"pre-filter retained only traces containing forward-looking keywords (e.g.,future, long-term, signal, mutual benefit), discarding approximately 28% of the raw corpus. A random subset of 12,000 surviving traces, balanced across the six source models and seven history-length conditions, was then evaluated by an LLM-as-a-judge (Llama-3.3-70B-Instruct, T= 0). The judge assigned integer scores in [0, 10] for forward-looking density (sfwd), logical coherence (squal), and history specificity (sspec), anchored to a pinned forward-vs-reactive vocabulary dictionary. 
To avoid circular causality, we did not hard-filter on the chosen action a_t. The selection pipeline applied three sequential filters: (i) judge thresholds sfwd ≥ 9 ∧ squal ≥ 9 ∧ sspec ≥ 7 retained 5,124 traces; (ii) an independent anti-cheat substring-match pass on the same"},{"citing_arxiv_id":"2605.06216","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TIDE: Every Layer Knows the Token Beneath the Context","primary_cat":"cs.CL","submitted_at":"2026-05-07T13:16:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05189","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval","primary_cat":"stat.ML","submitted_at":"2026-05-06T17:53:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Winner-take-all linear memory capacity scales as d² ~ n log n due to extreme values; listwise retrieval via Tail-Average Margin yields d² ~ n with exact asymptotic theory.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"h −u h ξ⊤ h ,(32) where ηh := 1 k X j̸=h ˆqj,h uj, ξ h := 1 k X i̸=h ˆai ˆqh,i vi. Step 3: Linearize the common part. In Appendix C, we carry out a first-order Taylor expansion of R\\h in the perturbations ∆S and ∆µ, with all higher-order terms collected into a residual matrix ˆE\\h supported on {j, i̸=h}. 
This yields R\\h :,i =M \\h i ∆S:,i + ˆE\\h :,i , i̸=h,(33) withM \\h i as in (28). Combining (33) with (32) and the definition ofH \\h yields H \\h[∆W] = ˆa h uhv⊤ h −ˆah ηh v⊤ h −u h ξ⊤ h −U ˆE\\hV ⊤.(34) Moreover, we show in Appendix C that, under (A1)-(A3), ˆE\\h i,i =O ≺(n−1) fori̸=h, ˆE\\h j,i =O ≺(n−2) forj̸=i, j, i̸=h.(35) Step 4: Invert and bound the correction.Applying (H \\h)−1 to (34) produces (29)"},{"citing_arxiv_id":"2605.04651","ref_index":19,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation","primary_cat":"cs.LG","submitted_at":"2026-05-06T08:58:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The closed-form solution W ⋆ admits an interpretation as attention-based retrieval, analogous to Eq. 6 in Section 3. Fast weights can be viewed as a special form of attention (Vaswani et al., 2017) that retrieves key-value pairs through a least- squares criterion (see Appendix B.2). We term this mechanismpseudoinverse attention, which solves the exact retrieval problem min a ∥K ⊤a−q∥ 2 2.(19) The solution is given bya ⋆ =K †q, leading to the retrieved output h= (a ⋆)⊤V=q ⊤(K †V) =q ⊤W ⋆.(20) Unlike standard attention mechanisms, pseudoinverse attention permits attention weights a⋆ to take negative values, reflecting a fundamentally different retrieval behavior. 
Relation to Classic Attention Mechanisms.FAAST represents the fully compressed limit of attention-based memory"},{"citing_arxiv_id":"2604.08519","ref_index":91,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts","primary_cat":"cs.CL","submitted_at":"2026-04-09T17:55:50+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.16745","ref_index":71,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling","primary_cat":"cs.LG","submitted_at":"2025-08-22T18:57:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"In a cellular automata rule-inference task designed to block memorization, neural models achieve high next-step accuracy but accuracy falls sharply with longer reasoning chains; depth, recurrence, memory, and test-time compute extend the reachable depth but do not remove the bound.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.02259","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory 
Agent","primary_cat":"cs.CL","submitted_at":"2025-07-03T03:11:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.00663","ref_index":115,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Titans: Learning to Memorize at Test Time","primary_cat":"cs.LG","submitted_at":"2024-12-31T22:32:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"\"R-transformer: Recurrent neural network enhanced transformer\". In: arXiv preprint arXiv:1907.05572 (2019). [113] Jason Weston, Sumit Chopra, and Antoine Bordes. \"Memory networks\". In: arXiv preprint arXiv:1410.3916 (2014). [114] Bernard Widrow and Marcian E Hoff. \"Adaptive switching circuits\". In:Neurocomputing: foundations of research. 1988, pp. 123-134. [115] Ronald J Williams and David Zipser. \"A learning algorithm for continually running fully recurrent neural networks\". In: Neural computation 1.2 (1989), pp. 270-280. [116] Daniel B Willingham. \"Systems of memory in the human brain\". In: Neuron 18.1 (1997), pp. 5-8. 
[117] Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krahenbuhl, and Ross Girshick."},{"citing_arxiv_id":"2411.11259","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Graph Retention Networks for Dynamic Graphs","primary_cat":"cs.LG","submitted_at":"2024-11-18T03:28:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Graph Retention Networks extend retention to dynamic graphs to enable parallelizable training, O(1) inference, and chunkwise long-term training while delivering competitive performance with major efficiency gains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.10813","ref_index":96,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory","primary_cat":"cs.CL","submitted_at":"2024-10-14T17:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.16061","ref_index":83,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"3D Reconstruction with Spatial Memory","primary_cat":"cs.CV","submitted_at":"2024-08-28T18:01:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair 
predictions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.02427","ref_index":83,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Cognitive Architectures for Language Agents","primary_cat":"cs.AI","submitted_at":"2023-09-05T17:56:20+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2201.02177","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets","primary_cat":"cs.LG","submitted_at":"2022-01-06T18:43:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2005.11401","ref_index":68,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","primary_cat":"cs.CL","submitted_at":"2020-05-22T21:34:34+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric 
baselines.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"See Appendix D for further details. Model NQ TQA WQ CT Closed Book T5-11B [52] 34.5 - /50.1 37.4 - T5-11B+SSM[52] 36.6 - /60.5 44.7 - Open Book REALM [20] 40.4 - / - 40.7 46.8 DPR [26] 41.5 57.9/ - 41.1 50.6 RAG-Token 44.1 55.2/66.1 45.5 50.0 RAG-Seq. 44.5 56.8/68.0 45.2 52.2 Table 2: Generation and classiﬁcation Test Scores. MS-MARCO SotA is [4], FEVER-3 is [68] and FEVER-2 is [ 57] *Uses gold context/evidence. Best model without gold access underlined. Model Jeopardy MSMARCO FVR3 FVR2 B-1 QB-1 R-L B-1 Label Acc. SotA - - 49.8* 49.9* 76.8 92.2 * BART 15.1 19.7 38.2 41.6 64.0 81.1 RAG-Tok. 17.3 22.2 40.1 41.5 72.5 89.5RAG-Seq. 14.7 21.4 40.8 44.2 to more effective marginalization over documents. Furthermore, RAG can generate correct answers"},{"citing_arxiv_id":"2002.08909","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"REALM: Retrieval-Augmented Language Model Pre-Training","primary_cat":"cs.CL","submitted_at":"2020-02-10T18:40:59+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2001.04451","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reformer: The Efficient Transformer","primary_cat":"cs.LG","submitted_at":"2020-01-13T18:38:28+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH 
attention and reversible residual layers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1911.05507","ref_index":131,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Compressive Transformers for Long-Range Sequence Modelling","primary_cat":"cs.LG","submitted_at":"2019-11-13T14:36:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.03202","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Evolutionary Algorithm for Sinhala to English Translation","primary_cat":"cs.CL","submitted_at":"2019-07-06T22:51:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"An evolutionary algorithm identifies meanings in Sinhala sentences to produce English translations that are then grammatically corrected, reported to yield accurate results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.01686","ref_index":64,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Machine Reading Comprehension: a Literature Review","primary_cat":"cs.CL","submitted_at":"2019-06-30T09:18:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"A 2019 survey of machine reading comprehension corpora and 
methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1710.10903","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Graph Attention Networks","primary_cat":"stat.ML","submitted_at":"2017-10-30T12:41:12+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}