{"total":49,"items":[{"citing_arxiv_id":"2605.28603","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration","primary_cat":"cs.LG","submitted_at":"2026-05-27T15:19:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Under-Cali is an uncertainty-driven dual-expert calibration framework for online adaptation in irregular multivariate time series forecasting that freezes the base model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27686","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-26T21:03:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Tensor Memory augments Transformers with a constant-size 3D voxel grid using differentiable soft writes at predicted locations, local interaction, and gated recurrent dynamics to decouple memory capacity from sequence length.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26562","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting","primary_cat":"cs.LG","submitted_at":"2026-05-26T05:18:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TSCOMP is the first large-scale benchmark that deconstructs deep multivariate time series forecasters into fine-grained components, builds a corpus of over 20,000 evaluations, and shows that 
corpus-driven component selection outperforms state-of-the-art holistic models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25655","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bandwidth-Aware LLM Inference on Heterogeneous Many-Core Supercomputers","primary_cat":"cs.DC","submitted_at":"2026-05-25T10:03:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"THInfer achieves 62-84% higher throughput than GPU baselines for Llama 7B-30B models on MT-3000 through bandwidth-focused co-design, and runs 70B models where GPU frameworks fail.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17276","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"How Do Electrocardiogram Models Scale?","primary_cat":"cs.LG","submitted_at":"2026-05-17T05:53:35+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical scaling study of ECG models finds SSL scales robustly while ResNets show 1.3-2.5x better parameter efficiency and SSL up to 16x better data efficiency than supervised baselines on out-of-distribution tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15433","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Spectral Priors vs. 
Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:26:07+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12491","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Elastic Attention Cores for Scalable Vision Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-12T17:59:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"model is an elastic visual backbone that is competitive with state-of-the-art vision foundation models across both classification and dense tasks. Efficient Attention. To mitigate the quadratic time cost of self-attention, some models replace softmax with approximations [52, 53, 54, 55, 56, 57] or low-rank factorization [58, 59], while others use grouping, or sliding windows [60, 61, 62, 63, 64] or completely abandon attention in favor of unparameterized transforms [4, 65]. Fixed latent bottlenecks in Set Transformers and Perceiver [66, 67, 68, 69], fixed size nested softmax [70], and multimodal fusion [71] have also been used. 
Recent work has shifted towards linear RNNs for sequences [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84]."},{"citing_arxiv_id":"2605.09498","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Spectral Transformer Neural Processes","primary_cat":"cs.LG","submitted_at":"2026-05-10T12:17:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"STNPs extend TNPs with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the Describable Textures Dataset (DTD) [9], and construct each task from a processed 64×64 subsampled crop of an original image. For each batch, the number of context pixels is sampled uniformly as M ∼ U[5, 1024) and is shared across all tasks in the batch, while all remaining pixels are treated as query points. Additional experimental details are provided in Appendix A.8. Following [24], TETNP cannot be trained on this benchmark due to out-of-memory failures even with 8 A100 GPUs. Table 2 summarises the quantitative results (PSNR and SSIM are computed over the full 64×64 image), and Figure 3 shows representative qualitative completions. 
STNP yields sharper and more structurally faithful completions than TNP and SConvCNP, particularly for repeated textures such"},{"citing_arxiv_id":"2605.08587","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Kaczmarz Linear Attention","primary_cat":"cs.LG","submitted_at":"2026-05-09T01:07:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"PMLR, 13-18 Jul 2020. URL https://proceedings.mlr.press/v119/katharopoulos20a.html. [17] Tobias Katsch. Gateloop: Fully data-controlled linear recurrence for sequence modeling, 2024. URL https://arxiv.org/abs/2311.01927. [18] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer, 2020. URL https://arxiv.org/abs/2001.04451. [19] Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, and Yoav Shoham. 
Jamba: A hybrid transformer-mamba language model, 2024."},{"citing_arxiv_id":"2605.07160","ref_index":58,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals","primary_cat":"cs.CR","submitted_at":"2026-05-08T02:46:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Kornaropoulos, Charalampos Papamanthou, and Roberto Tamassia. 2019. Data Recovery on Encrypted Databases With 𝑘-Nearest Neighbor Query Leakage. In Proc. of the 40th IEEE S&P. [57] Evgenios M. Kornaropoulos, Charalampos Papamanthou, and Roberto Tamassia. 2020. The State of the Uniform: Attacks on Encrypted Databases Beyond the Uniform Query Distribution. In Proc. of the 41st IEEE S&P. [58] Evgenios M. Kornaropoulos, Charalampos Papamanthou, and Roberto Tamassia. 2021. Response-Hiding Encrypted Ranges: Revisiting Security via Parametrized Leakage-Abuse Attacks. In Proc. of the 42nd IEEE S&P. [59] Dayeol Lee, Dongha Jung, Ian T Fang, Chia-Che Tsai, and Raluca Ada Popa. 2020. 
An Off-Chip attack on hardware enclaves via the memory bus."},{"citing_arxiv_id":"2605.05066","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Impossibility Triangle of Long-Context Modeling","primary_cat":"cs.CL","submitted_at":"2026-05-06T16:01:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"d integer Model dimension (embedding width) Def. 5 T integer Sequence length Eq. (1) N integer SSM state dimension (architecture-dependent) Sec. 3.1 b integer Floating-point precision (bits per component) Thm. 14 L real≥0 Lipschitz constant of δ Ax. 3 dh integer Attention head dimension (dh = d/nheads) Sec. 7.1 dk, dv integer Key and value dimensions in unified recurrence Eq. (16) m integer Feature dimension in kernel approximation App. B.2 dc integer Latent compression dimension (MLA) App. B.1 Architectural quantities nlayers integer Total number of layers Prop. 17 6 Impossibility Triangle of Long-Context Modeling Table 1: Summary of notation (continued). Symbol Type Meaning Ref. 
nheads integer Number of attention heads per layer Sec."},{"citing_arxiv_id":"2605.02568","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"StreamIndex: Memory-Bounded Compressed Sparse Attention via Streaming Top-k","primary_cat":"cs.LG","submitted_at":"2026-05-04T13:19:29+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Chunked streaming top-k enables CSA indexer execution at 1M sequence length with 6.21 GB peak memory and >=0.998 recall on synthetic V4-shaped inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02184","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RAFNet: Region-Aware Fusion Network for Pansharpening","primary_cat":"cs.CV","submitted_at":"2026-05-04T03:26:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"RAFNet uses wavelet-based directional separation, K-means regional clustering, and clustered sparse attention to create adaptive kernels and efficient frequency aggregation, outperforming prior pansharpening networks on benchmark datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00130","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization","primary_cat":"cs.LG","submitted_at":"2026-04-30T18:33:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion 
problem.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06683","ref_index":63,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models","primary_cat":"cs.LG","submitted_at":"2026-04-24T20:37:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22442","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models","primary_cat":"cs.LG","submitted_at":"2026-04-24T10:59:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HubRouter is a sub-quadratic routing primitive using learned hubs that replaces attention layers in hybrid models while delivering competitive perplexity and large throughput gains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"token j through their shared hub fingerprints (F_i·F_j), achieving O(nM) rather than O(n^2). Intuitively, the M hubs act as a low-rank bottleneck through which pairwise comparisons are routed; this connects HubRouter conceptually to landmark-attention methods (Linformer [21], Nyströmformer [22], Set Transformers [18]) and to clustering-based routing (Routing Transformer [24], Reformer [23]). Whether the M-hub bottleneck achieves a provable low-rank approximation guarantee in the sense of [21] is left to future work. 
What is new relative to closest prior art. HubRouter shares the landmark-bottleneck intuition with Perceiver, Set Transformers, Linformer, and Nyströmformer, and the content-based selection intuition with Routing Transformer and Reformer."},{"citing_arxiv_id":"2604.21085","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"climt-paraformer: Stable Emulation of Convective Parameterization using a Temporal Memory-aware Transformer","primary_cat":"physics.ao-ph","submitted_at":"2026-04-22T20:55:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A temporal memory-aware Transformer emulator for the Emanuel convective parameterization shows lower offline errors and 10-year stability in single-column model tests compared to memory-less MLP and LSTM baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20819","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling","primary_cat":"cs.LG","submitted_at":"2026-04-22T17:46:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Stream-CQSA uses CQS-based decomposition to stream exact attention computations for billion-token sequences on limited-memory hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19351","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing","primary_cat":"cs.CL","submitted_at":"2026-04-21T11:33:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DASH-KV accelerates 
long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17278","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PestVL-Net: Enabling Multimodal Pest Learning via Fine-grained Vision-Language Interaction","primary_cat":"cs.CV","submitted_at":"2026-04-19T06:17:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PestVL-Net combines an RWKV visual backbone with saliency-guided window partitioning and MLLM-derived linguistic priors via multimodal chain-of-thought to enable fine-grained multimodal pest recognition on dedicated datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16859","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GAMMA-Net: Adaptive Long-Horizon Traffic Spatio-Temporal Forecasting Model based on Interleaved Graph Attention and Multi-Axis Mamba","primary_cat":"cs.AI","submitted_at":"2026-04-18T06:14:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GAMMA-Net combines Graph Attention Networks and multi-axis Mamba to outperform prior models in long-horizon traffic forecasting, with up to 16.25% lower MAE on benchmarks like METR-LA and PEMS datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05214","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MedMamba: Recasting Mamba for Medical Time Series 
Classification","primary_cat":"eess.SP","submitted_at":"2026-04-17T01:20:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MedMamba introduces a principle-guided bidirectional multi-scale Mamba model that outperforms prior methods on EEG, ECG, and activity classification benchmarks while delivering 4.6x inference speedup.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05431","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Cross-Stage Attention Propagation for Efficient Semantic Segmentation","primary_cat":"cs.CV","submitted_at":"2026-04-07T04:55:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CSAP computes attention at the deepest scale and propagates the maps to shallower stages, bypassing per-scale query-key computations to cut decoder FLOPs while preserving multi-scale performance and beating SegNeXt-Tiny on ADE20K, Cityscapes, and COCO-Stuff.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03446","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Fast Cross-Operator Optimization of Attention Dataflow","primary_cat":"cs.AR","submitted_at":"2026-04-03T20:37:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40-69% lower latency and energy use than prior methods while running 64-343x faster.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Attention mechanisms play a central role in transformer-based models, which are prevalent across various application domains, including 
natural language processing [18], [22], [61], computer vision [23], [47], and image generation [54], [84]. As models seek to capture correlations across longer contexts, sequence lengths continue to increase [7], [39], [72]. However, the success of attention-based models comes with substantial memory and compute overhead, as the computational complexity of attention scales quadratically with sequence length during prefill and training stages [21], [38], [67], [82]. To address these challenges, numerous techniques have been proposed to improve the efficiency of attention"},{"citing_arxiv_id":"2603.29002","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference","primary_cat":"cs.DC","submitted_at":"2026-03-30T21:03:39+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.08064","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm","primary_cat":"cs.LG","submitted_at":"2026-02-08T17:17:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SiameseNorm is a two-stream architecture that reconciles Pre-Norm and Post-Norm in Transformers by coupling streams via shared residual blocks, yielding performance gains with maintained stability on language, vision, and diffusion models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.16027","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk 
Assessment","primary_cat":"cs.AI","submitted_at":"2026-01-22T14:55:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CS-VAR uses an LLM to reason over cross-session behavioral evidence and transfer insights to a small model for efficient, structured live streaming risk assessment with claimed SOTA results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.13956","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AOI: Context-Aware Multi-Agent Operations via Dynamic Scheduling and Hierarchical Memory Compression","primary_cat":"cs.MA","submitted_at":"2025-12-15T23:22:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"AOI is a multi-agent system that dynamically schedules operations and compresses context hierarchically to achieve 72% compression while preserving 93% critical information and cutting repair times by 34%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.12602","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics","primary_cat":"cs.LG","submitted_at":"2025-12-14T08:51:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Exact Flow Linear Attention derives a closed-form exact update for delta-rule linear attention from continuous-time dynamics, removing Euler discretization error while preserving linear complexity and structure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.03563","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"State Space 
Models for Bioacoustics: A Comparative Evaluation with Transformers","primary_cat":"cs.SD","submitted_at":"2025-12-03T08:37:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"BioMamba matches Transformer performance on bioacoustics tasks while using significantly less VRAM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.21016","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression","primary_cat":"cs.LG","submitted_at":"2025-11-26T03:26:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Gated KalmaNet uses exact Kalman gain computation with adaptive gating and Chebyshev iteration to improve SSM performance on long-context tasks over prior approximations like DeltaNet.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.03092","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators","primary_cat":"cs.AI","submitted_at":"2025-11-05T00:38:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SnapStream deploys sparse KV attention in a production inference system on dataflow accelerators, delivering 4x on-chip memory savings for DeepSeek-671B at 128k context with up to 1832 tokens/sec and minimal accuracy loss on LongBench-v2, AIME24, and LiveCodeBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.26692","ref_index":52,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Kimi Linear: An Expressive, Efficient 
Attention Architecture","primary_cat":"cs.CL","submitted_at":"2025-10-30T16:59:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"• Language Understanding and Reasoning: Hellaswag [121], ARC-Challenge [14], Winogrande [83], MMLU [36], TriviaQA [47], MMLU-Redux [26], MMLU-Pro [103], GPQA-Diamond [82], BBH [94], and [105]. •Code Generation: LiveCodeBench v6 4[44], EvalPlus [60]. •Math & Reasoning: AIME 2025, MATH 500, HMMT 2025, PolyMath-en. • Long-context: MRCR 5 , RULER [38], Frames [52], HELMET-ICL [118], RepoQA [61], Long Code Arena [13] and LongBench v2 [6]. •Chinese Language Understanding and Reasoning: C-Eval [43], and CMMLU [55]. Evaluation ConfigurationsAll models are evaluated using temperature 1.0. For benchmarks with high variance, we report the score of Avg@k. 
For base model, We employ perplexity-based evaluation for MMLU, MMLU-Redux,"},{"citing_arxiv_id":"2510.23641","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging","primary_cat":"cs.LG","submitted_at":"2025-10-24T18:00:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SAL-T enhances the linformer with spatially aware kinematic partitioning and convolutions to match full-attention transformer performance on jet tagging while keeping linear complexity and lower latency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.18830","ref_index":64,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training","primary_cat":"cs.CL","submitted_at":"2025-10-21T17:25:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MTraining scales LLM training to 512K-token contexts on 32 A100 GPUs by integrating dynamic sparse training patterns with balanced and hierarchical sparse ring attention, achieving up to 6x throughput gains without accuracy loss on long-context benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.14644","ref_index":4,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input 
Fine-Tuning","primary_cat":"cs.CL","submitted_at":"2025-02-20T15:32:24+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.13189","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MoBA: Mixture of Block Attention for Long-Context LLMs","primary_cat":"cs.LG","submitted_at":"2025-02-18T14:06:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.12370","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Positional Encoding in Transformer-Based Time Series Models: A Survey","primary_cat":"cs.LG","submitted_at":"2025-02-17T23:21:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.10813","ref_index":76,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory","primary_cat":"cs.CL","submitted_at":"2024-10-14T17:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LongMemEval benchmarks long-term 
memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.14294","ref_index":159,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Survey on Efficient Inference for Large Language Models","primary_cat":"cs.CL","submitted_at":"2024-04-22T15:53:08+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Structure Factorization: LoRD [143], TensorGPT [144], LoSparse [145], LPLR [146], ZeroQuant-V2 [147], DSFormer [148], ASVD [149]; Sparsification, Sparse Attention: Sparse Transformer [150], StreamingLLM [151], Longformer [152], Bigbird [153], Structured Sparse Attention [154], SemSA [155], Spatten [156], SeqBoat [157], Adaptively Sparse Attention [158], Reformer [159], Sparse Flash Attention [160], Routing Transformer [161], Sparse Sinkhorn Attention [162], H2O [163], Diffuser [164]; Weight Pruning: SparseGPT [165], Wanda [166], ISC [167], Prune and Tune [168], OWL [169], BESA [170], oBERT [171], FastPruning [172], RIA [173], LLM-Pruner [174], Sheared LLaMA [175], ZipLM [176], LoRAPrune [177], LoRAShear [178], SliceGPT [179], PLATON [180],"},{"citing_arxiv_id":"2312.06635","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Gated Linear Attention Transformers with Hardware-Efficient 
Training","primary_cat":"cs.LG","submitted_at":"2023-12-11T18:51:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.08560","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemGPT: Towards LLMs as Operating Systems","primary_cat":"cs.AI","submitted_at":"2023-10-12T17:51:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"fun my bf james baked me a birthday cake How was your day today? February 7 working_context.append(\"Boyfriend named James\") System Alert: Memory Pressure yeah we went to six flags! Did you do anything else to celebrate your birthday? 😊 February 7 Showing 3 of 3 results (page 1/1): [01/24/2024] \"lol yeah six flags\", [01/14/2024] \"i love six flags been like 100 times\", [10/12/2023] \"james and I actually first met at six flags\" Did you go with James? It's so cute how both met there! February 14 recall_storage.search(\"six flags\") Figure 1. MemGPT (left) writes data to persistent memory after it receives a system alert about limited context space. 
with virtual memory , which provides an illusion of there being more memory resources than are actually available"},{"citing_arxiv_id":"2306.14048","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models","primary_cat":"cs.LG","submitted_at":"2023-06-24T20:11:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.","context_count":1,"top_context_role":"method","top_context_polarity":"background","context_text":"Figure 1: Upper plots illustrate symbolic plots of an attention map deploying different KV cache policies in LLM generation. Lower right: contrasts their accuracy-memory trade-off. Left: the overview of H2O framework. While there exists substantial literature on sparse attention approximation in training, they have not seen wide adoption for alleviating KV cache bottleneck. First, most existing methods, e.g., Reformer [7] and Flash Attention [8], are designed to overcome the quadratic memory required by attention mechanisms when modeling long sequences but still require a large cache size. 
Second, variants like sparse transformer [9], low-rank based transformers [10, 11] or multi-query attention [12, 13, 5] can reduce the cache size, but directly applying them on pre-trained LLMs for generation"},{"citing_arxiv_id":"2305.13048","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RWKV: Reinventing RNNs for the Transformer Era","primary_cat":"cs.CL","submitted_at":"2023-05-22T13:57:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RWKV uses a linear attention mechanism to deliver Transformer-level performance with RNN-style inference efficiency, demonstrated at up to 14 billion parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2204.02311","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PaLM: Scaling Language Modeling with Pathways","primary_cat":"cs.CL","submitted_at":"2022-04-05T16:11:45+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Prior SOTA PaLM 540B TriviaQA (EM) 71 .3a 76.9 75.8a 81.4 75.8a (1) 81.4 (1) Natural Questions (EM) 24.7a 21.2 26 .3a 29.3 32.5a (1) 39.6 (64) Web Questions (EM) 19.0a 10.6 25.3b 22.6 41 .1b (64) 43.5 (64) Lambada (EM) 77 .7f 77.9 80.9a 81.8 87.2c (15) 89.7 (8) HellaSwag 80 .8f 83.4 80.2c 83.6 82.4c (20) 83.8 (5) StoryCloze 83 .2b 84.6 84.7b 86.1 87.7b (70) 89.0 (5) Winograd 88 .3b 90.1 89.7 b 87.5 88 .6a (2) 89.4 (5) Winogrande 74 .9f 81.1 73.7c 83.7 79.2a (16) 85.1 (5) Drop (F1) 57 .3a 69.4 57.8a 70.8 58.6a (2) 70.8 (1) CoQA (F1) 81.5b 77.6 84.0b 79.9 85.0b (5) 81.5 (5) 
QuAC (F1) 41 .5b 45.2 43.4b 47.7 44.3b (5) 47.7 (1) SQuADv2 (F1) 71 .1a 80.8 71.8a 82.9 71.8a (10) 83.3 (5) SQuADv2 (EM) 64 .7a 75."},{"citing_arxiv_id":"2202.08906","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ST-MoE: Designing Stable and Transferable Sparse Expert Models","primary_cat":"cs.CL","submitted_at":"2022-02-17T21:39:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2101.03961","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity","primary_cat":"cs.LG","submitted_at":"2021-01-11T16:11:52+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2008.02275","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Aligning AI With Shared Human Values","primary_cat":"cs.CY","submitted_at":"2020-08-05T17:59:16+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text 
scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1910.03771","ref_index":161,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing","primary_cat":"cs.CL","submitted_at":"2019-10-09T03:23:22+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}