Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
8 Pith papers cite this work.
citing papers explorer
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.
- Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
  Proposes the Intelligent Computing Architecture (ICA) as a six-layer framework with dual probabilistic-deterministic planes and three Amdahl-style heuristics to unify design of LLM-based systems.
- Scalable Model-Based Clustering with Sequential Monte Carlo
  A memory-efficient SMC clustering method decomposes problems into approximately independent subproblems to handle large-scale online clustering with complex distributions.
- ATLAS: All-round Testing of Long-context Abilities across Scales
  ATLAS is a length-dependent benchmarking framework that evaluates 26 models on 8 capability dimensions and shows substantial rank changes when moving from 128K to 1M token ranges.
- Inference Time Context Sparsity: Illusion or Opportunity?
  Current LLMs remain robust to high levels of inference-time context sparsity across diverse tasks, enabling up to 10x acceleration via sparse kernels.
- LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability
  LLMs recover interpretable topic structures via attention and achieve competitive topic modeling performance as long-context generators.
- World Model on Million-Length Video And Language With Blockwise RingAttention
  Presents open-source 7B models for million-token video and language understanding via Blockwise RingAttention, setting new benchmarks in retrieval and long video tasks.
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
  Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.