Rossi, Seunghyun Yoon, and Hinrich Schütze

Modarressi, Ali; Deilamsalehy, Hanieh; Dernoncourt, Franck; Bui, Trung; Rossi, Ryan A. · 2025 · arXiv 2502.05167

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

cs.CL · 2026-05-22 · conditional · novelty 7.0

Audits reveal that no reasoning benchmark jointly controls position, filler, and length; CRE shows LLMs drop up to 88pp on middle-position tasks at 64K context, with a diagnostic probe supporting a positional cause.

Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence

cs.CV · 2026-03-13 · unverdicted · novelty 7.0

VAEX-BENCH shows state-of-the-art MLLMs perform substantially worse on abstractive spatiotemporal reasoning tasks than on matched extractive tasks in video understanding.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

cs.CL · 2025-07-07 · unverdicted · novelty 7.0

MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.

Parallel Context Compaction for Long-Horizon LLM Agent Serving

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

Parallel compaction for LLM agent context management provides predictable volume control and reduces wall time versus sequential baselines on HotpotQA and LoCoMo.

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

cs.CL · 2026-05-11 · conditional · novelty 6.0

EXACT re-allocates training supervision by inverse frequency of long effective-context targets, improving NoLiMa and RULER scores by 5-18 points on Qwen and LLaMA models without degrading standard QA or reasoning.

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

cs.MA · 2026-05-09 · unverdicted · novelty 6.0

Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

cs.AI · 2026-04-23 · unverdicted · novelty 5.0

Tool Attention cuts tool-related tokens by 95% and raises context utilization from 24% to 91% in a 120-tool simulation via dynamic gating and lazy loading.

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings

cs.CY · 2026-04-06 · unverdicted · novelty 5.0

Expert interviews demonstrate that context in generative AI workplace use collapses or rots over time, limiting tool effectiveness and revealing pitfalls in computational context approaches.

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

cs.AI · 2026-01-29 · unverdicted · novelty 5.0

MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.

MiMo-V2-Flash Technical Report

cs.CL · 2026-01-06 · unverdicted · novelty 5.0

MiMo-V2-Flash is a 309B/15B MoE model trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation that matches larger models like DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed MTP layers.

citing papers explorer

Showing 11 of 11 citing papers.

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks
cs.CL · 2026-05-22 · conditional · none · ref 37

Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence
cs.CV · 2026-03-13 · unverdicted · none · ref 13

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
cs.CL · 2025-07-07 · unverdicted · none · ref 28

Parallel Context Compaction for Long-Horizon LLM Agent Serving
cs.AI · 2026-05-22 · unverdicted · none · ref 17

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing
cs.CL · 2026-05-11 · conditional · none · ref 20

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents
cs.MA · 2026-05-09 · unverdicted · none · ref 20

Retrieval from Within: An Intrinsic Capability of Attention-Based Models
cs.LG · 2026-05-07 · unverdicted · none · ref 24 · 2 links

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
cs.AI · 2026-04-23 · unverdicted · none · ref 20

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings
cs.CY · 2026-04-06 · unverdicted · none · ref 89

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
cs.AI · 2026-01-29 · unverdicted · none · ref 17

MiMo-V2-Flash Technical Report
cs.CL · 2026-01-06 · unverdicted · none · ref 35
