
Debrup Das, Sam O’Nuallain, and Razieh Rahimi

Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme · 2025 · arXiv 2502.18418

Mixed citation behavior. Most common role is background (60%).

14 Pith papers citing it



citation-role summary

background: 3 · baseline: 1 · other: 1

citation-polarity summary

background: 3 · baseline: 1 · unclear: 1

representative citing papers

FollowTable: A Benchmark for Instruction-Following Table Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 8.0

FollowTable is the first large-scale benchmark for instruction-following table retrieval, paired with an Instruction Responsiveness Score, showing that existing models fail to adapt to fine-grained constraints beyond topical similarity.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs.

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving

cs.IR · 2026-05-13 · conditional · novelty 7.0 · 2 refs

LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

BRIGHT-Pro and RTriever-Synth advance reasoning-intensive retrieval by adding multi-aspect evidence evaluation and aspect-decomposed synthetic training, with the fine-tuned RTriever-4B showing gains over its base model.

MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL

cs.IR · 2026-04-08 · unverdicted · novelty 7.0

MARVEL reaches 37.9 nDCG@10 on the MM-BRIGHT benchmark by combining LLM query expansion, a reasoning-enhanced dense retriever, and GPT-4o CoT reranking, beating prior multimodal encoders by 10.3 points.

Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.

DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking

cs.IR · 2026-04-13 · unverdicted · novelty 6.0

DualView fuses local cross-attention and global context aggregation via adaptive gating to rerank fixed candidate sets for multi-hop QA, reporting 99.4% Top-4 Recall on MuSiQue at 4 ms latency while beating larger cross-encoders.

HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval

cs.IR · 2026-04-08 · unverdicted · novelty 6.0

HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.

ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

cs.IR · 2025-06-04 · unverdicted · novelty 6.0

ProRank uses RL-based prompt warmup and fine-grained scoring to train small language models that surpass LLM rerankers on BEIR.

Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.

Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage

cs.IR · 2026-03-09 · unverdicted · novelty 5.0

Coverage-focused retrieval metrics correlate strongly with nugget coverage in RAG responses across text and multimodal benchmarks, supporting their use as performance proxies when retrieval and generation goals align.

Reproducing Adaptive Reranking for Reasoning-Intensive IR

cs.IR · 2026-04-30 · unverdicted · novelty 2.0

Reproducing GAR on BRIGHT shows it boosts reasoning-intensive retrieval effectiveness with low overhead when the reranker's signal quality is strong.

DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval

cs.CV · 2026-05-21

citing papers explorer

Showing 14 of 14 citing papers.

FollowTable: A Benchmark for Instruction-Following Table Retrieval cs.IR · 2026-05-01 · unverdicted · none · ref 54
Layer-wise Token Compression for Efficient Document Reranking cs.IR · 2026-05-20 · unverdicted · none · ref 38 · 2 links
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving cs.IR · 2026-05-13 · conditional · none · ref 24 · 2 links
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems cs.CL · 2026-05-05 · unverdicted · none · ref 11
MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL cs.IR · 2026-04-08 · unverdicted · none · ref 34
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning cs.CL · 2026-05-02 · unverdicted · none · ref 51
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges cs.IR · 2026-04-30 · unverdicted · none · ref 76
DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking cs.IR · 2026-04-13 · unverdicted · none · ref 33
HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval cs.IR · 2026-04-08 · unverdicted · none · ref 37
ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking cs.IR · 2025-06-04 · unverdicted · none · ref 8
Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images cs.CV · 2026-04-13 · unverdicted · none · ref 35
Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage cs.IR · 2026-03-09 · unverdicted · none · ref 59
Reproducing Adaptive Reranking for Reasoning-Intensive IR cs.IR · 2026-04-30 · unverdicted · none · ref 47
DeliCIR: Deliberative Test-Time Evolutionary Hierarchical Multi-Agents for Composed Image Retrieval cs.CV · 2026-05-21 · unreviewed · ref 60
