
arXiv preprint arXiv:2508.21038

Orion Weller, Michael Boratko, Iftekhar Naim, Jinhyuk Lee · 2025 · arXiv 2508.21038

13 Pith papers cite this work. Polarity classification is still indexing.




citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 1 · support: 1 · unclear: 1 · use method: 1

representative citing papers

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.

Semantic Recall for Vector Search

cs.IR · 2026-04-22 · unverdicted · novelty 7.0

Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings

cs.IR · 2026-05-08 · unverdicted · novelty 6.0

Embeddings retrieve same-subfield papers at 45-52% but same-agenda papers at only 15-21%; citation rerank reaches 57-59% on agenda queries.

Aspect-Aware Content-Based Recommendations for Mathematical Research Papers

cs.IR · 2026-05-05 · unverdicted · novelty 6.0

The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.

Reproducing Complex Set-Compositional Information Retrieval

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.

Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch

cs.DS · 2026-05-05 · unverdicted · novelty 6.0

Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for constant c<1, with UGC-hardness preventing better polynomial-time solutions in any dimension.

Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

cs.IR · 2026-04-07 · conditional · novelty 6.0

Generative retrieval beats dense retrieval and BM25 on the LIMIT dataset but degrades with hard negatives due to identifier ambiguity during decoding.

Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs

cs.CL · 2025-10-10 · unverdicted · novelty 6.0

Masked fine-tuning enables autoregressive LLMs to inject new factual knowledge without paraphrases and with reversal-curse resistance, matching diffusion LLM advantages on QA tasks.

MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

cs.IR · 2025-09-22 · unverdicted · novelty 6.0

MetaEmbed trains fixed learnable Meta Tokens to produce granularity-organized multi-vector embeddings that support test-time scaling in multimodal retrieval.

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

cs.IR · 2026-05-07 · unverdicted · novelty 5.0

SIRA compresses multi-round exploratory retrieval into one LLM-guided, corpus-statistic-validated weighted BM25 query and reports superior results over dense retrievers and agentic baselines on BEIR benchmarks.

LLM-Oriented Information Retrieval: A Denoising-First Perspective

cs.IR · 2026-05-01 · unverdicted · novelty 4.0 · 2 refs

Argues for a denoising-first paradigm in LLM-oriented information retrieval, framing challenges via a four-stage progression and providing a taxonomy of signal-to-noise optimization techniques across the pipeline.

Beyond the Failures: Rethinking Foundation Models in Pathology

cs.AI · 2025-10-27 · unverdicted · novelty 2.0

Foundation models stumble in pathology due to conceptual mismatches with biological tissue, requiring explicitly designed models rather than adaptations of natural-image methods.

citing papers explorer


Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval · cs.CV · 2026-05-08 · unverdicted · none · ref 43
Semantic Recall for Vector Search · cs.IR · 2026-04-22 · unverdicted · none · ref 33
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability · cs.IR · 2026-04-17 · unverdicted · none · ref 71
Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings · cs.IR · 2026-05-08 · unverdicted · none · ref 31
Aspect-Aware Content-Based Recommendations for Mathematical Research Papers · cs.IR · 2026-05-05 · unverdicted · none · ref 55
Reproducing Complex Set-Compositional Information Retrieval · cs.CL · 2026-05-05 · unverdicted · none · ref 23
Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch · cs.DS · 2026-05-05 · unverdicted · none · ref 232
Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity · cs.IR · 2026-04-07 · conditional · none · ref 18
Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs · cs.CL · 2025-10-10 · unverdicted · none · ref 23
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction · cs.IR · 2025-09-22 · unverdicted · none · ref 62
Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval · cs.IR · 2026-05-07 · unverdicted · none · ref 13
LLM-Oriented Information Retrieval: A Denoising-First Perspective · cs.IR · 2026-05-01 · unverdicted · none · ref 195 · 2 links
Beyond the Failures: Rethinking Foundation Models in Pathology · cs.AI · 2025-10-27 · unverdicted · none · ref 15
