hub Baseline reference
CoRR abs/2003.07820 (2020), https://arxiv.org/abs/2003.07820
Baseline reference. 67% of citing Pith papers use this work as a benchmark or comparison.
citing papers explorer
- Layer-wise Token Compression for Efficient Document Reranking
Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs
- Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models
CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture
- ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression
ResRank unifies retrieval and listwise reranking by compressing passages to one token each, using residual connections and cosine-similarity scoring, achieving competitive effectiveness on TREC DL and BEIR benchmarks with zero generated tokens.
- HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
- Scaling Laws for Cross-Encoder Reranking
Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.
- GAIA: a benchmark for General AI Assistants
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
- RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation
Retrieved query variants from logs combined with LLM-augmented generation improve unsupervised QPP accuracy by up to 30% for neural rankers on TREC DL'19 and DL'20.
- Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
- Formalized Information Needs Improve Large-Language-Model Relevance Judgments
Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.
- Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking
Internal attention in LLMs shows a bell-curve relevance distribution across layers, enabling Selective-ICR that cuts inference latency 30-50% and lets an 8B zero-shot model match 14B RL re-rankers on BRIGHT.
- Access Paths for Efficient Ordering with Large Language Models
Introduces the LLM ORDER BY semantic operator with algorithmic improvements, a semantic-aware external merge sort, and a budget-aware optimizer that selects near-optimal access paths for LLM-based ordering.
- RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models
RankFlow deploys four LLM roles in sequence to rewrite queries, generate pseudo-answers, summarize passages, and rerank candidates, outperforming prior methods on TREC-DL, BEIR, and NovelEval.
- A Reproducibility Study of LLM-Based Query Reformulation
A unified evaluation finds LLM query reformulation gains are strongly conditioned on retrieval paradigm, do not consistently transfer to neural retrievers, and are not uniformly improved by larger LLMs.
- Efficient Listwise Reranking with Compressed Document Representations
RRK compresses documents to multi-token embeddings for efficient listwise reranking, enabling an 8B model to achieve 3x-18x speedups over smaller models with comparable or better effectiveness.
- Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
Stratified sampling preserving teacher score distribution outperforms hard-negative mining as a robust baseline for knowledge distillation in dense retrieval.
- Statistical Foundations of DIME: Risk Estimation for Practical Index Selection
A statistical risk estimation method enables query-specific dimension selection in dense embeddings, achieving equivalent effectiveness with about 50% smaller embeddings at inference time.
- Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.
- Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents