Passage Re-ranking with BERT
44 Pith papers cite this work.
abstract
Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. Our system is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% (relative) in MRR@10. The code to reproduce our results is available at https://github.com/nyu-dl/dl4marco-bert
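The headline metric above, MRR@10, rewards ranking a relevant passage as high as possible within the top 10. A minimal sketch of how it is computed (function and variable names are illustrative, not taken from the released code):

```python
def mrr_at_10(ranked_lists):
    """Mean Reciprocal Rank at cutoff 10.

    ranked_lists: one (ranked_doc_ids, relevant_doc_ids) pair per query,
    where ranked_doc_ids is ordered best-first and relevant_doc_ids is a set.
    Each query contributes 1/rank of its first relevant passage within the
    top 10, or 0 if none appears there.
    """
    total = 0.0
    for ranked, relevant in ranked_lists:
        rr = 0.0
        for rank, doc_id in enumerate(ranked[:10], start=1):
            if doc_id in relevant:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists)
```

For example, a query whose first relevant passage sits at rank 2 contributes 0.5, so a 27% relative MRR@10 gain means relevant passages move substantially closer to rank 1 on average.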
citing papers explorer
-
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.
-
Learning to Unscramble Feynman Loop Integrals with SAILIR
A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.
-
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
BEIR is a heterogeneous zero-shot benchmark showing BM25 as a robust baseline while re-ranking and late-interaction models perform best on average at higher cost, with dense and sparse models lagging in generalization.
-
Dense Passage Retrieval for Open-Domain Question Answering
Dense dual-encoder retrievers outperform BM25 by 9-19% absolute in top-20 passage retrieval accuracy across open-domain QA datasets and enable new state-of-the-art end-to-end QA results.
-
Very Efficient Listwise Multimodal Reranking for Long Documents
ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.
-
Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery
HDRI is a six-principle eight-stage framework for hypothesis-organized LLM research featuring gap-driven iteration, traceable fact reasoning, and subject locking, realized in INFOMINER with reported gains in fact density and completeness.
-
Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval
Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.
-
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
BAGEL is a Bayesian active learning framework that uses Gaussian Processes to propagate LLM relevance signals across embedding space and guide global exploration, outperforming standard LLM reranking under identical budgets on four retrieval benchmarks.
-
KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains
KIRA is a unified architecture for visual RAG that reports 0.97 retrieval precision, 1.0 grounding, and 0.707 domain correctness across medical, circuit, satellite, and histopathology domains via hierarchical chunking, dual-path retrieval, and evidence-conditioned generation.
-
Scaling Laws for Cross-Encoder Reranking
Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.
-
SPIRE: Structure-Preserving Interpretable Retrieval of Evidence
SPIRE presents a tree-structured retrieval method using subdocuments, paths, and dual contextualization that produces higher-quality and more diverse citations than passage-based baselines on HTML QA benchmarks.
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
-
Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation
Derivation Prompting constructs logic-based derivation trees in RAG generation to improve interpretability and reduce unacceptable answers compared to standard RAG or long-context methods in a case study.
-
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents
PRISM achieves higher accuracy than baselines on long-horizon agent tasks at an order-of-magnitude smaller context budget by combining hierarchical bundle search, query-sensitive costing, evidence compression, and adaptive intent routing over structured memory.
-
Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.
-
Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall
True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, and Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.
-
Interactive Multi-Turn Retrieval for Health Videos
DATR combines coarse CLIP-based retrieval with multi-turn query fusion and cross-encoder re-ranking to improve health video retrieval, supported by the new MHVRC corpus.
-
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
-
A Replicability Study of XTR
XTR training does not improve retrieval effectiveness over ColBERT but enhances IVF engine efficiency by flattening token scores to produce more discriminative centroids.
-
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
-
Onyx: Cost-Efficient Disk-Oblivious ANN Search
Onyx inverts ANN-ORAM optimization priorities with a compact pruning representation and locality-aware shallow tree to deliver 1.7-9.9x lower cost and 2.3-12.3x lower latency for disk-oblivious ANN search.
-
The Effect of Document Selection on Query-focused Text Analysis
Semantic and hybrid document retrieval methods provide reliable, efficient selection for query-focused text analyses like LDA and BERTopic, outperforming random or keyword-only approaches.
-
Entities as Retrieval Signals: A Systematic Study of Coverage, Supervision, and Evaluation in Entity-Oriented Ranking
Entity signals cover only 19.7% of relevant documents on Robust04 and no configuration among 443 systems improves MAP by more than 0.05 in open-world evaluation, despite gains when entities are pre-restricted.
-
Beyond Single-Score Ranking: Facet-Aware Reranking for Controllable Diversity in Paper Recommendation
SciFACE improves facet-specific paper ranking NDCG scores by training separate cross-encoders for Background and Method similarity on 5,891 GPT-4o-mini labeled pairs, outperforming SPECTER by up to 31 points.
-
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
RankZephyr is a new open-source LLM that closes the effectiveness gap with GPT-4 for zero-shot listwise reranking while showing robustness to input ordering and document count.
-
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
-
Unsupervised Dense Information Retrieval with Contrastive Learning
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
-
Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks
Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
-
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
-
KG-First, LLM-Fallback: A Hybrid Microservice for Grounded Skill Search and Explanation
SkillGraph-Service builds a provenance-preserving knowledge graph from multiple competency frameworks and achieves nDCG@5 above 0.94 with sub-200 ms latency via KG-first hybrid retrieval and constrained LLM explanations.
-
LLM-Oriented Information Retrieval: A Denoising-First Perspective
Denoising to maximize usable evidence density and verifiability is becoming the primary bottleneck in LLM-oriented information retrieval, conceptualized via a four-stage framework and addressed through a pipeline taxonomy of optimization techniques.
-
Efficient Listwise Reranking with Compressed Document Representations
RRK compresses documents to multi-token embeddings for efficient listwise reranking, enabling an 8B model to achieve 3x-18x speedups over smaller models with comparable or better effectiveness.
-
Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents
Tree reasoning outperforms vector search on complex document queries but a hybrid approach balances results across tiers, with validation showing an 11.7-point gap on real finance documents.
-
Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents
LLM-generated reference documents enable dynamic ranked list truncation and adaptive batching for listwise reranking, outperforming prior RLT methods and accelerating processing by up to 66% on TREC benchmarks.
-
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
Stratified sampling preserving teacher score distribution outperforms hard-negative mining as a robust baseline for knowledge distillation in dense retrieval.
-
Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval
HDRR combines document-level semantic routing with scoped chunk retrieval to outperform both pure chunk-based retrieval and semantic file routing on the FinDER benchmark, delivering higher average scores, lower failure rates, and more perfect answers.
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
-
An End-to-End Ukrainian RAG for Local Deployment. Optimized Hybrid Search and Lightweight Generation
A two-stage hybrid search pipeline paired with a synthetic-data fine-tuned and compressed Ukrainian language model delivers competitive local question answering under strict compute limits.
-
Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data
Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 59.69% to 70.47%.
-
Peerispect: Claim Verification in Scientific Peer Reviews
Peerispect extracts claims from peer reviews, retrieves evidence from the manuscript, and verifies them via NLI in a modular pipeline with a visual interface.
-
FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History
Fragata applies hybrid RAG to enable semantic retrieval of HPC support tickets across 20 years of history, handling language differences, typos, and varied wording better than traditional keyword search.
-
A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering
Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.
-
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance
A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.
-
Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval
A reproducibility study confirms that Hypencoder's non-linear, query-specific scoring improves retrieval over bi-encoders on standard benchmarks, though standard methods remain faster and hard-task results are mixed due to implementation issues.
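Most of the rerankers surveyed above follow the pointwise cross-encoder pattern this paper popularized: score each (query, passage) pair independently, then sort. A minimal sketch with a stand-in lexical scorer (illustrative only; a trained cross-encoder such as the BERT model here would replace `overlap_score` in practice):

```python
from typing import Callable, List, Tuple

def rerank(query: str, passages: List[str],
           score: Callable[[str, str], float],
           top_k: int = 10) -> List[Tuple[float, str]]:
    """Pointwise re-ranking: score every (query, passage) pair
    independently, then return the top_k passages by descending score."""
    scored = [(score(query, p), p) for p in passages]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:top_k]

# Stand-in scorer: fraction of query terms appearing in the passage.
def overlap_score(query: str, passage: str) -> float:
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)
```

Listwise rerankers (e.g., RankZephyr or ZipRerank above) differ by scoring candidates jointly rather than one pair at a time, trading this simple loop for cross-candidate comparisons.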