
Passage Re-ranking with BERT

Rodrigo Nogueira, Kyunghyun Cho · 2019 · cs.IR · arXiv 1901.04085

Canonical reference. 88% of citing Pith papers cite this work as background.

79 Pith papers citing it



abstract

Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-implementation of BERT for query-based passage re-ranking. Our system is the state of the art on the TREC-CAR dataset and the top entry in the leaderboard of the MS MARCO passage retrieval task, outperforming the previous state of the art by 27% (relative) in MRR@10. The code to reproduce our results is available at https://github.com/nyu-dl/dl4marco-bert
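The abstract rests on two quantitative ideas: re-ranking a first-stage candidate list with a query-passage relevance model, and scoring the result with MRR@10, where the gain is reported as a relative improvement. The sketch below makes both concrete under loud assumptions. `toy_overlap_score` is a hypothetical stand-in for the fine-tuned BERT model, which in the paper's setting would score each query-passage pair. `rerank`, `mrr_at_k` and `relative_improvement` are illustrative helpers, not code from the dl4marco-bert repository. The candidate passages and relevance judgments are made-up toy data, not results from the paper.

```python
from typing import Callable, Dict, List, Set

# Hypothetical scorer type: maps (query, passage_text) to a relevance score.
Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: Dict[str, str], score: Scorer) -> List[str]:
    """Return passage ids sorted by descending relevance score.

    `candidates` maps passage id -> text in first-stage order (e.g. from BM25).
    Python's sort is stable even with reverse=True, so tied passages keep
    their first-stage order.
    """
    return sorted(candidates, key=lambda pid: score(query, candidates[pid]), reverse=True)


def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a BERT relevance model: share of query terms in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)


def mrr_at_k(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """MRR@k: mean over queries of 1/rank of the first relevant passage in the top k (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        relevant = qrels.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def relative_improvement(new: float, old: float) -> float:
    """Relative gain over a baseline; 0.27 corresponds to "27% (relative)"."""
    return (new - old) / old


# Toy example (made-up data, not from the paper).
query = "what is bert"
candidates = {
    "p1": "the weather is nice today",
    "p2": "bert is a pretrained transformer model",
    "p3": "what to cook for dinner",
}
qrels = {"q1": {"p2"}}  # p2 is the only relevant passage for this query

first_stage = list(candidates)                           # ['p1', 'p2', 'p3']
reranked = rerank(query, candidates, toy_overlap_score)  # ['p2', 'p1', 'p3']

before = mrr_at_k({"q1": first_stage}, qrels)  # 0.5: relevant passage at rank 2
after = mrr_at_k({"q1": reranked}, qrels)      # 1.0: relevant passage at rank 1
print(before, after, relative_improvement(after, before))  # 0.5 1.0 1.0
```

Swapping `toy_overlap_score` for a real relevance model leaves `rerank` and `mrr_at_k` unchanged. The overlap heuristic is there only so the sketch runs without model weights.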


citation-role summary

background: 15 · baseline: 1

citation-polarity summary

background: 14 · baseline: 1 · support: 1



representative citing papers

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

cs.CL · 2026-06-15 · unverdicted · novelty 8.0

MetaSyn benchmark shows LLM pipelines recover at most 52.7% of ground-truth included studies due to screening failures on PI/ECO eligibility, despite 90.9% retrieval recall at K=200.

From Regulatory Approvals to Patents: Cross-Domain Linking for Cardiovascular Device Traceability

cs.IR · 2026-06-06 · unverdicted · novelty 8.0

A benchmark and ontology-driven framework links 434 cardiovascular devices to patents at 91.6% recall, producing 6.8M high-confidence links for regulatory-IP integration.

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

cs.CR · 2026-04-07 · unverdicted · novelty 8.0

The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

Learning to Unscramble Feynman Loop Integrals with SAILIR

hep-ph · 2026-04-06 · unverdicted · novelty 8.0

A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

cs.IR · 2021-04-17 · accept · novelty 8.0

BEIR is a heterogeneous zero-shot benchmark showing BM25 as a robust baseline while re-ranking and late-interaction models perform best on average at higher cost, with dense and sparse models lagging in generalization.

Dense Passage Retrieval for Open-Domain Question Answering

cs.CL · 2020-04-10 · accept · novelty 8.0

Dense dual-encoder retrievers outperform BM25 by 9-19% absolute in top-20 passage retrieval accuracy across open-domain QA datasets and enable new state-of-the-art end-to-end QA results.

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

cs.DB · 2026-06-06 · unverdicted · novelty 7.0

An adaptive two-phase semantic filter using clustering then a hybrid proxy trained on LLM confidence achieves 1.6-2.0x speedup over prior methods at 90% accuracy on 10K document corpora.

Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Re-ranking retrieval candidates via a cross-encoder trained on continuous perturbation-based attribution scores improves citation faithfulness and gold-answer alignment in legal QA over semantic similarity.

Test-Time Training for Zero-Resource Dense Retrieval Reranking

cs.IR · 2026-05-31 · unverdicted · novelty 7.0

DART adapts a scoring matrix at inference time via gradient updates on pseudo-labels from top/bottom documents to gain +2.1% mean NDCG@10 on six BEIR benchmarks with under 10ms added latency.

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

SilentRetrieval is a data poisoning attack achieving 84.6% HR@10 and 57.5% ASR-LLM on Natural Questions via coordinated beam search and trigger fusion while preserving document fluency.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents, with additional gains on listwise LLM rerankers and a regularizing effect on long inputs.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

HDRI is a six-principle, eight-stage framework for hypothesis-organized LLM research featuring gap-driven iteration, traceable fact reasoning, and subject locking, realized in INFOMINER with reported gains in fact density and completeness.

Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

cs.IR · 2026-04-26 · accept · novelty 7.0

Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.

Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval

cs.IR · 2026-04-20 · unverdicted · novelty 7.0

BAGEL is a Bayesian active learning framework that uses Gaussian Processes to propagate LLM relevance signals across embedding space and guide global exploration, outperforming standard LLM reranking under identical budgets on four retrieval benchmarks.

KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains

cs.CV · 2026-04-18 · unverdicted · novelty 7.0

KIRA is a unified architecture for visual RAG that reports 0.97 retrieval precision, 1.0 grounding, and 0.707 domain correctness across medical, circuit, satellite, and histopathology domains via hierarchical chunking, dual-path retrieval, and evidence-conditioned generation.

Scaling Laws for Cross-Encoder Reranking

cs.IR · 2026-03-05 · unverdicted · novelty 7.0

Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.

SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

cs.IR · 2026-02-12 · unverdicted · novelty 7.0

SPIRE presents a tree-structured retrieval method using subdocuments, paths, and dual contextualization that produces higher-quality and more diverse citations than passage-based baselines on HTML QA benchmarks.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

cs.CL · 2020-05-22 · accept · novelty 7.0

RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

Relevance Is Not Permission: Warranted Attention for Value Contributions

cs.AI · 2026-06-29 · unverdicted · novelty 6.0 · 2 refs

Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.

AB-RAG: Adaptive Budgeted Retrieval-Augmented Generation for Reliable Question Answering

cs.CL · 2026-06-27 · unverdicted · novelty 6.0

AB-RAG adaptively budgets retrieval in RAG by combining three confidence signals to decide when to stop or fetch more evidence, separating correct from incorrect answers at 57.6% vs 0% exact match on a factoid dataset.

Multi-Agent Routing as Set-Valued Prediction: A WildChat Benchmark and Cost-Aware Evaluation

cs.LG · 2026-06-27 · unverdicted · novelty 6.0

A WildChat-derived benchmark frames multi-agent routing as set-valued prediction; supervised methods outperform nearest-neighbor and zero-shot LLM baselines in both unconstrained accuracy and cost-constrained settings.

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

HistoRAG embeds historiographical principles into RAG via temporal windowing, decoupled retrieval, and contestable LLM relevance judgments, evaluated on 102k Der Spiegel articles from 1950-1979.

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

cs.CR · 2026-06-04 · unverdicted · novelty 6.0

A contrastive memory system evolves without retraining to defend LLM agents against jailbreaks, achieving top F1 scores and low benign refusal on HarmBench and AgentHarm benchmarks.

citing papers explorer

Showing 15 of 15 citing papers after filters.

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio cs.CL · 2026-06-15 · unverdicted · none · ref 28
MetaSyn benchmark shows LLM pipelines recover at most 52.7% of ground-truth included studies due to screening failures on PI/ECO eligibility, despite 90.9% retrieval recall at K=200.
Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA cs.CL · 2026-06-02 · unverdicted · none · ref 8
Re-ranking retrieval candidates via a cross-encoder trained on continuous perturbation-based attribution scores improves citation faithfulness and gold-answer alignment in legal QA over semantic similarity.
AB-RAG: Adaptive Budgeted Retrieval-Augmented Generation for Reliable Question Answering cs.CL · 2026-06-27 · unverdicted · none · ref 37
AB-RAG adaptively budgets retrieval in RAG by combining three confidence signals to decide when to stop or fetch more evidence, separating correct from incorrect answers at 57.6% vs 0% exact match on a factoid dataset.
HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice cs.CL · 2026-06-16 · unverdicted · none · ref 28
HistoRAG embeds historiographical principles into RAG via temporal windowing, decoupled retrieval, and contestable LLM relevance judgments, evaluated on 102k Der Spiegel articles from 1950-1979.
Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation cs.CL · 2026-05-13 · unverdicted · none · ref 9
Derivation Prompting constructs logic-based derivation trees in RAG generation to improve interpretability and reduce unacceptable answers compared to standard RAG or long-context methods in a case study.
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents cs.CL · 2026-05-12 · unverdicted · none · ref 15 · 2 links
PRISM is a new inference-time retrieval system that achieves higher accuracy than baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning cs.CL · 2026-05-02 · unverdicted · none · ref 38
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
Atlas: Few-shot Learning with Retrieval Augmented Language Models cs.CL · 2022-08-05 · unverdicted · none · ref 94
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
When Does Complexity Conditioning Help a Frozen Sentence Embedding? A Controlled Study of Per-Sentence and Pair-Level Difficulty Adaptation cs.CL · 2026-06-02 · unverdicted · none · ref 9
Controlled experiments on frozen embeddings show per-sentence difficulty adaptation fails to help while pair-level gating by a held-out signal gives modest gains on larger tasks.
Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation cs.CL · 2026-05-28 · unverdicted · none · ref 2
SG-SRL applies cross-lingual semantic RL on source monolingual data plus a recovery stage to improve semantic grounding over standard SFT in low-resource target-language generation.
PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation cs.CL · 2026-05-25 · unverdicted · none · ref 15
PennySynth raises pass@5 success on QHack quantum coding challenges by 25-28 points over a base LLM by retrieving from a curated PennyLane dataset using code-aware embeddings.
AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts cs.CL · 2026-06-18 · unverdicted · none · ref 15
AtomMem introduces atomic-fact extraction, hierarchical event structures, and an associative memory graph to build stable long-term memory for LLM agents, claiming SOTA results on the LoCoMo benchmark.
An End-to-End Ukrainian RAG for Local Deployment. Optimized Hybrid Search and Lightweight Generation cs.CL · 2026-04-23 · unverdicted · none · ref 1
A two-stage hybrid search pipeline paired with a synthetic-data fine-tuned and compressed Ukrainian language model delivers competitive local question answering under strict compute limits.
Peerispect: Claim Verification in Scientific Peer Reviews cs.CL · 2026-04-19 · unverdicted · none · ref 16
Peerispect extracts claims from peer reviews, retrieves evidence from the manuscript, and verifies them via NLI in a modular pipeline with a visual interface.
A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering cs.CL · 2026-04-08 · unverdicted · none · ref 29
Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.
