Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

Bhaskar Mitra; Corby Rosset; David Hawking; Emine Yilmaz; Fernando Diaz; Nick Craswell

arxiv: 1907.03693 · v1 · pith:4KGDU3R4new · submitted 2019-07-08 · 💻 cs.IR · cs.LG

Pith reviewed 2026-05-25 00:55 UTC · model grok-4.3

classification 💻 cs.IR cs.LG

keywords neural information retrieval · query term independence · passage ranking · BERT · Duet · CKNRM · efficient retrieval · precomputation

The pith

Applying query term independence to neural IR models enables offline precomputation of term-document scores with little loss in ranking quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper modifies three neural ranking models—BERT, Duet, and CKNRM—to score documents independently per query term and then sum those scores, following the approach used in classical methods like BM25. This change makes the models compatible with inverted indexes and offline precomputation, which classical models already exploit for fast retrieval. Experiments on a passage ranking task show no significant quality drop for Duet and CKNRM and only minor degradation for BERT. The result extends the efficiency advantages of term-independent scoring to deep models, allowing them to move beyond late-stage re-ranking into initial retrieval over large collections.

Core claim

Incorporating the query term independence assumption into BERT, Duet, and CKNRM allows each model to process query terms separately, compute term-document scores that can be precomputed offline, and accumulate those scores at query time. On the passage ranking task this produces no significant loss in result quality for Duet and CKNRM and only a small degradation for BERT, while making the otherwise expensive models amenable to the same data structures and precomputation strategies used by classical IR systems.

What carries the argument

Query term independence assumption applied to neural models, which decomposes whole-query scoring into per-term independent scoring followed by score accumulation.
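
One way to write this decomposition is given below. The notation is chosen here for illustration and is not taken from the paper: S(q, d) is the score of document d for query q, and s(t, d) is the score the model assigns to d when it sees only the single query term t.

```latex
S(q, d) \;=\; \sum_{t \in q} s(t, d)
```

Because each s(t, d) depends on one term and one document, it can be computed before any query arrives, which is the property the efficiency argument relies on.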

If this is right

State-of-the-art neural ranking models become practical for retrieval from large collections rather than only re-ranking.
Offline precomputation of term-document scores becomes feasible for Duet, CKNRM, and similar models.
Inverted-index-style data structures can now store precomputed neural scores.
Query evaluation cost for deep models drops dramatically because per-term scores are looked up instead of recomputed.
Neural models can be combined with classical retrieval pipelines without sacrificing most of their effectiveness.
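
The precomputation and lookup described in the list above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_term_score` is a hypothetical stand-in (plain term frequency) for a trained per-term neural scorer, and the function names, the in-memory dictionary index, and the example documents are all assumptions made here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

TermScorer = Callable[[str, List[str]], float]
Index = Dict[str, Dict[str, float]]


def toy_term_score(term: str, doc_tokens: List[str]) -> float:
    """Hypothetical placeholder for a per-term neural scorer: term frequency."""
    return float(doc_tokens.count(term))


def precompute_index(
    docs: Dict[str, str], vocabulary: Iterable[str], score_fn: TermScorer
) -> Index:
    """Offline step: score every (term, document) pair once and store
    the non-zero scores in an inverted-index-style mapping."""
    index: Index = defaultdict(dict)
    vocab = sorted(set(vocabulary))
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in vocab:
            score = score_fn(term, tokens)
            if score != 0.0:
                index[term][doc_id] = score
    return dict(index)


def retrieve(query: str, index: Index, k: int = 10) -> List[Tuple[str, float]]:
    """Query-time step: look up each query term and accumulate the stored
    scores per document. No model inference happens here."""
    totals: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up example collection, for illustration only.
docs = {
    "d1": "neural ranking with neural models",
    "d2": "classical ranking with bm25",
    "d3": "cooking pasta",
}
index = precompute_index(docs, {"neural", "ranking", "bm25"}, toy_term_score)
results = retrieve("neural ranking", index)
print(results)
```

In this sketch the expensive call (`score_fn`) appears only in `precompute_index`; `retrieve` performs dictionary lookups and additions, which is why the cost per query no longer depends on model inference. A production system would also need to decide which terms and documents to score offline, a question this sketch sidesteps by taking a fixed vocabulary.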

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same independence modification could be tested on other whole-query neural models beyond the three evaluated.
Hybrid indexes mixing classical and neural term scores might emerge as a practical next step.
The approach may generalize to tasks other than passage ranking if the quality retention holds.
Precomputed neural scores could change how index compression and pruning are designed.

Load-bearing premise

The query term independence assumption does not materially degrade ranking quality for the modified neural models on the passage ranking task.

What would settle it

A head-to-head comparison on the same passage ranking task where the term-independent neural models show a statistically significant drop in NDCG or MAP relative to their full-query counterparts.
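
A comparison of this kind is usually run on per-query metric values from both systems over the same queries. The sketch below computes a paired t statistic using only the Python standard library. The choice of a paired t-test, the function name, and the numbers are assumptions for illustration; the per-query values are made up and are not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(baseline: Sequence[float], variant: Sequence[float]) -> float:
    """Paired t statistic for per-query metric values of two systems.

    A negative value means the variant scored lower than the baseline on average.
    """
    if len(baseline) != len(variant) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least 2 queries")
    diffs = [v - b for b, v in zip(baseline, variant)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0.0:
        raise ValueError("all per-query differences are identical")
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


# Made-up per-query scores: full-query model vs. term-independent variant.
full_query = [0.5, 1.0, 0.25, 0.0]
term_independent = [0.5, 0.5, 0.25, 0.0]
t_stat = paired_t_statistic(full_query, term_independent)
print(t_stat)
```

The statistic is then compared against a Student's t distribution with n - 1 degrees of freedom, where n is the number of queries, to obtain a p-value; that step is omitted here to keep the example dependency-free.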

Original abstract

Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores using these models---which can be combined with specialized data structures, such as inverted index, for efficient retrieval. Deep neural IR models, in contrast, compare the whole query to the document and are, therefore, typically employed only for late stage re-ranking. We incorporate query term independence assumption into three state-of-the-art neural IR models: BERT, Duet, and CKNRM---and evaluate their performance on a passage ranking task. Surprisingly, we observe no significant loss in result quality for Duet and CKNRM---and a small degradation in the case of BERT. However, by operating on each query term independently, these otherwise computationally intensive models become amenable to offline precomputation---dramatically reducing the cost of query evaluations employing state-of-the-art neural ranking models. This strategy makes it practical to use deep models for retrieval from large collections---and not restrict their usage to late stage re-ranking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes incorporating the query term independence assumption into three neural IR models (BERT, Duet, CKNRM) so that per-term document scores can be precomputed offline. This enables efficient first-stage retrieval over large collections using inverted-index-style structures, in contrast to the typical late-stage re-ranking use of neural models. On a passage ranking task the authors report no significant quality loss for Duet and CKNRM and only a small degradation for BERT.

Significance. If the empirical result holds under a first-stage retrieval protocol and the precomputation overhead is quantified, the work would meaningfully extend the practical reach of neural rankers beyond re-ranking. The approach directly addresses the computational barrier that currently confines state-of-the-art neural models to second-stage use.

major comments (2)

[Evaluation] Evaluation section: the reported experiments are described only as a 'passage ranking task' without stating whether the neural models replace the initial retriever or are applied only to BM25 top-k candidates. Because the central claim concerns enabling retrieval from large collections, the absence of a high-recall first-stage protocol (or end-to-end latency on the full collection) is load-bearing for the significance of the result.
[Abstract/Evaluation] Abstract and Evaluation: no dataset name, collection size, metric definitions, or statistical significance tests are supplied to support the claim of 'no significant loss.' Without these details the reader cannot assess whether the observed differences are within the variance of the baseline.

minor comments (2)

[Method] Notation for the term-independent scoring functions is introduced without an explicit equation; adding a short formal definition would improve clarity.
[Related Work] The paper should cite prior work on term-independent neural scoring (e.g., early query-likelihood neural extensions) to situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the evaluation protocol and missing details. We address each major comment below.

Referee: [Evaluation] Evaluation section: the reported experiments are described only as a 'passage ranking task' without stating whether the neural models replace the initial retriever or are applied only to BM25 top-k candidates. Because the central claim concerns enabling retrieval from large collections, the absence of a high-recall first-stage protocol (or end-to-end latency on the full collection) is load-bearing for the significance of the result.

Authors: We agree the protocol requires explicit description. Our experiments measure ranking quality under the independence assumption on the standard passage ranking task (re-ranking BM25 top-1000 candidates using relevance judgments). This directly quantifies the quality impact of the modeling change that enables precomputation and first-stage use. We will revise the evaluation section to state this clearly. Full end-to-end first-stage retrieval and latency measurements on the entire collection would require additional indexing infrastructure and are left for future work, but the reported quality results remain relevant to the claim.

revision: partial

Referee: [Abstract/Evaluation] Abstract and Evaluation: no dataset name, collection size, metric definitions, or statistical significance tests are supplied to support the claim of 'no significant loss.' Without these details the reader cannot assess whether the observed differences are within the variance of the baseline.

Authors: We acknowledge these details were omitted. The experiments use the MS MARCO passage ranking dataset (~8.8M passages). We report MRR@10 and NDCG@10 with paired t-tests for significance. We will update both the abstract and evaluation section to include the dataset name, collection size, metric definitions, and significance test results.

revision: yes
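
For readers unfamiliar with the metric named in the response above, MRR@10 can be sketched as follows. The function names and the tiny run and judgment sets are hypothetical, made up for illustration, and not drawn from the paper or from MS MARCO.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant item within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean of the per-query reciprocal ranks over all queries in `runs`."""
    if not runs:
        raise ValueError("no queries to evaluate")
    total = sum(
        reciprocal_rank_at_k(ranking, qrels.get(qid, set()), k)
        for qid, ranking in runs.items()
    )
    return total / len(runs)


# Made-up rankings and relevance judgments.
runs = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p9"], "q3": ["p5"]}
qrels = {"q1": {"p1"}, "q2": {"p2"}, "q3": {"p8"}}
mrr = mrr_at_k(runs, qrels)
print(mrr)
```

Here q1 contributes 1/2, q2 contributes 1, and q3 contributes 0 because its only relevant passage was not retrieved, so the mean is 0.5.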

Circularity Check

0 steps flagged

No circularity; purely empirical evaluation of modified models

The paper contains no equations, derivations, or fitted parameters. It modifies three neural models (BERT, Duet, CKNRM) to score query terms independently, runs a standard passage ranking experiment (MS MARCO-style re-ranking), reports small or no quality loss for two models, and notes the resulting precomputation opportunity. All load-bearing steps are direct experimental measurements rather than any self-definition, renamed fit, or self-citation chain. The evaluation setup may or may not fully validate first-stage use, but that is a question of experimental validity, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the classical query-term-independence assumption imported from BM25-style models; no new free parameters, invented entities, or additional axioms are introduced beyond that domain assumption.

axioms (1)

domain assumption: Query terms can be scored independently without material loss of ranking quality for the neural models under test.
This is the explicit modeling choice that enables precomputation and is the load-bearing premise for the efficiency claim.

pith-pipeline@v0.9.0 · 5744 in / 1161 out tokens · 16971 ms · 2026-05-25T00:55:02.794389+00:00 · methodology
