Hardness of Bichromatic Closest Pair with Jaccard Similarity
Pith reviewed 2026-05-25 08:44 UTC · model grok-4.3
The pith
Assuming the Orthogonal Vectors Conjecture, Bichromatic Closest Pair with Jaccard similarity requires near-quadratic time when j1 is at most j2 to a power close to 1.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Assuming the Orthogonal Vectors Conjecture, for any δ>0 there exists ε>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n^{2-δ}) for any choice of thresholds j2 < j1 <1-δ that satisfy j1 ≤ j2^{1-ε}.
What carries the argument
A reduction from the Orthogonal Vectors problem to Bichromatic Closest Pair under Jaccard similarity that maps vectors to sets while preserving the distinction between orthogonal and non-orthogonal pairs as a gap in Jaccard values.
If this is right
- Locality-sensitive hashing based on MinHash cannot improve beyond polynomial approximation ratios in subquadratic time.
- Any subquadratic algorithm for the problem must either use a different technique or restrict to cases where j1/j2 grows polynomially with 1/j2.
- The known upper bounds from LSH are tight in the subpolynomial approximation regime under the Orthogonal Vectors Conjecture.
- Similar parameter restrictions on approximation ratios are likely to hold for other similarity measures reducible from orthogonal vectors.
Where Pith is reading between the lines
- Practical implementations of approximate set similarity search may need to accept quadratic time or switch to different similarity functions when thresholds are close.
- The reduction technique could be adapted to establish hardness for related problems such as monochromatic closest pair or other set similarities.
- If the conjecture is true, it motivates studying whether average-case or smoothed versions of the problem admit faster algorithms.
- One could look for small explicit instances where the predicted time lower bound already appears in practice.
Load-bearing premise
The Orthogonal Vectors Conjecture holds: no algorithm solves the orthogonal vectors problem in O(n^{2-ε}) time for some ε>0 on n binary vectors of dimension ω(log n).
What would settle it
An O(n^{2-δ}) algorithm for Bichromatic Closest Pair with Jaccard similarity on thresholds satisfying j1 ≤ j2^{1-ε} for small ε, or a subquadratic algorithm for the Orthogonal Vectors problem.
read the original abstract
Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $\mathcal{A}\times \mathcal{B}$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|\textbf{a}\cap \textbf{b}|/|\textbf{a}\cup \textbf{b}|$ for $(\textbf{a},\textbf{b})\in \mathcal{A}\times \mathcal{B}$. We consider the approximate version of the problem where we are given thresholds $j_1>j_2$ and wish to return a pair from $\mathcal{A}\times \mathcal{B}$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $\mathcal{A}\times \mathcal{B}$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $\tilde O(n^{2-\delta})$ time if $j_1\ge j_2^{1-\delta}$. In particular, for $\delta=\Omega(1)$, the approximation ratio $j_1/j_2=1/j_2^{\delta}$ increases polynomially in $1/j_2$. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2=1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta>0$ there exists an $\varepsilon>0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2<j_1<1-\delta$, that satisfy $j_1\le j_2^{1-\varepsilon}$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves a conditional fine-grained hardness result for the Bichromatic Closest Pair problem under Jaccard similarity. Assuming the Orthogonal Vectors Conjecture, for any δ>0 there exists ε>0 such that no O(n^{2-δ})-time algorithm exists for the approximate BCP-Jaccard problem on thresholds satisfying j2 < j1 < 1-δ and j1 ≤ j2^{1-ε}. The argument is a direct, near-linear reduction from OV instances to red/blue set families that encodes the orthogonality condition into the Jaccard gap.
Significance. If correct, the result tightly complements the known LSH upper bound (subquadratic time when j1 ≥ j2^{1-δ}) by showing that the approximation ratio cannot be improved to 1/j2^{o(1)} while retaining subquadratic time, under OVC. It supplies a parameter-free derivation of the hardness threshold from an independent conjecture and gives a falsifiable prediction about the approximability of BCP-Jaccard.
minor comments (2)
- [§1] §1, paragraph 3: the sentence defining the approximate version could explicitly restate that j1 > j2 to avoid any momentary ambiguity with the later inequality j1 ≤ j2^{1-ε}.
- [§3] The reduction description in §3 would benefit from an explicit statement of the universe size and set-cardinality bounds used to ensure the mapping remains near-linear.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept the paper. The summary accurately captures the main result and its relation to the LSH upper bound.
Circularity Check
No circularity: hardness via external OVC reduction
full rationale
The paper establishes conditional hardness for Bichromatic Closest Pair under Jaccard similarity by a direct reduction from the Orthogonal Vectors problem. The Orthogonal Vectors Conjecture is an independent, externally stated conjecture in fine-grained complexity (not originating from these authors or their prior work). The reduction maps OV instances to red/blue set families while preserving the required Jaccard gap parameters, with no internal definitions, fitted parameters, or self-citations serving as load-bearing premises for the main claim. The derivation chain is therefore self-contained against an external benchmark and exhibits none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Orthogonal Vectors Conjecture (OVC)
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Assuming OVC, for any δ>0 there exists ε>0 such that BCP-Jaccard requires Ω(n^{2-δ}) when j1 ≤ j2^{1-ε} (Thm 2; reductions in §3-5)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_add unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Squaring: a'_ij = a_i · a_j; J after i squarings = |a∩b|^{2i} / (|a|^{2i}+|b|^{2i}-|a∩b|^{2i}) (Eq. 2, Lemma 7)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.