pith. sign in

arxiv: 1907.02251 · v1 · pith:4PG42TECnew · submitted 2019-07-04 · 💻 cs.CC

Hardness of Bichromatic Closest Pair with Jaccard Similarity

Pith reviewed 2026-05-25 08:44 UTC · model grok-4.3

classification 💻 cs.CC
keywords Bichromatic Closest PairJaccard similarityOrthogonal Vectors Conjectureconditional hardnesslocality sensitive hashingapproximate similarity searchfine-grained complexity
0
0 comments X

The pith

Assuming the Orthogonal Vectors Conjecture, Bichromatic Closest Pair with Jaccard similarity requires near-quadratic time when j1 is at most j2 to a power close to 1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves a conditional lower bound for the approximate bichromatic closest pair problem under Jaccard similarity. It shows that, assuming the Orthogonal Vectors Conjecture, no algorithm running in O(n to any power strictly less than 2) can solve the problem for thresholds j2 less than j1 less than 1 minus a constant, whenever j1 is at most j2 raised to a power 1 minus a small positive constant. This rules out subquadratic solutions in the regime where the approximation factor between the two thresholds is subpolynomial in 1 over j2. A reader would care because known locality-sensitive hashing methods achieve subquadratic time only when the approximation ratio grows polynomially, so the result identifies a concrete barrier for better ratios. The argument proceeds by reduction from the orthogonal vectors problem, mapping binary vectors to sets so that orthogonality corresponds to low Jaccard similarity.

Core claim

Assuming the Orthogonal Vectors Conjecture, for any δ>0 there exists ε>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n^{2-δ}) for any choice of thresholds j2 < j1 <1-δ that satisfy j1 ≤ j2^{1-ε}.

What carries the argument

A reduction from the Orthogonal Vectors problem to Bichromatic Closest Pair under Jaccard similarity that maps vectors to sets while preserving the distinction between orthogonal and non-orthogonal pairs as a gap in Jaccard values.

If this is right

  • Locality-sensitive hashing based on MinHash cannot improve beyond polynomial approximation ratios in subquadratic time.
  • Any subquadratic algorithm for the problem must either use a different technique or restrict to cases where j1/j2 grows polynomially with 1/j2.
  • The known upper bounds from LSH are tight in the subpolynomial approximation regime under the Orthogonal Vectors Conjecture.
  • Similar parameter restrictions on approximation ratios are likely to hold for other similarity measures reducible from orthogonal vectors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical implementations of approximate set similarity search may need to accept quadratic time or switch to different similarity functions when thresholds are close.
  • The reduction technique could be adapted to establish hardness for related problems such as monochromatic closest pair or other set similarities.
  • If the conjecture is true, it motivates studying whether average-case or smoothed versions of the problem admit faster algorithms.
  • One could look for small explicit instances where the predicted time lower bound already appears in practice.

Load-bearing premise

The Orthogonal Vectors Conjecture holds: no algorithm solves the orthogonal vectors problem in O(n^{2-ε}) time for some ε>0 on n binary vectors of dimension ω(log n).

What would settle it

An O(n^{2-δ}) algorithm for Bichromatic Closest Pair with Jaccard similarity on thresholds satisfying j1 ≤ j2^{1-ε} for small ε, or a subquadratic algorithm for the Orthogonal Vectors problem.

read the original abstract

Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $\mathcal{A}\times \mathcal{B}$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|\textbf{a}\cap \textbf{b}|/|\textbf{a}\cup \textbf{b}|$ for $(\textbf{a},\textbf{b})\in \mathcal{A}\times \mathcal{B}$. We consider the approximate version of the problem where we are given thresholds $j_1>j_2$ and wish to return a pair from $\mathcal{A}\times \mathcal{B}$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $\mathcal{A}\times \mathcal{B}$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $\tilde O(n^{2-\delta})$ time if $j_1\ge j_2^{1-\delta}$. In particular, for $\delta=\Omega(1)$, the approximation ratio $j_1/j_2=1/j_2^{\delta}$ increases polynomially in $1/j_2$. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2=1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta>0$ there exists an $\varepsilon>0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2<j_1<1-\delta$, that satisfy $j_1\le j_2^{1-\varepsilon}$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proves a conditional fine-grained hardness result for the Bichromatic Closest Pair problem under Jaccard similarity. Assuming the Orthogonal Vectors Conjecture, for any δ>0 there exists ε>0 such that no O(n^{2-δ})-time algorithm exists for the approximate BCP-Jaccard problem on thresholds satisfying j2 < j1 < 1-δ and j1 ≤ j2^{1-ε}. The argument is a direct, near-linear reduction from OV instances to red/blue set families that encodes the orthogonality condition into the Jaccard gap.

Significance. If correct, the result tightly complements the known LSH upper bound (subquadratic time when j1 ≥ j2^{1-δ}) by showing that the approximation ratio cannot be improved to 1/j2^{o(1)} while retaining subquadratic time, under OVC. It supplies a parameter-free derivation of the hardness threshold from an independent conjecture and gives a falsifiable prediction about the approximability of BCP-Jaccard.

minor comments (2)
  1. [§1] §1, paragraph 3: the sentence defining the approximate version could explicitly restate that j1 > j2 to avoid any momentary ambiguity with the later inequality j1 ≤ j2^{1-ε}.
  2. [§3] The reduction description in §3 would benefit from an explicit statement of the universe size and set-cardinality bounds used to ensure the mapping remains near-linear.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept the paper. The summary accurately captures the main result and its relation to the LSH upper bound.

Circularity Check

0 steps flagged

No circularity: hardness via external OVC reduction

full rationale

The paper establishes conditional hardness for Bichromatic Closest Pair under Jaccard similarity by a direct reduction from the Orthogonal Vectors problem. The Orthogonal Vectors Conjecture is an independent, externally stated conjecture in fine-grained complexity (not originating from these authors or their prior work). The reduction maps OV instances to red/blue set families while preserving the required Jaccard gap parameters, with no internal definitions, fitted parameters, or self-citations serving as load-bearing premises for the main claim. The derivation chain is therefore self-contained against an external benchmark and exhibits none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result rests on the single external assumption of the Orthogonal Vectors Conjecture; no free parameters, no new entities, and no additional axioms are introduced.

axioms (1)
  • domain assumption Orthogonal Vectors Conjecture (OVC)
    Invoked to obtain the Ω(n^{2-δ}) lower bound via reduction.

pith-pipeline@v0.9.0 · 5936 in / 1207 out tokens · 23829 ms · 2026-05-25T08:44:11.820004+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.