pith. sign in

arxiv: 1907.08579 · v1 · pith:CPVFO3PVnew · submitted 2019-07-19 · 💻 cs.DS

On Approximate Range Mode and Range Selection

Pith reviewed 2026-05-24 18:46 UTC · model grok-4.3

classification 💻 cs.DS
keywords range moderange selectionapproximate queriesdata encodingspace lower boundsdynamic data structuresrange queries
0
0 comments X

The pith

An O(n/ε)-bit encoding answers (1+ε)-approximate range mode queries in O(log(1/ε)) time and is asymptotically optimal for constant ε.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops space-efficient encodings for two approximate range query problems on sequences. For range mode, an encoding of size O(n/ε) bits returns a position whose frequency is at most a (1+ε) factor below the true mode frequency, with query time O(log(1/ε)). This improves prior space by a logarithmic factor while matching an information-theoretic lower bound when ε is constant. For range selection, an O(n)-bit encoding returns an element whose rank lies in an α-interval around any requested rank k, in constant time when α is constant, again matching a lower bound and improving prior space usage.

Core claim

The authors design an O(n/ε)-bit encoding data structure for (1+ε)-approximate range mode queries that supports queries in O(lg(1/ε)) time without storing the input sequence and prove the space bound asymptotically optimal for constant ε. They also present an O(n)-word dynamic structure with O(lg n / lg lg n) query time and O(lg n) update time. Separately, for constant α they give an O(n)-bit encoding that answers α-approximate range selection queries in constant time and prove optimality.

What carries the argument

Encoding data structure that stores only summary information sufficient to answer approximate frequency and rank queries without access to the full input sequence.

If this is right

  • The O(n/ε) bit structure improves the previous best space by a factor of lg n while preserving the same query time.
  • The dynamic structure is the first result for dynamic approximate range mode and yields the first static structure for approximate 3-sided range mode queries in two dimensions.
  • The O(n)-bit structure for range selection improves space from O(n lg n) bits and supports arbitrary k given at query time rather than only the median.
  • Both results are asymptotically optimal under the information-theoretic arguments when the approximation factors are constants.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The logarithmic space savings suggest that approximation can circumvent typical logarithmic factors in exact range query lower bounds.
  • The same encoding approach may extend to other approximate range aggregates such as approximate range sums or order statistics beyond mode and selection.
  • For very large n the constant-factor space reduction could allow indexing larger datasets entirely in memory compared with previous O(n lg n) solutions.

Load-bearing premise

The information-theoretic lower bounds establishing asymptotic optimality of the O(n/ε) and O(n) bit spaces hold when ε and α are treated as fixed constants independent of n.

What would settle it

A construction of a data structure using o(n/ε) bits that correctly answers every (1+ε)-approximate range mode query on all sequences of length n, for some fixed constant ε, would falsify the optimality claim.

read the original abstract

For any $\epsilon \in (0,1)$, a $(1+\epsilon)$-approximate range mode query asks for the position of an element whose frequency in the query range is at most a factor $(1+\epsilon)$ smaller than the true mode. For this problem, we design an $O(n/\epsilon)$ bit data structure supporting queries in $O(\lg(1/\epsilon))$ time. This is an encoding data structure which does not require access to the input sequence; we prove the space cost is asymptotically optimal for constant $\epsilon$. Our solution improves the previous best result of Greve et al. (Cell Probe Lower Bounds and Approximations for Range Mode, ICALP'10) by reducing the space cost by a factor of $\lg n$ while achieving the same query time. We also design an $O(n)$-word dynamic data structure that answers queries in $O(\lg n /\lg\lg n)$ time and supports insertions and deletions in $O(\lg n)$ time, for any constant $\epsilon \in (0,1)$. This is the first result on dynamic approximate range mode; it can also be used to obtain the first static data structure for approximate 3-sided range mode queries in two dimensions. We also consider approximate range selection. For any $\alpha \in (0,1/2)$, an $\alpha$-approximate range selection query asks for the position of an element whose rank in the query range is in $[k - \alpha s, k + \alpha s]$, where $k$ is a rank given by the query and $s$ is the size of the query range. When $\alpha$ is a constant, we design an $O(n)$-bit encoding data structure that can answer queries in constant time and prove this space cost is asymptotically optimal. The previous best result by Krizanc et al. (Range Mode and Range Median Queries on Lists and Trees, Nordic Journal of Computing, 2005) uses $O(n\lg n)$ bits, or $O(n)$ words, to achieve constant approximation for range median only. Thus we not only improve the space cost, but also provide support for any arbitrary $k$ given at query time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims an O(n/ε)-bit static encoding data structure for (1+ε)-approximate range mode queries with O(lg(1/ε)) query time that is asymptotically optimal for constant ε, improving prior space by a lg n factor. It also presents an O(n)-word dynamic data structure for the same queries with O(lg n / lg lg n) query time and O(lg n) update time. For α-approximate range selection with constant α, it claims an O(n)-bit encoding with constant query time that is asymptotically optimal, improving prior space and supporting arbitrary k.

Significance. If the results hold, they are significant for providing the first asymptotically optimal space bounds for these approximate query problems and the first dynamic structure for approximate range mode. The optimality is a notable strength, as is the application to 2D 3-sided range mode queries. The lower-bound arguments appear sound and correctly handle the approximation factors, so the stress-test concern does not land.

minor comments (2)
  1. The notation 'lg' for base-2 logarithm is used without explicit definition in the abstract and early sections; a brief clarification or consistent use of 'log' would aid readability.
  2. The description of the range selection rank interval [k - α s, k + α s] in the abstract would benefit from a short concrete example to illustrate the approximation for readers unfamiliar with the problem.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. The referee's summary accurately captures our main results on space-optimal encodings for approximate range mode and range selection, as well as the first dynamic structure for approximate range mode. We are pleased that the significance of the optimality results and the soundness of the lower-bound arguments are recognized.

Circularity Check

0 steps flagged

No circularity: upper-bound constructions and separate information-theoretic lower bounds are independent of each other and of prior self-citations.

full rationale

The paper's central claims consist of explicit new data-structure constructions (O(n/ε)-bit static encoding, O(n)-word dynamic structure, O(n)-bit selection encoding) whose space and time bounds are derived from algorithmic design, plus independent information-theoretic lower-bound arguments establishing asymptotic optimality for constant ε and α. These lower bounds are presented as fresh reductions (not re-derivations of fitted parameters or self-citations). No equation or claim reduces by construction to its own inputs; prior citations (Greve et al., Krizanc et al.) are to external results and are not load-bearing for the new optimality statements. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based solely on abstract; no free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5959 in / 1173 out tokens · 26594 ms · 2026-05-24T18:46:05.704345+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.