On Approximate Range Mode and Range Selection
Pith reviewed 2026-05-24 18:46 UTC · model grok-4.3
The pith
An O(n/ε)-bit encoding answers (1+ε)-approximate range mode queries in O(log(1/ε)) time and is asymptotically optimal for constant ε.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors design an O(n/ε)-bit encoding data structure for (1+ε)-approximate range mode queries that supports queries in O(lg(1/ε)) time without storing the input sequence and prove the space bound asymptotically optimal for constant ε. They also present an O(n)-word dynamic structure with O(lg n / lg lg n) query time and O(lg n) update time. Separately, for constant α they give an O(n)-bit encoding that answers α-approximate range selection queries in constant time and prove optimality.
What carries the argument
Encoding data structure that stores only summary information sufficient to answer approximate frequency and rank queries without access to the full input sequence.
If this is right
- The O(n/ε) bit structure improves the previous best space by a factor of lg n while preserving the same query time.
- The dynamic structure is the first result for dynamic approximate range mode and yields the first static structure for approximate 3-sided range mode queries in two dimensions.
- The O(n)-bit structure for range selection improves space from O(n lg n) bits and supports arbitrary k given at query time rather than only the median.
- Both results are asymptotically optimal under the information-theoretic arguments when the approximation factors are constants.
Where Pith is reading between the lines
- The logarithmic space savings suggest that approximation can circumvent typical logarithmic factors in exact range query lower bounds.
- The same encoding approach may extend to other approximate range aggregates such as approximate range sums or order statistics beyond mode and selection.
- For very large n the constant-factor space reduction could allow indexing larger datasets entirely in memory compared with previous O(n lg n) solutions.
Load-bearing premise
The information-theoretic lower bounds establishing asymptotic optimality of the O(n/ε) and O(n) bit spaces hold when ε and α are treated as fixed constants independent of n.
What would settle it
A construction of a data structure using o(n/ε) bits that correctly answers every (1+ε)-approximate range mode query on all sequences of length n, for some fixed constant ε, would falsify the optimality claim.
read the original abstract
For any $\epsilon \in (0,1)$, a $(1+\epsilon)$-approximate range mode query asks for the position of an element whose frequency in the query range is at most a factor $(1+\epsilon)$ smaller than the true mode. For this problem, we design an $O(n/\epsilon)$ bit data structure supporting queries in $O(\lg(1/\epsilon))$ time. This is an encoding data structure which does not require access to the input sequence; we prove the space cost is asymptotically optimal for constant $\epsilon$. Our solution improves the previous best result of Greve et al. (Cell Probe Lower Bounds and Approximations for Range Mode, ICALP'10) by reducing the space cost by a factor of $\lg n$ while achieving the same query time. We also design an $O(n)$-word dynamic data structure that answers queries in $O(\lg n /\lg\lg n)$ time and supports insertions and deletions in $O(\lg n)$ time, for any constant $\epsilon \in (0,1)$. This is the first result on dynamic approximate range mode; it can also be used to obtain the first static data structure for approximate 3-sided range mode queries in two dimensions. We also consider approximate range selection. For any $\alpha \in (0,1/2)$, an $\alpha$-approximate range selection query asks for the position of an element whose rank in the query range is in $[k - \alpha s, k + \alpha s]$, where $k$ is a rank given by the query and $s$ is the size of the query range. When $\alpha$ is a constant, we design an $O(n)$-bit encoding data structure that can answer queries in constant time and prove this space cost is asymptotically optimal. The previous best result by Krizanc et al. (Range Mode and Range Median Queries on Lists and Trees, Nordic Journal of Computing, 2005) uses $O(n\lg n)$ bits, or $O(n)$ words, to achieve constant approximation for range median only. Thus we not only improve the space cost, but also provide support for any arbitrary $k$ given at query time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims an O(n/ε)-bit static encoding data structure for (1+ε)-approximate range mode queries with O(lg(1/ε)) query time that is asymptotically optimal for constant ε, improving prior space by a lg n factor. It also presents an O(n)-word dynamic data structure for the same queries with O(lg n / lg lg n) query time and O(lg n) update time. For α-approximate range selection with constant α, it claims an O(n)-bit encoding with constant query time that is asymptotically optimal, improving prior space and supporting arbitrary k.
Significance. If the results hold, they are significant for providing the first asymptotically optimal space bounds for these approximate query problems and the first dynamic structure for approximate range mode. The optimality is a notable strength, as is the application to 2D 3-sided range mode queries. The lower-bound arguments appear sound and correctly handle the approximation factors, so the stress-test concern does not land.
minor comments (2)
- The notation 'lg' for base-2 logarithm is used without explicit definition in the abstract and early sections; a brief clarification or consistent use of 'log' would aid readability.
- The description of the range selection rank interval [k - α s, k + α s] in the abstract would benefit from a short concrete example to illustrate the approximation for readers unfamiliar with the problem.
Simulated Author's Rebuttal
We thank the referee for the positive review and recommendation of minor revision. The referee's summary accurately captures our main results on space-optimal encodings for approximate range mode and range selection, as well as the first dynamic structure for approximate range mode. We are pleased that the significance of the optimality results and the soundness of the lower-bound arguments are recognized.
Circularity Check
No circularity: upper-bound constructions and separate information-theoretic lower bounds are independent of each other and of prior self-citations.
full rationale
The paper's central claims consist of explicit new data-structure constructions (O(n/ε)-bit static encoding, O(n)-word dynamic structure, O(n)-bit selection encoding) whose space and time bounds are derived from algorithmic design, plus independent information-theoretic lower-bound arguments establishing asymptotic optimality for constant ε and α. These lower bounds are presented as fresh reductions (not re-derivations of fitted parameters or self-citations). No equation or claim reduces by construction to its own inputs; prior citations (Greve et al., Krizanc et al.) are to external results and are not load-bearing for the new optimality statements. The derivation chain is therefore self-contained against external benchmarks.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.