Sketched MinDist
Pith reviewed 2026-05-25 02:43 UTC · model grok-4.3
The pith
Sketched minimum distances to O(d/ε²) points preserve relative distances between hyperplanes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Collecting the vector of minimum distances to points in Q induces the Euclidean distance on these vectors as a proxy for the true distance between objects J. For hyperplanes this proxy has relative error with |Q| = O(d/ε²). For other shapes a min distance ρ and domain L are enforced, yielding |Q| = Õ((L/ρ) 1/ε²) in d=2. For k-piece trajectories the bound is Õ((L/ρ) k³ / ε²) for for-all approximations, and exact reconstruction is possible with similar |Q|.
What carries the argument
The minDist sketch vector v(J) where each coordinate is the infimum distance from points in J to a query point q_i in Q; the Euclidean norm of the difference between two such vectors.
If this is right
- Relative error preservation for hyperplane distances using linear in d sample size.
- Sample size linear in the aspect ratio L/ρ for planar shapes.
- Polynomial dependence on number of pieces k for trajectory approximations.
- Exact recovery of k-piece trajectories from the sketch vectors.
Where Pith is reading between the lines
- These sketches might enable sublinear time algorithms for geometric queries by reducing to vector operations.
- The approach could extend to higher dimensions if the L/ρ term is handled differently.
- Exact reconstruction implies the sketches capture all information needed for certain discrete objects.
- Applications in trajectory analysis where storage is limited.
Load-bearing premise
The sample size bounds for non-hyperplane shapes require assuming a positive minimum distance ρ between objects and a bounded domain of size L.
What would settle it
A counterexample where the relative error exceeds ε when using fewer than O(d/ε²) points for some set of hyperplanes, or failure of exact reconstruction for trajectories when |Q| is below the stated bound.
Figures
read the original abstract
We consider sketch vectors of geometric objects $J$ through the \mindist function \[ v_i(J) = \inf_{p \in J} \|p-q_i\| \] for $q_i \in Q$ from a point set $Q$. Collecting the vector of these sketch values induces a simple, effective, and powerful distance: the Euclidean distance between these sketched vectors. This paper shows how large this set $Q$ needs to be under a variety of shapes and scenarios. For hyperplanes we provide direct connection to the sensitivity sample framework, so relative error can be preserved in $d$ dimensions using $Q = O(d/\varepsilon^2)$. However, for other shapes, we show we need to enforce a minimum distance parameter $\rho$, and a domain size $L$. For $d=2$ the sample size $Q$ then can be $\tilde{O}((L/\rho) \cdot 1/\varepsilon^2)$. For objects (e.g., trajectories) with at most $k$ pieces this can provide stronger \emph{for all} approximations with $\tilde{O}((L/\rho)\cdot k^3 / \varepsilon^2)$ points. Moreover, with similar size bounds and restrictions, such trajectories can be reconstructed exactly using only these sketch vectors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces sketch vectors for geometric objects J defined via the min-dist function v_i(J) = inf_{p ∈ J} ||p - q_i|| over points q_i in a set Q, with the Euclidean distance on these vectors serving as the induced distance. It claims that for hyperplanes this connects directly to the sensitivity sampling framework to achieve relative-error preservation with |Q| = O(d/ε²). For other shapes it requires a minimum-distance parameter ρ and bounded domain L, yielding |Q| = Õ((L/ρ) · 1/ε²) in d=2; for k-piece trajectories the bound becomes Õ((L/ρ)·k³ / ε²) for for-all approximations, and exact reconstruction is possible under similar restrictions.
Significance. If the stated sample-size bounds are rigorously established, the approach supplies a simple, parameter-light sketching primitive for min-dist distances on geometric objects that leverages an existing sensitivity-sampling result for the hyperplane case and gives explicit dependence on shape complexity k for trajectories. The for-all approximation and exact-reconstruction claims for piecewise-linear objects would be of interest in computational geometry applications.
major comments (1)
- [Abstract] Abstract: the central sample-size claims (Q = O(d/ε²) for hyperplanes; Õ((L/ρ)·k³ / ε²) for k-piece trajectories) are asserted without any derivation, proof sketch, or error analysis supplied in the text, rendering the load-bearing quantitative statements unverifiable from the given material.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. Below we address the sole major comment regarding the presentation of the central sample-size claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central sample-size claims (Q = O(d/ε²) for hyperplanes; Õ((L/ρ)·k³ / ε²) for k-piece trajectories) are asserted without any derivation, proof sketch, or error analysis supplied in the text, rendering the load-bearing quantitative statements unverifiable from the given material.
Authors: The abstract is intended as a high-level summary. The hyperplane bound follows from a direct reduction to sensitivity sampling (detailed in Section 3), where the min-dist function on hyperplanes is shown to have per-point sensitivity O(1) after appropriate normalization, immediately yielding the O(d/ε²) size via the standard sensitivity-sampling theorem of Braverman et al. The trajectory bound is obtained in Section 5 by (i) applying the 2D shape bound to each linear piece, (ii) taking a union bound over the k pieces, and (iii) incurring an extra k² factor from a discretization argument that controls the interaction between pieces; the resulting Õ((L/ρ)·k³/ε²) size is therefore explicit. Nevertheless, we agree that the abstract itself supplies no derivation or pointer, which can make the quantitative claims hard to verify at first reading. We will therefore revise the abstract to include a one-sentence indication of the sensitivity-sampling connection for hyperplanes and the union-bound argument for trajectories. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper routes its hyperplane case through the external sensitivity-sampling framework to obtain the O(d/ε²) bound, rather than deriving it from any internally fitted or self-defined quantity. Non-hyperplane results are explicitly conditioned on the independent parameters ρ and L; the stated sample-size expressions follow directly from these restrictions without reducing to a fit or renaming of the target quantity itself. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation chain.
Axiom & Free-Parameter Ledger
free parameters (4)
- ε
- ρ
- L
- k
axioms (1)
- standard math The ambient space is Euclidean with the standard Euclidean norm.
Reference graph
Works this paper leans on
- [1]
-
[2]
M. Anthony and P. L. Bartlett.Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999
work page 1999
-
[3]
C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, 2009
work page 2009
-
[4]
V. Braverman, D. Feldman, and H. Lang. New frameworks for offline and streaming coreset constructions. arXiv preprint arXiv:1612.00889, 2016
-
[5]
F. Chazal and D. Cohen-Steiner. Geometric inference.https://geometrica.saclay.inria.fr/team/ Fred.Chazal/papers/GeomInference5.pdf
- [6]
-
[7]
F. Chazal and A. Lieutier. The “λ-medial axis”.Graphical Models, 67:304–331, 2005
work page 2005
-
[8]
D. Chen and J. M. Phillips. Relative error embeddings for the gaussian kernel distance. InAlgorithmic Learning Theory, 2017
work page 2017
-
[9]
M. B. Cohen, C. Musco, and C. Musco. Input sparsity time low-rank approximation via ridge leverage score sampling. InACM-SIAM Symposium on Discrete Algorithms, 2017
work page 2017
-
[10]
M. B. Cohen, C. Musco, and J. Pachocki. Online row sampling. In International Workshop on Approximation, Randomization, and Combinatorial Optimization, 2016
work page 2016
-
[11]
A. Driemel, J. M. Phillips, and I. Psarros. On the vc dimension of metric balls under frechet and hausdorff distances. InInternational Symposium on Computational Geometry, 2019
work page 2019
-
[12]
P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of statistical leverage. Journal of Machine Learning Research, 13:3475–3506, 2012
work page 2012
-
[13]
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions.SIAM Journal of MAtrix Analysis and Applications, 30:844–881, 2008
work page 2008
-
[14]
H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes.ACM Transactions on Graphics, 13:43–72, 1994
work page 1994
-
[15]
D. Feldman and M. Langberg. A unified framework for approximating and clustering data. InProceedings ACM Symposium on Theory of Computing, 2011
work page 2011
-
[16]
D. Feldman, M. Schmidt, and C. Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. InProceedings 24th ACM-SIAM Symposium on Discrete Algorithms, 2013
work page 2013
-
[17]
D. Feldman and L. J. Schulman. Data reduction for weighted and outlier-resistant clustering. InProc. ACM-SIAM Symposium on Discrete Algorithms, 2012
work page 2012
-
[18]
D. C.-S. Frederic Chazal and A. Lieutier. A sampling theory for compact sets in euclidean space.DCG, 41:461–479, 2009
work page 2009
- [19]
-
[20]
W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space.Contemporary Mathematics, 26:189–206, 1984
work page 1984
-
[21]
M. Langberg and L. J. Schulman. Universalε-approximators for integrals. InSODA, pages 598–607, 2010
work page 2010
-
[22]
D. Lopaz-Paz, K. Muandet, B. Schölkopf, and I. Tolstikhin. Towards a learning theory of cause-effect inference. In International Conference on Machine Learning, 2015
work page 2015
-
[23]
Scalable Spatial Scan Statistics for Trajectories
M. Matheny, D. Xie, and J. M. Phillips. Scalable spatial scan statistics for trajectories. Technical report, arXiv:1906.01693, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1906
-
[24]
K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf. Kernel mean embedding of distributions: A review and beyond.Foundations and Trends in Machine Learning, 10:1–141, 2017
work page 2017
-
[25]
C. Musco and C. Musco. Recursive sampling for the Nyström method. InNIPS, 2017
work page 2017
- [26]
-
[27]
J. M. Phillips and P. Tang. Simple distances for trajectories via landmarks. Technical report, arXiv:1804.11284, 2019
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[28]
J. M. Phillips, B. Wang, and Y. Zheng. Geomtric inference on kernel density estimates. InSOCG, 2015
work page 2015
-
[29]
On the Sensitivity of Shape Fitting Problems
K. Varadarajan and X. Xiao. On the sensitivity of shape fitting problems. InProceedings International Conference on Foundations of Software Technology and Theoretical Computer Science. arxiv:1209.4893, 2012. 19 A A Lemma Used in Section 4 Lemma A.1. Suppose Q ⊂ Rd,X 1 ⊂ Rd1,X 2 ⊂ Rd2, and R1 = {{q ∈ Q| g1(q,x ) ≤ 0}| x ∈ X1}, R2 ={{q∈ R2| g2(q,x )≤ 0}| x∈ ...
work page internal anchor Pith review Pith/arXiv arXiv 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.