pith. sign in

arxiv: 1907.02171 · v2 · pith:GCVRWUESnew · submitted 2019-07-04 · 💻 cs.CG

Sketched MinDist

Pith reviewed 2026-05-25 02:43 UTC · model grok-4.3

classification 💻 cs.CG
keywords min dist sketchesgeometric approximationhyperplanestrajectoriesrelative errorsketchingsensitivity samplingreconstruction
0
0 comments X

The pith

Sketched minimum distances to O(d/ε²) points preserve relative distances between hyperplanes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that vectors recording the minimum distance from a geometric object to each point in a set Q can serve as effective sketches. The Euclidean distance between two such vectors approximates the original distance between the objects. For hyperplanes, only O(d/ε²) points are needed to achieve relative error ε in any dimension. For general shapes with a minimum separation ρ in a domain of size L, the requirement in two dimensions is Õ((L/ρ) / ε²) points. For trajectories composed of at most k pieces, Õ((L/ρ) k³ / ε²) points suffice for approximations that hold for all pairs, and the same size permits exact reconstruction of the trajectories.

Core claim

Collecting the vector of minimum distances to points in Q induces the Euclidean distance on these vectors as a proxy for the true distance between objects J. For hyperplanes this proxy has relative error with |Q| = O(d/ε²). For other shapes a min distance ρ and domain L are enforced, yielding |Q| = Õ((L/ρ) 1/ε²) in d=2. For k-piece trajectories the bound is Õ((L/ρ) k³ / ε²) for for-all approximations, and exact reconstruction is possible with similar |Q|.

What carries the argument

The minDist sketch vector v(J) where each coordinate is the infimum distance from points in J to a query point q_i in Q; the Euclidean norm of the difference between two such vectors.

If this is right

  • Relative error preservation for hyperplane distances using linear in d sample size.
  • Sample size linear in the aspect ratio L/ρ for planar shapes.
  • Polynomial dependence on number of pieces k for trajectory approximations.
  • Exact recovery of k-piece trajectories from the sketch vectors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These sketches might enable sublinear time algorithms for geometric queries by reducing to vector operations.
  • The approach could extend to higher dimensions if the L/ρ term is handled differently.
  • Exact reconstruction implies the sketches capture all information needed for certain discrete objects.
  • Applications in trajectory analysis where storage is limited.

Load-bearing premise

The sample size bounds for non-hyperplane shapes require assuming a positive minimum distance ρ between objects and a bounded domain of size L.

What would settle it

A counterexample where the relative error exceeds ε when using fewer than O(d/ε²) points for some set of hyperplanes, or failure of exact reconstruction for trajectories when |Q| is below the stated bound.

Figures

Figures reproduced from arXiv: 1907.02171 by Jeff M. Phillips, Pingfan Tang.

Figure 1
Figure 1. Figure 1: Q is the set of blue points, γ1 is the red curve, γ2 is the green curve, and they coincide with each other on the boundary of the square. Suppose Q is a set of n points in R 2 and no two points are at the same location, then for any q0 ∈ Q we can draw two curves γ1, γ2 as shown in [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Left: Case 1, r = M 8 ≤ τ , and q 0 ∈ B(q, r). Right: Case 2, r = M 8 > τ , and q 0 ∈ B(q, τ + r). Case 1: τ ≥ M 8 . For any q 0 ∈ B(q, M 8 ) := {q 0 ∈ R d | kq 0 − qik ≤ M 8 }, we have τ + M = dist(q, S2) ≤ dist(q, q0 ) + dist(q 0 , S2) ≤ M 8 + dist(q 0 , S2), which implies for all q 0 ∈ B(q, M 8 ) dist(q 0 , S2) ≥ τ + M − M 8 = τ + 7 8 M. Similarly dist(q 0 , S1) ≤ dist(q 0 , q) + dist(q, S1) ≤ M 8 + τ f… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the dist(q, sj ) from point q to segment sj . For a trajectory γ defined by critical points c0, c1, . . . , ck for j ∈ [k] define sj as the segment between cj−1, cj and `j as the line extension of that segment. The distance between q and a segment sj is illustrated in [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Left: l is tangent to Ci . Rotate l around Ci until it is tangent to some Cj . Center: c is an endpoint of γ. Right: c is an internal critical point of γ. In center and right figures, no tangent line of Ci can go through Bi,3η without intersecting with the pink curve. 13 [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Left: {c} = Ci1 ∩ Ci2 ∩ Ci3 and Bi1 ⊂ Bi2 ∪ Bi3 . Center: the angle between s and s 0 is at most π 4 and {c} = Ci1 ∩ Ci2 ∩ Ci3 and Bi1 ⊂ Bi2 ∪ Bi3 . Right: Ci1 , Ci2 are tangent to s, and Ci3 , Ci4 are tangent to s 0 , For each one of these four circles, any tangent line segment, except s, s0 , cannot be extended outside Bi,8η without intersecting with any other circle. Lemma 5.2. Suppose c 0 ∈ Bi,3η is a … view at source ↗
read the original abstract

We consider sketch vectors of geometric objects $J$ through the \mindist function \[ v_i(J) = \inf_{p \in J} \|p-q_i\| \] for $q_i \in Q$ from a point set $Q$. Collecting the vector of these sketch values induces a simple, effective, and powerful distance: the Euclidean distance between these sketched vectors. This paper shows how large this set $Q$ needs to be under a variety of shapes and scenarios. For hyperplanes we provide direct connection to the sensitivity sample framework, so relative error can be preserved in $d$ dimensions using $Q = O(d/\varepsilon^2)$. However, for other shapes, we show we need to enforce a minimum distance parameter $\rho$, and a domain size $L$. For $d=2$ the sample size $Q$ then can be $\tilde{O}((L/\rho) \cdot 1/\varepsilon^2)$. For objects (e.g., trajectories) with at most $k$ pieces this can provide stronger \emph{for all} approximations with $\tilde{O}((L/\rho)\cdot k^3 / \varepsilon^2)$ points. Moreover, with similar size bounds and restrictions, such trajectories can be reconstructed exactly using only these sketch vectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces sketch vectors for geometric objects J defined via the min-dist function v_i(J) = inf_{p ∈ J} ||p - q_i|| over points q_i in a set Q, with the Euclidean distance on these vectors serving as the induced distance. It claims that for hyperplanes this connects directly to the sensitivity sampling framework to achieve relative-error preservation with |Q| = O(d/ε²). For other shapes it requires a minimum-distance parameter ρ and bounded domain L, yielding |Q| = Õ((L/ρ) · 1/ε²) in d=2; for k-piece trajectories the bound becomes Õ((L/ρ)·k³ / ε²) for for-all approximations, and exact reconstruction is possible under similar restrictions.

Significance. If the stated sample-size bounds are rigorously established, the approach supplies a simple, parameter-light sketching primitive for min-dist distances on geometric objects that leverages an existing sensitivity-sampling result for the hyperplane case and gives explicit dependence on shape complexity k for trajectories. The for-all approximation and exact-reconstruction claims for piecewise-linear objects would be of interest in computational geometry applications.

major comments (1)
  1. [Abstract] Abstract: the central sample-size claims (Q = O(d/ε²) for hyperplanes; Õ((L/ρ)·k³ / ε²) for k-piece trajectories) are asserted without any derivation, proof sketch, or error analysis supplied in the text, rendering the load-bearing quantitative statements unverifiable from the given material.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we address the sole major comment regarding the presentation of the central sample-size claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central sample-size claims (Q = O(d/ε²) for hyperplanes; Õ((L/ρ)·k³ / ε²) for k-piece trajectories) are asserted without any derivation, proof sketch, or error analysis supplied in the text, rendering the load-bearing quantitative statements unverifiable from the given material.

    Authors: The abstract is intended as a high-level summary. The hyperplane bound follows from a direct reduction to sensitivity sampling (detailed in Section 3), where the min-dist function on hyperplanes is shown to have per-point sensitivity O(1) after appropriate normalization, immediately yielding the O(d/ε²) size via the standard sensitivity-sampling theorem of Braverman et al. The trajectory bound is obtained in Section 5 by (i) applying the 2D shape bound to each linear piece, (ii) taking a union bound over the k pieces, and (iii) incurring an extra k² factor from a discretization argument that controls the interaction between pieces; the resulting Õ((L/ρ)·k³/ε²) size is therefore explicit. Nevertheless, we agree that the abstract itself supplies no derivation or pointer, which can make the quantitative claims hard to verify at first reading. We will therefore revise the abstract to include a one-sentence indication of the sensitivity-sampling connection for hyperplanes and the union-bound argument for trajectories. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper routes its hyperplane case through the external sensitivity-sampling framework to obtain the O(d/ε²) bound, rather than deriving it from any internally fitted or self-defined quantity. Non-hyperplane results are explicitly conditioned on the independent parameters ρ and L; the stated sample-size expressions follow directly from these restrictions without reducing to a fit or renaming of the target quantity itself. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation chain.

Axiom & Free-Parameter Ledger

4 free parameters · 1 axioms · 0 invented entities

The stated bounds are parameterized by user-chosen quantities ε, ρ, L, and k that define the approximation quality and domain restrictions; these enter the sample-size expressions directly and are not derived from the data or from a uniqueness theorem.

free parameters (4)
  • ε
    Allowed relative error that appears in the denominator of every sample-size bound.
  • ρ
    Minimum-distance restriction imposed on the geometric objects to obtain the (L/ρ) scaling.
  • L
    Diameter of the domain containing all objects; scales the sample size linearly.
  • k
    Maximum number of linear pieces allowed in each trajectory object; appears as k³ in the bound.
axioms (1)
  • standard math The ambient space is Euclidean with the standard Euclidean norm.
    The definition v_i(J) = inf_{p in J} ||p - q_i|| and the subsequent vector Euclidean distance both rely on this background geometry.

pith-pipeline@v0.9.0 · 5752 in / 1494 out tokens · 32336 ms · 2026-05-25T02:43:32.091444+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 3 internal anchors

  1. [1]

    Amenta, S

    N. Amenta, S. Choi, and R. K. Kolluri. The power crust. InProceedings of the sixth ACM symposium on Solid modeling and applications, 2001

  2. [2]

    Anthony and P

    M. Anthony and P. L. Bartlett.Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999

  3. [3]

    Boutsidis, M

    C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, 2009

  4. [4]

    Braverman, D

    V. Braverman, D. Feldman, and H. Lang. New frameworks for offline and streaming coreset constructions. arXiv preprint arXiv:1612.00889, 2016

  5. [5]

    Chazal and D

    F. Chazal and D. Cohen-Steiner. Geometric inference.https://geometrica.saclay.inria.fr/team/ Fred.Chazal/papers/GeomInference5.pdf

  6. [6]

    Chazal, D

    F. Chazal, D. Cohen-Steiner, and Q. Mérigot. Geometric inference for probability measures.Foundations of Computational Mathematics, pages 1–19, 2010

  7. [7]

    λ-medial axis

    F. Chazal and A. Lieutier. The “λ-medial axis”.Graphical Models, 67:304–331, 2005

  8. [8]

    Chen and J

    D. Chen and J. M. Phillips. Relative error embeddings for the gaussian kernel distance. InAlgorithmic Learning Theory, 2017

  9. [9]

    M. B. Cohen, C. Musco, and C. Musco. Input sparsity time low-rank approximation via ridge leverage score sampling. InACM-SIAM Symposium on Discrete Algorithms, 2017

  10. [10]

    M. B. Cohen, C. Musco, and J. Pachocki. Online row sampling. In International Workshop on Approximation, Randomization, and Combinatorial Optimization, 2016

  11. [11]

    Driemel, J

    A. Driemel, J. M. Phillips, and I. Psarros. On the vc dimension of metric balls under frechet and hausdorff distances. InInternational Symposium on Computational Geometry, 2019

  12. [12]

    Drineas, M

    P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of statistical leverage. Journal of Machine Learning Research, 13:3475–3506, 2012

  13. [13]

    Drineas, M

    P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions.SIAM Journal of MAtrix Analysis and Applications, 30:844–881, 2008

  14. [14]

    Edelsbrunner and E

    H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes.ACM Transactions on Graphics, 13:43–72, 1994

  15. [15]

    Feldman and M

    D. Feldman and M. Langberg. A unified framework for approximating and clustering data. InProceedings ACM Symposium on Theory of Computing, 2011

  16. [16]

    Feldman, M

    D. Feldman, M. Schmidt, and C. Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. InProceedings 24th ACM-SIAM Symposium on Discrete Algorithms, 2013

  17. [17]

    Feldman and L

    D. Feldman and L. J. Schulman. Data reduction for weighted and outlier-resistant clustering. InProc. ACM-SIAM Symposium on Discrete Algorithms, 2012

  18. [18]

    D. C.-S. Frederic Chazal and A. Lieutier. A sampling theory for compact sets in euclidean space.DCG, 41:461–479, 2009

  19. [19]

    Har-Peled

    S. Har-Peled. Geometric Approximation Algorithms. Mathematical Surveys and Monographs. American Mathematical Society, 2011. 18

  20. [20]

    W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space.Contemporary Mathematics, 26:189–206, 1984

  21. [21]

    Langberg and L

    M. Langberg and L. J. Schulman. Universalε-approximators for integrals. InSODA, pages 598–607, 2010

  22. [22]

    Lopaz-Paz, K

    D. Lopaz-Paz, K. Muandet, B. Schölkopf, and I. Tolstikhin. Towards a learning theory of cause-effect inference. In International Conference on Machine Learning, 2015

  23. [23]

    Scalable Spatial Scan Statistics for Trajectories

    M. Matheny, D. Xie, and J. M. Phillips. Scalable spatial scan statistics for trajectories. Technical report, arXiv:1906.01693, 2019

  24. [24]

    Muandet, K

    K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf. Kernel mean embedding of distributions: A review and beyond.Foundations and Trends in Machine Learning, 10:1–141, 2017

  25. [25]

    Musco and C

    C. Musco and C. Musco. Recursive sampling for the Nyström method. InNIPS, 2017

  26. [26]

    J. M. Phillips and W. M. Tai. Relative error rkhs embeddings for gaussian kernels. Technical report, arXiv:1811.04136, 2018

  27. [27]

    J. M. Phillips and P. Tang. Simple distances for trajectories via landmarks. Technical report, arXiv:1804.11284, 2019

  28. [28]

    J. M. Phillips, B. Wang, and Y. Zheng. Geomtric inference on kernel density estimates. InSOCG, 2015

  29. [29]

    On the Sensitivity of Shape Fitting Problems

    K. Varadarajan and X. Xiao. On the sensitivity of shape fitting problems. InProceedings International Conference on Foundations of Software Technology and Theoretical Computer Science. arxiv:1209.4893, 2012. 19 A A Lemma Used in Section 4 Lemma A.1. Suppose Q ⊂ Rd,X 1 ⊂ Rd1,X 2 ⊂ Rd2, and R1 = {{q ∈ Q| g1(q,x ) ≤ 0}| x ∈ X1}, R2 ={{q∈ R2| g2(q,x )≤ 0}| x∈ ...