Tight Sensitivity Bounds For Smaller Coresets
Pith reviewed 2026-05-25 11:04 UTC · model grok-4.3
The pith
Algorithms compute tight sensitivity bounds for each row to produce smaller coresets for affine subspace approximations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We provide algorithms that compute provably tight bounds for the sensitivity of each input row. It is based on two ingredients: (i) iterative algorithm that computes the exact sensitivity of each point up to arbitrary small precision for (non-affine) k-subspaces, and (ii) a general reduction of independent interest from computing sensitivity for the family of affine k-subspaces in R^d to (non-affine) (k+1) subspaces in R^{d+1}.
What carries the argument
The reduction from affine k-subspace sensitivity in R^d to non-affine (k+1)-subspace sensitivity in R^{d+1}, together with the iterative exact sensitivity computation for non-affine cases.
If this is right
- Smaller coresets that are data-dependent and near-linear in the sum of the tight sensitivities.
- Applicable to hyper-parameter tuning and problems including low-rank approximation, k-PCA, Lasso, Ridge, and linear regression.
- Handles streaming, dynamic, and distributed big data in parallel with reduced size.
- The reduction can be used independently for other sensitivity-based sampling tasks.
Where Pith is reading between the lines
- Implementing the reduction allows reusing the non-affine algorithm for affine problems by increasing dimension by one.
- Testing on more datasets could reveal how much smaller the coresets are in different domains.
- The precision parameter in the iterative algorithm trades off accuracy for computation time in practice.
Load-bearing premise
The general reduction preserves the exact sensitivity values of rows when transforming the affine problem to the non-affine one in higher dimension.
What would settle it
Computing sensitivities exactly for a small synthetic dataset by enumerating all possible subspaces and comparing to the output of the proposed algorithm; any significant discrepancy would show the bounds are not tight.
Figures
read the original abstract
An $\varepsilon$-coreset for Least-Mean-Squares (LMS) of a matrix $A\in{\mathbb{R}}^{n\times d}$ is a small weighted subset of its rows that approximates the sum of squared distances from its rows to every affine $k$-dimensional subspace of ${\mathbb{R}}^d$, up to a factor of $1\pm\varepsilon$. Such coresets are useful for hyper-parameter tuning and solving many least-mean-squares problems such as low-rank approximation ($k$-SVD), $k$-PCA, Lassso/Ridge/Linear regression and many more. Coresets are also useful for handling streaming, dynamic and distributed big data in parallel. With high probability, non-uniform sampling based on upper bounds on what is known as importance or sensitivity of each row in $A$ yields a coreset. The size of the (sampled) coreset is then near-linear in the total sum of these sensitivity bounds. We provide algorithms that compute provably \emph{tight} bounds for the sensitivity of each input row. It is based on two ingredients: (i) iterative algorithm that computes the exact sensitivity of each point up to arbitrary small precision for (non-affine) $k$-subspaces, and (ii) a general reduction of independent interest from computing sensitivity for the family of affine $k$-subspaces in ${\mathbb{R}}^d$ to (non-affine) $(k+1)$- subspaces in ${\mathbb{R}}^{d+1}$. Experimental results on real-world datasets, including the English Wikipedia documents-term matrix, show that our bounds provide significantly smaller and data-dependent coresets also in practice. Full open source is also provided.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide algorithms computing provably tight per-row sensitivity bounds for constructing smaller ε-coresets approximating sum-of-squared-distances to every affine k-subspace. The method rests on (i) an iterative procedure yielding exact sensitivity (to arbitrary precision) for non-affine k-subspaces and (ii) a reduction mapping sensitivity computation over affine k-subspaces in R^d to non-affine (k+1)-subspaces in R^{d+1}. Experiments on real matrices, including the English Wikipedia term-document matrix, are reported to yield smaller data-dependent coresets; open-source code is supplied.
Significance. If the reduction preserves exact per-row sensitivities, the work would improve coreset sizes for LMS problems (k-SVD, regression, PCA) in streaming and distributed settings. The open-source implementation and real-data experiments constitute concrete strengths for reproducibility and practical impact.
major comments (2)
- [reduction from affine k-subspaces to (k+1)-subspaces] The reduction (ingredient ii, described after the abstract and developed in the main technical section): the manuscript asserts that the homogenization map yields identical per-row sensitivity values, yet supplies no explicit argument that the supremum of the normalized squared-distance term is preserved after the lift. The total sum of squared distances (the normalization factor) and the feasible set of subspaces both change under the map; without a direct equality proof, the claim of 'provably tight' bounds for affine subspaces rests on an unverified step.
- [iterative sensitivity algorithm] Iterative algorithm for non-affine k-subspaces (ingredient i): while the procedure is stated to converge to arbitrary precision, no iteration bound or convergence rate is given, leaving open whether the 'exact' sensitivities can be obtained in time polynomial in the input parameters for the claimed applications.
minor comments (2)
- [abstract] Abstract: 'Lassso' is a typo for 'Lasso'.
- [preliminaries] Notation for the sensitivity definition should be introduced once with a displayed equation before being used in the reduction argument.
Simulated Author's Rebuttal
We thank the referee for the careful review and positive assessment of the work's significance, reproducibility, and practical impact. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [reduction from affine k-subspaces to (k+1)-subspaces] The reduction (ingredient ii, described after the abstract and developed in the main technical section): the manuscript asserts that the homogenization map yields identical per-row sensitivity values, yet supplies no explicit argument that the supremum of the normalized squared-distance term is preserved after the lift. The total sum of squared distances (the normalization factor) and the feasible set of subspaces both change under the map; without a direct equality proof, the claim of 'provably tight' bounds for affine subspaces rests on an unverified step.
Authors: We agree that an explicit proof is needed to rigorously establish preservation of per-row sensitivities. In the revision we will insert a dedicated lemma proving that the homogenization map yields identical sensitivity values: the supremum of the normalized squared-distance term is unchanged because the lift maps affine k-subspaces in R^d bijectively to linear (k+1)-subspaces in R^{d+1} while scaling both the numerator and the total sum by the same homogeneous factor, so the ratio (and thus the sensitivity) is invariant. revision: yes
-
Referee: [iterative sensitivity algorithm] Iterative algorithm for non-affine k-subspaces (ingredient i): while the procedure is stated to converge to arbitrary precision, no iteration bound or convergence rate is given, leaving open whether the 'exact' sensitivities can be obtained in time polynomial in the input parameters for the claimed applications.
Authors: We agree that the manuscript provides no iteration bound or convergence-rate analysis. The procedure is a practical fixed-point iteration whose fast empirical convergence is demonstrated on the reported datasets (including Wikipedia). In the revision we will add a short discussion of observed iteration counts and a note that the method is intended for offline pre-processing rather than a worst-case polynomial-time claim; a formal rate analysis lies beyond the current scope. revision: partial
Circularity Check
No circularity; independent algorithmic reduction and iteration
full rationale
The paper introduces two new ingredients—an iterative procedure for exact per-point sensitivity on non-affine k-subspaces and a homogenization-style reduction lifting affine k-subspaces in R^d to linear (k+1)-subspaces in R^{d+1}—presented as original machinery that directly yields tight sensitivity bounds. No equations, fitted parameters, or self-citations are shown to define the output sensitivities in terms of themselves; the claims rest on the correctness of the stated reduction and iteration rather than on any renaming or self-referential construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Regression analysis in small- n-large-p using interactive prior elicitation of pairwise similarities
[APK16] Homayun Afrabandpey, Tomi Peltola, and Samuel Kaski. Regression analysis in small- n-large-p using interactive prior elicitation of pairwise similarities. InFILM 2016, NIPS Workshop on Future of Interactive Learning Machines ,
work page 2016
-
[2]
k-Means for Streaming and Distributed Big Sparse Data
[BF15] Artem Barger and Dan Feldman. k-means for streaming and distributed big sparse data. CoRR, abs/1511.08990,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
New frameworks for offline and streaming coreset constructions
[BFL16] Vladimir Braverman, Dan Feldman, and Harry Lang. New frameworks for offline and streaming coreset constructions. CoRR, abs/1612.00889,
-
[4]
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
[CEM+14] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. CoRR, abs/1410.6801,
work page internal anchor Pith review Pith/arXiv arXiv
- [5]
-
[6]
Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
Society for Industrial and Applied Mathematics. [FSS18] Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. CoRR, abs/1807.04518,
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
SciPy: Open source scientific tools for Python, 2001–
[JOP+ ] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. [Online; accessed ¡today¿]. [KLJ11] Byung Kang, Woosang Lim, and Kyomin Jung. Scalable kernel k-means via centroid approximation. In Proc. NIPS,
work page 2001
-
[8]
Efficient Estimation of Word Representations in Vector Space
[MCCD13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
Fast and accurate least-mean- squares solvers
[MJF19] Alaa Maalouf, Ibrahim Jubran, and Dan Feldman. Fast and accurate least-mean- squares solvers. arXiv preprint arXiv:1906.04705 ,
-
[10]
Improved approximation algorithms for large matrices via random pro- jections
[Sar06] Tamas Sarlos. Improved approximation algorithms for large matrices via random pro- jections. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 143–152. IEEE,
work page 2006
-
[11]
Determinantal point processes for coresets
[TBA18] Nicolas Tremblay, Simon Barthelm´ e, and Pierre-Olivier Amblard. Determinantal point processes for coresets. arXiv preprint arXiv:1803.08700 ,
-
[12]
On the Sensitivity of Shape Fitting Problems
[VX12] Kasturi R. Varadarajan and Xin Xiao. On the sensitivity of shape fitting problems. CoRR, abs/1209.4893,
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.