Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
Pith reviewed 2026-05-08 10:09 UTC · model grok-4.3
The pith
SQUEAK samples kernel columns with unnormalized ridge leverage scores to keep space complexity only a constant factor above exact RLS sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SQUEAK processes the dataset incrementally, maintains unnormalized ridge leverage scores, and constructs a Nystrom approximation on the fly. Its space complexity is only a constant factor worse than that of exact ridge leverage score (RLS) sampling, and it never needs to estimate the effective dimension or the largest eigenvalue of the kernel matrix.
What carries the argument
Incremental updates of unnormalized ridge leverage scores to select columns for the Nystrom approximation of the kernel matrix.
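To make the machinery concrete, the sketch below computes exact ridge leverage scores in batch and samples columns with the unnormalized rule. This is an illustration only, with assumed names and constants: SQUEAK itself works incrementally and never materializes the full kernel matrix, and the Bernoulli scheme with oversampling constant c is a standard stand-in, not necessarily the paper's exact sampler.

```python
import numpy as np

def ridge_leverage_scores(K: np.ndarray, lam: float) -> np.ndarray:
    """Exact RLS: tau_i = (K (K + lam*n*I)^{-1})_{ii}."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))

def sample_columns(K: np.ndarray, lam: float, c: float = 10.0, seed: int = 0) -> np.ndarray:
    """Keep column i independently with probability min(1, c * tau_i).

    Using the raw (unnormalized) scores is the SQUEAK-style simplification:
    the probabilities need no division by an estimated effective dimension
    or largest eigenvalue.
    """
    rng = np.random.default_rng(seed)
    tau = ridge_leverage_scores(K, lam)
    keep = rng.random(K.shape[0]) < np.minimum(1.0, c * tau)
    return np.flatnonzero(keep)  # indices of the retained dictionary
```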
If this is right
- The number of columns m required for epsilon-accurate approximation scales with the effective dimension rather than with n or the maximum degree of freedom (see the sketch after this list).
- Space usage grows linearly in the effective dimension, up to a small constant factor, instead of quadratically in the number of samples.
- The algorithm eliminates the need to track or estimate the largest eigenvalue during the incremental pass.
- Accuracy guarantees of RLS-based Nystrom are preserved up to the constant overhead introduced by using unnormalized scores.
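A quick numeric illustration of why scaling with the effective dimension matters, under assumptions not taken from the paper (a Gaussian kernel on standard normal data): d_eff(lambda) = tr(K (K + lambda*n*I)^{-1}) typically sits far below n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 500, 1e-3
X = rng.normal(size=(n, 2))
# Gaussian kernel matrix (an assumption for illustration).
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
# Effective dimension: trace of the regularized projection.
d_eff = np.trace(K @ np.linalg.inv(K + lam * n * np.eye(n)))
print(f"n = {n}, d_eff(lambda) = {d_eff:.1f}")  # typically d_eff << n
```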
Where Pith is reading between the lines
- The same unnormalized sampling idea could extend to streaming or online versions of other kernel-based methods such as support vector machines.
- In practice the constant factor might be reduced further by periodic re-normalization steps triggered only when coherence appears high.
- Similar dictionary-learning logic might apply directly to Gaussian process regression where memory limits are equally binding.
Load-bearing premise
Unnormalized ridge leverage scores still produce Nystrom approximations whose error remains controlled by the effective dimension without normalization.
What would settle it
Run SQUEAK on a high-coherence dataset with m set to a small constant times the effective dimension and check whether the approximation error stays within the claimed bound of exact RLS sampling.
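A minimal sketch of that test, with assumed ingredients the paper does not specify here: a high-coherence set built from one tight cluster plus a few isolated points, a Gaussian kernel, and exact leverage scores standing in for SQUEAK's incremental estimates. The comparison against lam * n is an assumed reference scale for the regularized error, not the paper's exact bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, c = 400, 1e-3, 10
# High-coherence data: one tight cluster plus a few far-away outliers.
X = np.vstack([rng.normal(0.0, 0.05, (n - 8, 3)),
               10.0 * rng.normal(0.0, 1.0, (8, 3))])
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)

tau = np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))  # exact RLS
d_eff = tau.sum()
S = np.flatnonzero(rng.random(n) < np.minimum(1.0, c * tau))

# Nystrom approximation from the sampled columns.
K_tilde = K[:, S] @ np.linalg.pinv(K[np.ix_(S, S)]) @ K[S, :]
err = np.linalg.norm(K - K_tilde, 2)
print(f"m = {len(S)} vs c * d_eff = {c * d_eff:.1f}; "
      f"spectral error = {err:.3g}, lam * n = {lam * n:.3g}")
```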
Original abstract
One of the major limits of kernel ridge regression (KRR) is that storing and manipulating the kernel matrix K_n for n samples requires O(n^2) space, which rapidly becomes infeasible for large n. Nystrom approximations reduce the space complexity to O(nm) by sampling m columns from K_n. Uniform sampling preserves KRR accuracy (up to epsilon) only when m is proportional to the maximum degree of freedom of K_n, which may require O(n) columns for datasets with high coherence. Sampling columns according to their ridge leverage scores (RLS) gives accurate Nystrom approximations with m proportional to the effective dimension, but computing exact RLS also requires O(n^2) space. Calandriello et al. (2016) propose INK-Estimate, an algorithm that processes the dataset incrementally and updates RLS, effective dimension, and Nystrom approximations on the fly. Its space complexity scales with the effective dimension but introduces a dependency on the largest eigenvalue of K_n, which in the worst case is O(n). In this paper we introduce SQUEAK, a new algorithm that builds on INK-Estimate but uses unnormalized RLS. As a consequence, the algorithm is simpler, does not need to estimate the effective dimension for normalization, and achieves a space complexity that is only a constant factor worse than exact RLS sampling.
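To make the abstract's space argument concrete: a Nystrom approximation needs only the sampled columns C = K[:, S] and the small block K_{S,S}, so products with K_tilde = C (K_{S,S})^+ C^T can be applied in O(nm) memory without ever forming the full matrix. The names and the uniform index set below are illustrative assumptions.

```python
import numpy as np

def nystrom_matvec(C: np.ndarray, W_pinv: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Apply (C @ W_pinv @ C.T) to v using only O(nm) stored entries."""
    return C @ (W_pinv @ (C.T @ v))

n, m = 1000, 40
rng = np.random.default_rng(2)
X = rng.normal(size=(n, 3))
S = rng.choice(n, size=m, replace=False)  # any sampled index set
# n x m block of kernel columns; the full n x n matrix is never built.
C = np.exp(-((X[:, None] - X[None, S]) ** 2).sum(-1) / 2.0)
W_pinv = np.linalg.pinv(C[S, :])          # (K_{S,S})^+, an m x m block
v = rng.normal(size=n)
out = nystrom_matvec(C, W_pinv, v)
```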
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SQUEAK, an incremental algorithm for adaptive Nystrom dictionary construction in kernel ridge regression. Building on INK-Estimate, it replaces normalized ridge leverage scores with unnormalized versions, thereby avoiding explicit estimation of the effective dimension and the largest eigenvalue of the kernel matrix. The central claim is that this yields a space complexity O(m) with m only a constant factor larger than the exact-RLS optimum while preserving the same approximation guarantees up to that constant.
Significance. If the constant-factor claim holds, the result would simplify online dictionary learning for KRR and remove a practical bottleneck (eigenvalue estimation) that scales with n in the worst case. The approach is directly relevant to large-scale kernel methods where space, rather than time, is the limiting resource.
major comments (2)
- [§4.2, Theorem 2] Theorem 2 and the subsequent error bound: the argument that unnormalized RLS probabilities produce an m that remains O(d_eff) with a universal constant (independent of λ_max) is not fully explicit. The proof sketch absorbs the scaling discrepancy into a single C, but it is unclear whether C can be bounded without additional assumptions on the spectrum or coherence; a concrete counter-example or a worst-case bound on the ratio of unnormalized to normalized scores would be needed to substantiate the 'constant factor worse' claim (an empirical probe of this question is sketched after this list).
- [§5.1, Algorithm 1] The SQUEAK pseudocode and its update rule for the unnormalized scores: the incremental update appears to drop the normalization step present in INK-Estimate, yet the space-complexity analysis in §5.3 still invokes the same effective-dimension bound. It is not shown how the absence of the λ_max estimate is compensated without increasing the retained dictionary size beyond a fixed multiple of d_eff on all datasets.
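As a rough probe of the λ_max question, the sketch below rescales a kernel so its largest eigenvalue varies over two orders of magnitude and tracks the expected dictionary size of unnormalized sampling against d_eff. The kernel, data, and oversampling constant are assumptions; note the ratio stays below c by construction, so the probe speaks to dictionary size, not to whether that size suffices for the accuracy guarantee (the referee's deeper point).

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam, c = 300, 1e-2, 10
X = rng.normal(size=(n, 4))
D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
for scale in [1.0, 10.0, 100.0]:  # inflates lambda_max
    K = scale * np.exp(-D2)
    tau = np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))
    d_eff = tau.sum()
    exp_m = np.minimum(1.0, c * tau).sum()  # expected dictionary size
    lam_max = np.linalg.eigvalsh(K)[-1]
    print(f"lambda_max = {lam_max:9.1f}  d_eff = {d_eff:6.1f}  "
          f"E[m] / d_eff = {exp_m / d_eff:.2f}")
```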
minor comments (2)
- [§2] Notation: the symbol d_eff is used both for the exact effective dimension and for its estimate; a brief clarification in §2 would avoid confusion when comparing to prior work.
- [Figure 2] The figure legend labels the curves 'SQUEAK (unnormalized)' and 'INK-Estimate' but does not indicate whether the plotted m values are averaged over multiple runs or single realizations; adding this detail would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments help clarify the presentation of our theoretical results and algorithmic simplifications. We address each major point below and will revise the manuscript accordingly to strengthen the exposition.
Point-by-point responses
- Referee: [§4.2, Theorem 2] Theorem 2 and the subsequent error bound: the argument that unnormalized RLS probabilities produce an m that remains O(d_eff) with a universal constant (independent of λ_max) is not fully explicit. The proof sketch absorbs the scaling discrepancy into a single C, but it is unclear whether C can be bounded without additional assumptions on the spectrum or coherence; a concrete counter-example or a worst-case bound on the ratio of unnormalized to normalized scores would be needed to substantiate the 'constant factor worse' claim.
Authors: We agree that the current proof sketch in §4.2 absorbs the scaling into C without an explicit derivation. The unnormalized scores are defined directly from the diagonal kernel entries and the ridge parameter, and their ratio to the normalized RLS (as used in INK-Estimate) can be bounded by a universal constant independent of λ_max. We will revise the section to include a new supporting lemma that derives this bound rigorously from the definitions of ridge leverage scores, without requiring extra assumptions on the spectrum or coherence. This will make the constant-factor claim fully explicit. revision: yes
- Referee: [§5.1, Algorithm 1] The SQUEAK pseudocode and its update rule for the unnormalized scores: the incremental update appears to drop the normalization step present in INK-Estimate, yet the space-complexity analysis in §5.3 still invokes the same effective-dimension bound. It is not shown how the absence of the λ_max estimate is compensated without increasing the retained dictionary size beyond a fixed multiple of d_eff on all datasets.
Authors: The referee is correct that SQUEAK removes the normalization and λ_max estimation steps present in INK-Estimate. The space-complexity argument in §5.3 continues to hold because the sampling probabilities remain proportional to the (unnormalized) ridge leverages; the expected dictionary size is therefore still bounded by a fixed multiple of d_eff, with the multiple coming from the same ratio addressed in the first comment. The lack of λ_max estimation is the intended simplification and does not inflate the size beyond this constant factor. We will add a clarifying paragraph in §5.3 that explicitly connects the algorithmic change to the complexity bound. revision: yes
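A one-line version of the size argument in the second response, assuming independent Bernoulli inclusion with probabilities min(1, c·τ̃_i), a standard scheme that may differ in detail from the paper's sampler:

```latex
\mathbb{E}[m] \;=\; \sum_{i=1}^{n} \min\!\bigl(1,\, c\,\tilde{\tau}_i\bigr)
\;\le\; c \sum_{i=1}^{n} \tilde{\tau}_i
\;=\; c\, d_{\mathrm{eff}}(\lambda).
```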
Circularity Check
No circularity: SQUEAK space bound follows from independent unnormalized RLS analysis
Full rationale
The paper introduces SQUEAK by modifying the 2016 INK-Estimate procedure to drop normalization by the effective dimension and λ_max. The central claim that space complexity remains only a constant factor worse than exact RLS sampling is presented as a direct consequence of this change, supported by new error bounds on the resulting Nystrom approximation. No equation or quantity is defined in terms of itself, no parameter is fitted to a subset and then relabeled as a prediction, and the self-citation to prior work supplies the base incremental update but does not carry the load-bearing constant-factor argument. The derivation therefore stands on its own and does not assume its conclusion.
Reference graph
Works this paper leans on
[1] Ahmed El Alaoui and Michael W. Mahoney. Fast randomized kernel methods with statistical guarantees. In Neural Information Processing Systems, 2015.
[2] Francis Bach. Sharp analysis of low-rank kernel matrix approximations. In Conference on Learning Theory, 2013.
[3] Daniele Calandriello, Alessandro Lazaric, and Michal Valko. Analysis of Nyström method with sequential ridge leverage scores. In Uncertainty in Artificial Intelligence, 2016.
[4] Alex Gittens and Michael W. Mahoney. Revisiting the Nyström method for improved large-scale machine learning. In International Conference on Machine Learning, 2013.
[5] Alessandro Rudi, Raffaello Camoriano, and Lorenzo Rosasco. Less is more: Nyström computational regularization. In Neural Information Processing Systems, 2015.