Mercer Large-Scale Kernel Machines from Ridge Function Perspective
Pith reviewed 2026-05-24 07:21 UTC · model grok-4.3
The pith
Ridge function theory shows which kernels admit approximation by sums of cosine products and identifies the obstacles to this approach.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Results on the fundamentality of ridge functions can be transferred to examine the approximation of kernels by sums of products of cosine functions, revealing that this approach encounters obstacles and does not apply without restrictions on the kernel or the associated measure.
What carries the argument
The representation of the kernel as a sum of products of cosine functions with arguments depending on x and y, which acts as the explicit approximator derived from the ridge function perspective.
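For orientation, the cosine-product form can be written out explicitly. The notation below ($m$, $c_j$, $\omega_j$, $b_j$) is assumed for illustration and is not taken from the paper. The second line is the elementary product-to-sum identity, which shows that each term is a sum of two ridge functions of $(x,y)\in\mathbb{R}^{2n}$, with directions $(\omega,-\omega)$ and $(\omega,\omega)$.

```latex
\begin{align*}
k(x,y) &\approx \sum_{j=1}^{m} c_j \cos(\omega_j\cdot x + b_j)\,\cos(\omega_j\cdot y + b_j),
\qquad x,y\in\mathbb{R}^n,\\
\cos(\omega\cdot x + b)\cos(\omega\cdot y + b)
&= \tfrac12\cos\bigl(\omega\cdot(x-y)\bigr) + \tfrac12\cos\bigl(\omega\cdot(x+y) + 2b\bigr).
\end{align*}
```

Only the first term on the right depends on $x-y$ alone, which hints at why shift-invariant kernels are the natural setting for this form.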
If this is right
- Only kernels satisfying conditions inherited from ridge function density results can be approximated in this way.
- Obstacles arise when the sampling measure cannot be chosen to match the requirements of the ridge function expansion.
- The resulting procedure applies directly to image classification tasks via the one-versus-rest strategy.
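The one-versus-rest step can be sketched as follows. This is a minimal illustration, not the paper's procedure: the toy three-class data, the ridge-regression classifier, the Gaussian frequency distribution, and all names (`cosine_features`, `fit_one_vs_rest`, `D`, `reg`) are assumptions made for the example.

```python
import numpy as np

def cosine_features(X, W, b):
    """Map rows of X to sqrt(2/D) * cos(X W + b) (random cosine features)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_one_vs_rest(Z, labels, n_classes, reg=1e-3):
    """Ridge regression on +1/-1 targets; one weight column per class."""
    Y = -np.ones((Z.shape[0], n_classes))
    Y[np.arange(Z.shape[0]), labels] = 1.0
    A = Z.T @ Z + reg * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ Y)

def predict(Z, coef):
    """Pick the class whose one-vs-rest score is largest."""
    return np.argmax(Z @ coef, axis=1)

rng = np.random.default_rng(0)

# Hypothetical toy data: three Gaussian blobs stand in for image features.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.5 * rng.standard_normal((150, 2))

D = 200                                   # number of random features (assumed)
W = rng.standard_normal((2, D))           # frequencies for a Gaussian kernel, sigma = 1
b = rng.uniform(0.0, 2.0 * np.pi, D)      # random phases

Z = cosine_features(X, W, b)
coef = fit_one_vs_rest(Z, labels, n_classes=3)
train_accuracy = np.mean(predict(Z, coef) == labels)
```

Each class gets its own binary problem (that class against all others), and the predicted label is the class with the highest score. A real image task would replace the toy blobs with pixel or descriptor vectors and evaluate on held-out data.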
Where Pith is reading between the lines
- New kernel families could be constructed to satisfy the conditions needed for the cosine product representation.
- Similar obstacles may appear when applying ridge function ideas to other high-dimensional approximation problems.
- Numerical checks on standard kernels would reveal whether the obstacles manifest in practice for common choices.
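A numerical check of the kind suggested in the last bullet might look like the sketch below. It assumes a Gaussian kernel with bandwidth `sigma = 1`, frequencies drawn from the matching normal distribution, and uniformly random phases; the function names and sample sizes are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel exp(-|x - y|^2 / (2 sigma^2)) for all row pairs."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def cosine_product_estimate(X, Y, D, rng, sigma=1.0):
    """(2/D) * sum_j cos(w_j.x + b_j) cos(w_j.y + b_j) with w_j ~ N(0, I / sigma^2)."""
    W = rng.standard_normal((X.shape[1], D)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, D)
    Zx = np.cos(X @ W + b)
    Zy = np.cos(Y @ W + b)
    return (2.0 / D) * Zx @ Zy.T

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (40, 3))   # 40 sample points in [-1, 1]^3
K = gaussian_kernel(X, X)

# Largest entrywise error of the cosine-product sum for growing D.
max_errors = {}
for D in (10, 100, 1000, 10000):
    K_hat = cosine_product_estimate(X, X, D, rng)
    max_errors[D] = float(np.max(np.abs(K - K_hat)))
```

For a kernel where the construction applies, `max_errors` should shrink as `D` grows. Running the same loop with a kernel that is not shift-invariant in place of `gaussian_kernel` is one way to see an obstacle empirically, since the estimator above has no reason to converge to it.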
Load-bearing premise
The density and approximation results for ridge functions transfer directly to the random-feature construction without additional restrictions on the kernel or the sampling measure.
What would settle it
A concrete kernel for which no finite or infinite sum of such cosine products can approximate it to arbitrary accuracy under the given sampling would show that the transfer of ridge function results does not hold.
Original abstract
To present Mercer large-scale kernel machines from a ridge function perspective, we recall the results by Lin and Pinkus from {\it Fundamentality of ridge functions}. We consider the main result of the recent paper by Rahimi and Recht, 2008, {\it Random features for large-scale kernel machines} from the Approximation Theory point of view. We study which kernels could be approximated by a sum of products of cosine functions with arguments depending on $x$ and $y$ and present the obstacles of such an approach. The results of this article are applied to Image Processing by procedure "one-vs-rest".
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript recalls the fundamentality results for ridge functions due to Lin and Pinkus, re-examines the random-feature construction of Rahimi and Recht from an approximation-theoretic viewpoint, identifies the class of kernels that admit approximation by finite sums of products of cosines whose arguments depend on both x and y, enumerates obstacles to such approximations, and illustrates the framework on an image-processing task via a one-vs-rest procedure.
Significance. If the obstacles are stated with explicit conditions on the kernel and the sampling measure, the perspective could usefully constrain the design of random-feature maps for kernels outside the scope of standard Fourier constructions. The work does not advance new positive approximation theorems or machine-checked proofs, so its primary contribution lies in the synthesis and the identification of barriers rather than in strengthened guarantees.
Minor comments (3)
- [Abstract] The claim that the results are 'applied to Image Processing' is not accompanied by any dataset description, baseline comparison, or quantitative metric, making the practical illustration difficult to evaluate.
- [Section 3 (or equivalent)] The manuscript cites Lin-Pinkus and Rahimi-Recht but does not state whether the cosine-product form is shown to be dense under the same hypotheses as the original ridge-function results or whether additional restrictions on the kernel or measure are required.
- [Section 4] Notation: the distinction between the random-feature Monte-Carlo estimator and the deterministic ridge-function approximation is not always maintained when obstacles are listed; a short clarifying paragraph would help.
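The distinction raised in the last comment can be made concrete. The display below is a sketch under stated assumptions: $k$ is shift-invariant with $k(0)=1$, $\mu$ is its spectral probability measure (Bochner's theorem), $K$ is a compact set, and the symbols $\hat k_D$, $a_j$, $\beta_j$, $c_j$ are introduced here for illustration only.

```latex
\begin{align*}
\text{Monte Carlo estimator:}\quad
& \hat k_D(x,y) = \frac{2}{D}\sum_{j=1}^{D}\cos(\omega_j\cdot x + b_j)\cos(\omega_j\cdot y + b_j),
\quad \omega_j \overset{\text{i.i.d.}}{\sim} \mu,\quad b_j \overset{\text{i.i.d.}}{\sim} U[0,2\pi],\\
& \mathbb{E}\,\hat k_D(x,y) = \int \cos\bigl(\omega\cdot(x-y)\bigr)\,d\mu(\omega) = k(x-y);\\[4pt]
\text{Deterministic approximation:}\quad
& \sup_{x,y\in K}\Bigl|\,k(x,y) - \sum_{j=1}^{m} c_j\cos(a_j\cdot x+\beta_j)\cos(a_j\cdot y+\beta_j)\Bigr| < \varepsilon .
\end{align*}
```

In the first case the frequencies and phases are random and the statement holds in expectation (and with high probability for large $D$); in the second, the parameters $a_j$, $\beta_j$, $c_j$ are chosen freely and the question is one of density, which is where ridge-function results enter.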
Simulated Author's Rebuttal
We thank the referee for the summary, significance assessment, and recommendation for minor revision. We respond point-by-point to the observations on explicit conditions and the nature of the contribution.
- Referee: If the obstacles are stated with explicit conditions on the kernel and the sampling measure, the perspective could usefully constrain the design of random-feature maps for kernels outside the scope of standard Fourier constructions.
  Authors: We agree. In the revision we will add explicit statements of the conditions on the Mercer kernel (continuity, positive-definiteness) and on the sampling measure under which the approximation obstacles hold, directly linking them to the Lin-Pinkus fundamentality results.
  Revision: yes
- Referee: The work does not advance new positive approximation theorems or machine-checked proofs, so its primary contribution lies in the synthesis and the identification of barriers rather than in strengthened guarantees.
  Authors: We concur with this characterization. The manuscript deliberately focuses on re-examining Rahimi-Recht random features via ridge functions, recalling existing results, and enumerating obstacles rather than deriving new approximation rates or formal proofs.
  Revision: no
Circularity Check
No significant circularity; relies on external citations
The paper recalls Lin-Pinkus density results for ridge functions and the Rahimi-Recht random-feature construction as external inputs, then examines which kernels admit cosine-product approximations while noting obstacles. No load-bearing step reduces to a self-citation chain, fitted parameter renamed as prediction, or self-definitional equivalence. All central claims rest on cited approximation-theory results from independent authors, with the present work applying those results to a new question about cosine sums without re-deriving or fitting the inputs. This is the normal case of a self-contained derivation against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- [standard math] Ridge functions are dense in the space of continuous functions on compact sets under suitable conditions (Lin-Pinkus).
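For readers unfamiliar with the term, a ridge function and the span in question can be written as below. The symbols $\Omega$ and $\mathcal{R}(\Omega)$ are notation assumed here, and the density criterion is given as it is usually stated in the ridge-function literature; the exact hypotheses should be checked against the Lin-Pinkus paper.

```latex
\begin{align*}
& f(x) = g(a\cdot x), \qquad a\in\mathbb{R}^n\setminus\{0\},\quad g\in C(\mathbb{R}),\\
& \mathcal{R}(\Omega) = \operatorname{span}\bigl\{\, g(a\cdot x) : a\in\Omega,\ g\in C(\mathbb{R}) \,\bigr\},
\qquad \Omega\subseteq\mathbb{R}^n,\\
& \mathcal{R}(\Omega)\ \text{is dense in } C(\mathbb{R}^n)\ \text{(uniform convergence on compact sets)}\\
& \quad\Longleftrightarrow\quad \text{no nontrivial homogeneous polynomial vanishes on } \Omega .
\end{align*}
```

The "suitable conditions" in the axiom are therefore conditions on the set of admissible directions $\Omega$, which is the point at which restrictions on the sampling measure can matter.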
Reference graph
Works this paper leans on
- [1] M. Belkin, Fit without fear: Remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica, 30, 203-248, 2021.
- [2]
- [3] A. Christmann and I. Steinwart, Support Vector Machines. Springer, Berlin, 2008.
- [4] F. Cucker and S. Smale, On the mathematical foundations of learning. Bull. of the Amer. Math. Soc. 29 (1), 1–49.
- [5] E. De Vito, L. Rosasco, and A. Rudi, Regularization: From Inverse Problems to Large-Scale Machine Learning, 245-296. In: Harmonic and Applied Analysis: From Radon Transforms to Machine Learning. Applied and Numerical Harmonic Analysis, Springer International Publishing, 2021.
- [6]
- [7] C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels. J. Mach. Learn. Res. 7 (2006), 2651-2667.
- [8] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A, 209 (441–458): 415–446, 1909.
- [9] V. Ya. Lin and A. Pinkus, Fundamentality of ridge functions. J. Approx. Theory 75 (1993), no. 3, 295-311.
- [10] A. Pinkus, Approximation theory of the MLP model in neural networks. Acta Numer., 8, 143-195, Cambridge Univ. Press, Cambridge, 1999.
- [11] A. Rahimi and B. Recht, Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, pages 1177-1184, 2008.
- [12] S. Smale and Y. Yao, Online learning algorithms. Found. Comput. Math. 6 (2006), no. 2, 145-170.

Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-952 Gdańsk, Poland. Email address: karol.dziedziul@pg.edu.pl
Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza ...