Mercer Large-Scale Kernel Machines from Ridge Function Perspective

Karol Dziedziul; Paweł Wieczyński; Sergey Kryzhevich

arxiv: 2307.11925 · v3 · pith:TPM7KNCZnew · submitted 2023-07-21 · 💻 cs.LG · math.CA

Pith reviewed 2026-05-24 07:21 UTC · model grok-4.3

keywords: Mercer kernels · ridge functions · kernel approximation · cosine products · large-scale kernel machines · image processing · one-versus-rest

The pith

Ridge function theory shows which kernels admit approximation by sums of cosine products and identifies the obstacles to this approach.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper links large-scale kernel machines to results on ridge functions in order to analyze approximations of Mercer kernels. It examines the conditions under which a kernel can be expressed as a sum of products of cosine functions whose arguments depend on the two kernel inputs x and y. The analysis identifies concrete obstacles that prevent this representation from holding for arbitrary kernels. This matters because the scope of the approximation determines which kernels can be used efficiently in applications such as image processing with a one-versus-rest procedure.

Core claim

Results on the fundamentality of ridge functions can be transferred to examine the approximation of kernels by sums of products of cosine functions, revealing that this approach encounters obstacles and does not apply without restrictions on the kernel or the associated measure.

What carries the argument

The representation of the kernel as a sum of products of cosine functions with arguments depending on x and y, which acts as the explicit approximator derived from the ridge function perspective.
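As a point of reference, the standard Rahimi-Recht construction for a shift-invariant kernel can be written out as follows. This is a sketch under assumptions not stated in this summary: the kernel has the form $k(x,y)=\kappa(x-y)$ with $\kappa$ continuous, positive definite and $\kappa(0)=1$; $\mu$ is the probability measure supplied by Bochner's theorem; and the phase $b$ is uniform on $[0,2\pi]$. The paper's own hypotheses and normalisation may differ.

```latex
\begin{align*}
k(x,y) = \kappa(x-y)
  &= \int_{\mathbb{R}^d} \cos\bigl(\omega\cdot(x-y)\bigr)\,d\mu(\omega) \\
  &= \mathbb{E}_{\omega\sim\mu,\; b\sim U[0,2\pi]}
     \bigl[\,2\cos(\omega\cdot x+b)\cos(\omega\cdot y+b)\,\bigr] \\
  &\approx \frac{2}{D}\sum_{j=1}^{D}
     \cos(\omega_j\cdot x+b_j)\cos(\omega_j\cdot y+b_j).
\end{align*}
```

The second equality uses $2\cos A\cos B=\cos(A-B)+\cos(A+B)$ and the fact that the term containing $2b$ averages to zero over a uniform phase. The last line is the finite sum of cosine products with $D$ sampled frequencies.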

If this is right

Only kernels satisfying conditions inherited from ridge function density results can be approximated in this way.
Obstacles arise when the sampling measure cannot be chosen to match the requirements of the ridge function expansion.
The resulting procedure applies directly to image classification tasks via the one-versus-rest strategy.
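The one-versus-rest strategy on top of cosine features can be sketched as below. This is an illustration only, not the authors' pipeline: the synthetic 2-D blobs stand in for image data, the Gaussian sampling of `omega` corresponds to a Gaussian kernel with unit bandwidth, and the names `cosine_features`, `fit_one_vs_rest`, `predict` and the ridge parameter `lam` are hypothetical choices made for this example.

```python
import numpy as np

def cosine_features(X, omega, b):
    """Map each row x of X to sqrt(2/D) * cos(omega^T x + b), a D-dimensional feature vector."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

def fit_one_vs_rest(Z, y, n_classes, lam=1e-3):
    """Solve one ridge-regression problem per class: target +1 for the class, -1 for the rest."""
    n = Z.shape[0]
    T = -np.ones((n, n_classes))
    T[np.arange(n), y] = 1.0
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ T)  # weight matrix of shape (D, n_classes)

def predict(Z, W):
    """Assign each sample to the class whose one-vs-rest score is largest."""
    return np.argmax(Z @ W, axis=1)

# Synthetic stand-in for image data (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = rng.integers(0, 3, size=300)
X = centers[y] + 0.5 * rng.normal(size=(300, 2))

# Random frequencies and phases; N(0, I) frequencies match a unit-bandwidth Gaussian kernel.
D = 200
omega = rng.normal(size=(2, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = cosine_features(X, omega, b)
W = fit_one_vs_rest(Z, y, n_classes=3)
train_accuracy = np.mean(predict(Z, W) == y)
print(f"training accuracy: {train_accuracy:.3f}")
```

Each class gets its own linear model in the feature space, so the cost grows with the number of features `D` rather than with the size of a full kernel matrix.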

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

New kernel families could be constructed to satisfy the conditions needed for the cosine product representation.
Similar obstacles may appear when applying ridge function ideas to other high-dimensional approximation problems.
Numerical checks on standard kernels would reveal whether the obstacles manifest in practice for common choices.
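A numerical check of the kind suggested in the last item could look like the following sketch. It assumes the Gaussian kernel $\exp(-\|x-y\|^2/(2\sigma^2))$ as the "standard kernel" and the usual random-feature sampling (frequencies from $N(0,\sigma^{-2}I)$, uniform phases); the helper names and the test data are hypothetical, and nothing here reproduces an experiment from the paper.

```python
import numpy as np

def rff_features(X, omega, b):
    """Cosine feature map z(x) = sqrt(2/D) * cos(omega^T x + b)."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel matrix exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def max_abs_error(X, D, sigma=1.0, seed=0):
    """Largest entrywise gap between the cosine-product sum and the exact kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    omega = rng.normal(scale=1.0 / sigma, size=(d, D))  # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)           # uniform random phases
    Z = rff_features(X, omega, b)
    return np.abs(Z @ Z.T - gaussian_kernel(X, X, sigma)).max()

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
errors = {D: max_abs_error(X, D) for D in (10, 100, 10000)}
for D, err in errors.items():
    print(f"D = {D:6d}   max |approx - exact| = {err:.4f}")
```

For a kernel that is not shift-invariant, or a sampling measure that does not match the kernel, the same script would be one way to see whether the error stops shrinking as `D` grows.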

Load-bearing premise

The density and approximation results for ridge functions transfer directly to the random-feature construction without additional restrictions on the kernel or the sampling measure.

What would settle it

A concrete kernel for which no finite or infinite sum of such cosine products can approximate it to arbitrary accuracy under the given sampling would show that the transfer of ridge function results does not hold.

Original abstract

To present Mercer large-scale kernel machines from a ridge function perspective, we recall the results by Lin and Pinkus from "Fundamentality of ridge functions". We consider the main result of the paper by Rahimi and Recht (2008), "Random features for large-scale kernel machines", from the Approximation Theory point of view. We study which kernels could be approximated by a sum of products of cosine functions with arguments depending on $x$ and $y$ and present the obstacles of such an approach. The results of this article are applied to Image Processing via the "one-vs-rest" procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper re-examines Rahimi-Recht random features through Lin-Pinkus ridge functions, flags obstacles to cosine-product approximations, but derives no new results.

The core move is to take the 2008 random-feature construction and view it through the existing Lin-Pinkus density results on ridge functions. The paper asks which kernels can be approximated by sums of products of cosines whose arguments depend on both x and y, then lists obstacles to that route. It closes with a standard one-vs-rest run on image data. That framing is reasonable and the decision to emphasize obstacles rather than claim an unconditional transfer is honest; it avoids overstating what the recalled theorem gives for Monte-Carlo feature maps. The link between ridge functions and the cosine features is a direct one worth noting for people already working in that corner of approximation theory. Beyond the re-framing, nothing new appears: no fresh density statement, no explicit conditions derived from the ridge-function theorem, no counter-example constructed, and no change to the sampling procedure. The image-processing experiment is the usual baseline application and supplies no additional support for the theoretical claims. The work stays inside the narrow subfield of large-scale kernel approximation and will mainly interest readers who already know both the Rahimi-Recht paper and the Lin-Pinkus results. It does not look strong enough to justify sending out for serious refereeing; the incremental nature and lack of new derivations or falsifiable predictions make it a desk-reject candidate.

Referee Report

0 major / 3 minor

Summary. The manuscript recalls the fundamentality results for ridge functions due to Lin and Pinkus, re-examines the random-feature construction of Rahimi and Recht from an approximation-theoretic viewpoint, identifies the class of kernels that admit approximation by finite sums of products of cosines whose arguments depend on both x and y, enumerates obstacles to such approximations, and illustrates the framework on an image-processing task via a one-vs-rest procedure.

Significance. If the obstacles are stated with explicit conditions on the kernel and the sampling measure, the perspective could usefully constrain the design of random-feature maps for kernels outside the scope of standard Fourier constructions. The work does not advance new positive approximation theorems or machine-checked proofs, so its primary contribution lies in the synthesis and the identification of barriers rather than in strengthened guarantees.

minor comments (3)

[Abstract] The claim that the results are 'applied to Image Processing' is not accompanied by any dataset description, baseline comparison, or quantitative metric, making the practical illustration difficult to evaluate.
[Section 3 (or equivalent)] The manuscript cites Lin-Pinkus and Rahimi-Recht but does not state whether the cosine-product form is shown to be dense under the same hypotheses as the original ridge-function results or whether additional restrictions on the kernel or measure are required.
[Section 4] Notation: the distinction between the random-feature Monte-Carlo estimator and the deterministic ridge-function approximation is not always maintained when obstacles are listed; a short clarifying paragraph would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the summary, significance assessment, and recommendation for minor revision. We respond point-by-point to the observations on explicit conditions and the nature of the contribution.

Referee: If the obstacles are stated with explicit conditions on the kernel and the sampling measure, the perspective could usefully constrain the design of random-feature maps for kernels outside the scope of standard Fourier constructions.

Authors: We agree. In the revision we will add explicit statements of the conditions on the Mercer kernel (continuity, positive-definiteness) and on the sampling measure under which the approximation obstacles hold, directly linking them to the Lin-Pinkus fundamentality results. (Revision: yes)

Referee: The work does not advance new positive approximation theorems or machine-checked proofs, so its primary contribution lies in the synthesis and the identification of barriers rather than in strengthened guarantees.

Authors: We concur with this characterization. The manuscript deliberately focuses on re-examining Rahimi-Recht random features via ridge functions, recalling existing results, and enumerating obstacles rather than deriving new approximation rates or formal proofs. (Revision: no)

Circularity Check

0 steps flagged

No significant circularity; relies on external citations

The paper recalls Lin-Pinkus density results for ridge functions and the Rahimi-Recht random-feature construction as external inputs, then examines which kernels admit cosine-product approximations while noting obstacles. No load-bearing step reduces to a self-citation chain, fitted parameter renamed as prediction, or self-definitional equivalence. All central claims rest on cited approximation-theory results from independent authors, with the present work applying those results to a new question about cosine sums without re-deriving or fitting the inputs. This is the normal case of a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the density properties of ridge functions proved by Lin and Pinkus and on the random-feature construction of Rahimi and Recht; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

standard math: Ridge functions are dense in the space of continuous functions on compact sets under suitable conditions (Lin-Pinkus).
Invoked to justify viewing random-feature approximations as ridge-function sums.
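To make the ridge-function reading concrete, recall that a ridge function is a univariate profile composed with a linear functional. The identity below is a sketch of why a single cosine product is a sum of two ridge functions of the joint variable $(x,y)\in\mathbb{R}^{2d}$. The symbols $\omega$, $b$ and $g_c$ are notation chosen for this illustration, and whether the paper argues through exactly this identity is an assumption.

```latex
\begin{align*}
\text{ridge function on } \mathbb{R}^n:\quad
  & f(z) = g(a\cdot z), \qquad a\in\mathbb{R}^n,\ g:\mathbb{R}\to\mathbb{R}, \\[4pt]
\cos(\omega\cdot x+b)\cos(\omega\cdot y+b)
  &= \tfrac12\cos\bigl(\omega\cdot(x-y)\bigr)
   + \tfrac12\cos\bigl(\omega\cdot(x+y)+2b\bigr) \\
  &= \tfrac12\, g_0\bigl((\omega,-\omega)\cdot(x,y)\bigr)
   + \tfrac12\, g_{2b}\bigl((\omega,\omega)\cdot(x,y)\bigr),
  \qquad g_c(t)=\cos(t+c).
\end{align*}
```

The directions that appear are restricted to the forms $(\omega,-\omega)$ and $(\omega,\omega)$, which suggests why density results stated for general direction sets may not carry over without extra conditions.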

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] M. Belkin, Fit without fear: Remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica, 30, 203-248, 2021.
[2] P. Binev, A. Cohen, W. Dahmen, R. DeVore and V. Temlyakov, Universal algorithms for learning theory. Part I: piecewise constant functions. J. Machine Learning Res. 6 (2005), 1297–1321.
[3] A. Christmann and I. Steinwart, Support Vector Machines. Springer, Berlin, 2008.
[4] F. Cucker and S. Smale, On the mathematical foundations of learning. Bull. of the Amer. Math. Soc. 29 (1), 1–49.
[5] E. De Vito, L. Rosasco and A. Rudi, Regularization: From Inverse Problems to Large-Scale Machine Learning. In: Harmonic and Applied Analysis: From Radon Transforms to Machine Learning, Applied and Numerical Harmonic Analysis, Springer International Publishing, 2021, 245-296.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4):541-551, Winter 1989.
[7] C. A. Micchelli, Y. Xu and H. Zhang, Universal kernels. J. Mach. Learn. Res. 7 (2006), 2651-2667.
[8] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A, 209 (441–458): 415–446, 1909.
[9] V. Ya. Lin and A. Pinkus, Fundamentality of ridge functions. J. Approx. Theory 75 (1993), no. 3, 295-311.
[10] A. Pinkus, Approximation theory of the MLP model in neural networks. Acta Numer., 8, Cambridge Univ. Press, Cambridge, 1999, 143-195.
[11] A. Rahimi and B. Recht, Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, pages 1177-1184, 2008.
[12] S. Smale and Y. Yao, Online learning algorithms. Found. Comput. Math. 6 (2006), no. 2, 145-170.

Author affiliations (as extracted): Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-952 Gdańsk, Poland. Email address: karol.dziedziul@pg.edu.pl. Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza ...
