A Universal Reproducing Kernel Hilbert Space from Polynomial Alignment and IMQ Distance

Taha Bouhsine

arxiv: 2605.03262 · v1 · submitted 2026-05-05 · 💻 cs.LG

Pith reviewed 2026-05-07 17:36 UTC · model grok-4.3

classification 💻 cs.LG

keywords Yat kernel · universal RKHS · characteristic kernels · inverse multiquadric · neural network layers · Rademacher bounds · learned centers · finite expansions

The pith

The Yat kernel generates a universal characteristic RKHS in which shared-parameter neural layers become finite expansions of learned centers with closed-form norms and Rademacher bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Yat kernel: the square of a linear alignment term, w^T x + b, divided by the squared distance ||x - w||^2 plus ε, which is an inverse-multiquadric (IMQ) style denominator. For positive bias the kernel dominates a scaled IMQ kernel in the Loewner order, which yields universality, characteristicness, and strict positive definiteness on every compact domain. Any IMQ atom can be recovered exactly from three Yat atoms via a second finite difference in the bias, and a trained layer with shared b and ε is a finite sum of kernel sections centered at learned points inside one fixed RKHS. The expansion carries the closed-form norm α^T K α, and the kernel's explicit diagonal (||x||^2 + b)^2 / ε supplies a Rademacher generalization bound. The polynomial numerator additionally supplies non-radial, directional far-field behavior absent from pure IMQ expansions.

Core claim

The Yat kernel k_{b,ε}(w,x) = (w^T x + b)^2 / (||x-w||^2 + ε) is positive semi-definite for b ≥ 0. When b > 0 it dominates a scaled IMQ kernel in the Loewner order on every compact domain, granting fixed-kernel universality and characteristicness. Any single IMQ atom is recovered exactly as a linear combination of three positive-bias Yat atoms by taking the second finite difference in the bias parameter. A trained shared-(b,ε) Yat layer is therefore a finite learned-center expansion inside this universal characteristic RKHS, equipped with the closed-form norm α^T K α and the explicit diagonal (||x||^2 + b)^2 / ε that drives a Rademacher generalization bound.
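The three-atom recovery can be checked by hand. The following is a worked sketch under one assumption that the summary does not state: the three biases are taken equally spaced, b, b + h, b + 2h with step h > 0 (the paper may use a more general choice of distinct biases). The shorthand s = w^T x and D = ||x - w||^2 + ε is introduced here for illustration.

```latex
\begin{align*}
k_{b,\varepsilon} - 2\,k_{b+h,\varepsilon} + k_{b+2h,\varepsilon}
  &= \frac{(s+b)^2 - 2(s+b+h)^2 + (s+b+2h)^2}{D} = \frac{2h^2}{D}, \\
\frac{1}{\|\mathbf{x}-\mathbf{w}\|^2+\varepsilon}
  &= \frac{1}{2h^2}\Bigl(k_{b,\varepsilon} - 2\,k_{b+h,\varepsilon} + k_{b+2h,\varepsilon}\Bigr)(\mathbf{w},\mathbf{x}).
\end{align*}
```

The numerator is quadratic in the bias with leading coefficient 1, so its second difference is the constant 2h^2 and every dependence on s cancels. With b > 0 and h > 0 all three biases are positive.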

What carries the argument

The Yat kernel is a rational function: its quadratic polynomial numerator supplies non-radial alignment channels, while its IMQ-style denominator produces Loewner domination over scaled inverse-multiquadric kernels.

If this is right

A trained layer can be read directly as an RKHS expansion whose norm is computed from the Gram matrix without further approximation.
Rademacher generalization bounds follow immediately from the explicit diagonal term once the layer weights are known.
Exact algebraic recovery of IMQ atoms from three Yat atoms permits direct substitution or mixing with existing IMQ-based kernel methods.
The far-field directional trace (u^T w)^2 encodes alignment preferences that standard radial IMQ expansions cannot produce.
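The first two consequences can be sketched in a few lines. This is an illustrative sketch, not the paper's code: it reads α^T K α as the squared RKHS norm of the expansion, and it uses the textbook Rademacher bound (B / n) · sqrt(Σ k(x_i, x_i)) for an RKHS ball of radius B, which may differ in constants from the bound the paper states. The function names, centers, weights, and samples are all hypothetical.

```python
import math

def yat_kernel(w, x, b=1.0, eps=1.0):
    """Yat kernel (w.x + b)^2 / (||x - w||^2 + eps) for two equal-length sequences."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((xi - wi) ** 2 for wi, xi in zip(w, x))
    return (dot + b) ** 2 / (sq_dist + eps)

def rkhs_norm_sq(centers, alpha, b=1.0, eps=1.0):
    """alpha^T K alpha, where K is the Gram matrix of the learned centers."""
    return sum(
        ai * aj * yat_kernel(wi, wj, b, eps)
        for ai, wi in zip(alpha, centers)
        for aj, wj in zip(alpha, centers)
    )

def rademacher_bound(samples, norm_bound, b=1.0, eps=1.0):
    """Textbook bound (B / n) * sqrt(sum_i k(x_i, x_i)) for an RKHS ball of radius B.

    Uses the explicit diagonal k(x, x) = (||x||^2 + b)^2 / eps.
    """
    diag_sum = sum((sum(xi * xi for xi in x) + b) ** 2 / eps for x in samples)
    return norm_bound * math.sqrt(diag_sum) / len(samples)

# Hypothetical layer: two learned centers with output weights alpha.
centers = [(1.0, 0.0), (0.0, 1.0)]
alpha = [0.5, -0.25]
samples = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]

norm_sq = rkhs_norm_sq(centers, alpha)
bound = rademacher_bound(samples, math.sqrt(norm_sq))
print(norm_sq, bound)
```

Both quantities depend only on the learned centers, the output weights, and the shared (b, ε), which is why the shared-parameter requirement matters for the closed form.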

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bias parameter supplies a continuous dial between pure IMQ behavior at large b and stronger polynomial alignment at moderate b.
Replacing standard RBF layers with shared-parameter Yat layers could give kernel machines both universality and explicit norm control in one architecture.
The same finite-difference relation may let other rational kernels be decomposed into or built from Yat primitives.

Load-bearing premise

The kernel is positive semi-definite for every b at least zero and dominates the scaled IMQ kernel in the Loewner sense for b greater than zero on all compact domains.
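One standard route to both properties, offered here as a sketch rather than as the paper's own proof, goes through the Schur product theorem. It assumes the IMQ-style factor is q_ε(w, x) = 1 / (||x - w||^2 + ε), which is a positive definite kernel, and it suggests the scaling constant c = b^2. Both the route and the constant are assumptions that the summary above does not confirm.

```latex
\begin{align*}
k_{b,\varepsilon}(\mathbf{w},\mathbf{x})
  &= \bigl[(\mathbf{w}^\top\mathbf{x})^2 + 2b\,\mathbf{w}^\top\mathbf{x} + b^2\bigr]\, q_\varepsilon(\mathbf{w},\mathbf{x}), \\
k_{b,\varepsilon}(\mathbf{w},\mathbf{x}) - b^2\, q_\varepsilon(\mathbf{w},\mathbf{x})
  &= \bigl[(\mathbf{w}^\top\mathbf{x})^2 + 2b\,\mathbf{w}^\top\mathbf{x}\bigr]\, q_\varepsilon(\mathbf{w},\mathbf{x}).
\end{align*}
```

For b ≥ 0 each bracketed polynomial is a nonnegative combination of PSD kernels, and the entrywise product of PSD Gram matrices is PSD, so both lines define PSD kernels. Under this reading the second line is the Loewner inequality K_Yat ≽ b^2 K_IMQ on any finite point set.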

What would settle it

Form the Gram matrix for a small collection of distinct points using b=1 and ε=1 and compute its eigenvalues; any negative eigenvalue falsifies positive semi-definiteness.
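A minimal sketch of that check, assuming NumPy is available. The helper name `yat_gram`, the random seed, and the eight-point 2-D cloud are illustrative choices, not taken from the paper.

```python
import numpy as np

def yat_gram(points, b=1.0, eps=1.0):
    """Gram matrix K[i, j] = (p_i.p_j + b)^2 / (||p_i - p_j||^2 + eps)."""
    inner = points @ points.T
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * inner
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return (inner + b) ** 2 / (sq_dists + eps)

rng = np.random.default_rng(0)        # fixed seed, arbitrary choice
points = rng.normal(size=(8, 2))      # 8 distinct points in 2-D
K = yat_gram(points, b=1.0, eps=1.0)
eigenvalues = np.linalg.eigvalsh(K)   # ascending eigenvalues of the symmetric matrix
print("smallest eigenvalue:", eigenvalues[0])
```

A smallest eigenvalue that is negative by more than floating-point round-off would be the falsifying observation; a single passing run is only a sanity check, not a proof.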

Original abstract

We introduce the Yat kernel $$k_{b,\varepsilon}(\mathbf{w},\mathbf{x})=\frac{(\mathbf{w}^\top\mathbf{x}+b)^2}{\|\mathbf{x}-\mathbf{w}\|^2+\varepsilon},\qquad b\ge 0,\ \varepsilon>0,$$ a rational hidden-unit primitive whose units are Mercer sections over a shared input/weight space. For $b\ge 0$ the kernel is PSD; for $b>0$ it dominates a scaled inverse-multiquadric (IMQ) in the Loewner order, yielding fixed-kernel universality, characteristicness, and strict positive definiteness on every compact domain. The polynomial numerator opens nonradial alignment channels absent from finite IMQ expansions, witnessed by the directional far-field trace $T_\infty g_\varepsilon(\cdot;\mathbf{w},b)(\mathbf{u})=(\mathbf{u}^\top\mathbf{w})^2$. Algebraically, a second finite difference in the bias recovers any IMQ atom from three positive-bias Yat atoms exactly, sharp at three atoms in every dimension at exact pointwise equality. A trained shared-$(b,\varepsilon)$ Yat layer is therefore a finite learned-center expansion in a fixed universal characteristic RKHS, with closed-form norm $\boldsymbol{\alpha}^\top\mathbf{K}\boldsymbol{\alpha}$ and explicit diagonal $(\|\mathbf{x}\|^2+b)^2/\varepsilon$ driving a Rademacher generalization bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The Yat kernel gives an exact three-atom finite-difference recovery of any IMQ atom plus a claimed Loewner domination that would transfer universality, but the domination step is asserted without visible derivation.

The paper defines the Yat kernel as (w^T x + b)^2 / (||x - w||^2 + ε) with b ≥ 0. Its clearest new piece is the algebraic identity: any single IMQ atom equals a linear combination of three Yat atoms at distinct positive biases, namely a second finite difference in the bias. This equality is pointwise, holds in every dimension, and is stated to be sharp at three atoms. The construction is clean and does not appear in the cited IMQ or polynomial kernel literature.
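The identity is easy to test numerically. The sketch below assumes equally spaced biases b, b + h, b + 2h, for which the second difference of the numerator is 2h^2; the paper may allow other distinct biases. The function names and the test point are hypothetical.

```python
def yat_kernel(w, x, b, eps):
    """Yat kernel (w.x + b)^2 / (||x - w||^2 + eps)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((xi - wi) ** 2 for wi, xi in zip(w, x))
    return (dot + b) ** 2 / (sq_dist + eps)

def imq_from_three_yat(w, x, b, h, eps):
    """Scaled second finite difference in the bias:
    (k_b - 2 k_{b+h} + k_{b+2h}) / (2 h^2)."""
    second_diff = (
        yat_kernel(w, x, b, eps)
        - 2.0 * yat_kernel(w, x, b + h, eps)
        + yat_kernel(w, x, b + 2.0 * h, eps)
    )
    return second_diff / (2.0 * h * h)

# Arbitrary 3-D test point; any dimension works the same way.
w = (0.3, -1.2, 2.0)
x = (1.5, 0.4, -0.7)
eps = 0.5

direct = 1.0 / (sum((xi - wi) ** 2 for wi, xi in zip(w, x)) + eps)
recovered = imq_from_three_yat(w, x, b=0.5, h=0.25, eps=eps)
print(direct, recovered)
```

The two printed values should agree up to floating-point round-off, since the alignment term w^T x cancels in the second difference.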

Referee Report

2 major / 1 minor

Summary. The paper introduces the Yat kernel k_{b,ε}(w,x) = (w^T x + b)^2 / (||x - w||^2 + ε) for b ≥ 0, ε > 0. It claims this kernel is positive semi-definite for b ≥ 0; for b > 0 it dominates a scaled inverse-multiquadric (IMQ) kernel in the Loewner order on every compact domain, implying the kernel is universal, characteristic, and strictly positive definite. The polynomial numerator supplies non-radial alignment channels (e.g., far-field trace T_∞ g_ε(u) = (u^T w)^2). Algebraically, any IMQ atom is recovered exactly from three positive-bias Yat atoms by a second finite difference. Consequently, a trained shared-(b,ε) Yat layer is a finite learned-center expansion in a fixed universal characteristic RKHS with closed-form norm α^T K α and an explicit diagonal term driving a Rademacher bound.

Significance. If the PSD property and Loewner domination hold, the construction supplies a new universal kernel that augments IMQ with explicit polynomial alignment channels while preserving an exact algebraic link to IMQ atoms. The closed-form RKHS norm and Rademacher bound would be concrete assets for generalization analysis of certain neural-network layers. The finite-difference recovery identity is a clean, dimension-independent algebraic result that strengthens the connection to existing kernels.

major comments (2)

[Abstract] The assertion that k_{b,ε} dominates a scaled IMQ kernel in the Loewner order (K_Yat ≽ c K_IMQ for some c > 0) on every compact domain is stated without an explicit derivation of the matrix inequality or verification that the difference operator is positive semi-definite on finite point sets. Because this domination is the sole route used to transfer universality and characteristicness from the known IMQ kernel, the central claim that the trained layer lives in a fixed universal RKHS rests on an unshown step.
[Abstract] No explicit construction of the Gram matrix or numerical check confirming positive semi-definiteness for b ≥ 0 (or the domination for b > 0) is supplied, even though the abstract presents these as established facts; a short self-contained proof or counter-example-free verification on low-dimensional point clouds would be required to make the PSD and domination claims load-bearing.

minor comments (1)

[Abstract] The directional far-field trace T_∞ g_ε(·;w,b)(u) is introduced without an inline definition or reference to its derivation; a one-sentence reminder of its meaning would aid readability.
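One reading consistent with the stated value, assumed here because the abstract does not define T_∞, is the limit of the unit g_ε(x; w, b) = k_{b,ε}(w, x) along a ray x = t·u with ||u|| = 1 as t grows. The sketch below evaluates the unit far out on a few rays and compares it with (u^T w)^2; the names and the chosen w are hypothetical.

```python
import math

def yat_kernel(w, x, b, eps):
    """Yat kernel (w.x + b)^2 / (||x - w||^2 + eps)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    sq_dist = sum((xi - wi) ** 2 for wi, xi in zip(w, x))
    return (dot + b) ** 2 / (sq_dist + eps)

def ray_value(w, u, t, b, eps):
    """Evaluate the Yat unit at the point t * u on the ray with direction u."""
    return yat_kernel(w, [t * ui for ui in u], b, eps)

w = (2.0, -1.0)
b, eps = 1.0, 1.0
for angle in (0.0, math.pi / 4, math.pi / 2):
    u = (math.cos(angle), math.sin(angle))      # unit direction
    limit = (u[0] * w[0] + u[1] * w[1]) ** 2    # candidate trace (u.w)^2
    print(angle, ray_value(w, u, 1e6, b, eps), limit)
```

The far-field value changes with the direction u, which is the non-radial behavior the abstract attributes to the polynomial numerator; a pure IMQ atom decays to zero along every ray.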

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for explicit support of the central PSD and Loewner-domination claims. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The assertion that k_{b,ε} dominates a scaled IMQ kernel in the Loewner order (K_Yat ≽ c K_IMQ for some c > 0) on every compact domain is stated without an explicit derivation of the matrix inequality or verification that the difference operator is positive semi-definite on finite point sets. Because this domination is the sole route used to transfer universality and characteristicness from the known IMQ kernel, the central claim that the trained layer lives in a fixed universal RKHS rests on an unshown step.

Authors: We agree that the abstract presents the domination result without the supporting derivation. In the revision we will add an explicit algebraic proof that the difference kernel yields a positive semi-definite Gram matrix on any finite point set drawn from a compact domain, together with the choice of c that makes the inequality hold. This will make the transfer of universality and characteristicness fully rigorous and self-contained.

revision: yes

Referee: [Abstract] No explicit construction of the Gram matrix or numerical check confirming positive semi-definiteness for b ≥ 0 (or the domination for b > 0) is supplied, even though the abstract presents these as established facts; a short self-contained proof or counter-example-free verification on low-dimensional point clouds would be required to make the PSD and domination claims load-bearing.

Authors: We accept the referee's point that the current manuscript lacks an explicit Gram-matrix construction and numerical verification. The revised version will contain a short, self-contained proof of positive semi-definiteness for b ≥ 0, the corresponding Loewner-order argument for b > 0, and counter-example-free numerical checks on low-dimensional random point clouds (e.g., 2-D and 3-D) confirming that the minimal eigenvalues of the relevant Gram matrices remain non-negative.

revision: yes
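The kind of numerical check promised here might look like the following sketch, assuming NumPy. It tests both the Yat Gram matrix and the difference K_Yat - c K_IMQ on random 2-D and 3-D clouds. The constant c = b^2, the IMQ form 1 / (||x - w||^2 + ε), the parameter values, and the helper names are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def gram_pair(points, b, eps):
    """Return (K_yat, K_imq) for the same point set."""
    inner = points @ points.T
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * inner, 0.0)
    k_imq = 1.0 / (sq_dists + eps)
    k_yat = (inner + b) ** 2 * k_imq
    return k_yat, k_imq

def min_relative_eig(matrix):
    """Smallest eigenvalue divided by the largest absolute eigenvalue."""
    eigs = np.linalg.eigvalsh(matrix)
    return eigs[0] / max(abs(eigs[0]), abs(eigs[-1]))

rng = np.random.default_rng(1)   # fixed seed, arbitrary choice
b, eps = 0.7, 0.3                # illustrative shared parameters
results = {}
for dim in (2, 3):
    points = rng.uniform(-1.0, 1.0, size=(10, dim))
    k_yat, k_imq = gram_pair(points, b, eps)
    # Assumed domination constant c = b^2.
    results[dim] = (
        min_relative_eig(k_yat),
        min_relative_eig(k_yat - b ** 2 * k_imq),
    )
    print(dim, results[dim])
```

Values at or above roughly minus machine precision are consistent with positive semi-definiteness; the eigenvalues are reported relative to the largest one so that the round-off tolerance does not depend on the scale of the Gram matrix.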

Circularity Check

0 steps flagged

No significant circularity; derivation relies on explicit kernel definition and algebraic identities

The paper defines the Yat kernel explicitly as a rational function of inputs and parameters, then states its PSD property for b ≥ 0 and Loewner domination of scaled IMQ for b > 0 as asserted properties that transfer known universality from IMQ. The algebraic recovery of IMQ atoms via second finite difference in bias is presented as an exact pointwise identity, not a fit or renaming. The closed-form norm αᵀKα follows directly from the Mercer kernel definition without post-hoc adjustment or self-reference. No self-citations, uniqueness theorems imported from prior author work, or fitted parameters renamed as predictions appear in the derivation chain. The central claim remains independent of its inputs once the kernel form and domination are granted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on standard facts about Mercer kernels and Loewner order plus the new kernel definition itself; no additional free parameters are fitted inside the derivation, and no new physical entities are postulated.

axioms (2)

standard math: A kernel is positive semi-definite if and only if it is a Mercer kernel inducing a valid RKHS.
Invoked implicitly when asserting PSD for b ≥ 0.
domain assumption: Loewner-order domination of a scaled IMQ kernel implies universality and characteristicness on compact domains.
Used to transfer known IMQ properties to the Yat kernel.

invented entities (1)

Yat kernel · no independent evidence
purpose: New rational kernel primitive combining polynomial alignment with IMQ distance.
Defined directly in the abstract; no independent existence claim beyond the mathematical construction.

pith-pipeline@v0.9.0 · 5550 in / 1501 out tokens · 103408 ms · 2026-05-07T17:36:48.929127+00:00 · methodology
