Approximating Simple ReLU Networks based on Spectral Decomposition of Fisher Information
Pith reviewed 2026-05-19 13:43 UTC · model grok-4.3
The pith
In two-layer ReLU networks with random hidden weights, 97.7 percent of the Fisher information trace concentrates in the first three eigenspaces corresponding to spherical harmonics of order at most 2.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that for two-layer ReLU networks with random hidden weights, the Fisher information matrix exhibits strong concentration of its eigenvalues in the first three eigenspaces, which account for 97.7% of the trace independently of the number of parameters. These eigenspaces are precisely the spherical harmonics of degree not greater than 2.
What carries the argument
The spectral decomposition of the Fisher information matrix, which isolates the dominant eigenspaces and maps them onto the spherical harmonic functions of orders 0, 1, and 2.
If this is right
- The effective dimension of the network's statistical model is bounded by the dimension of low-order spherical harmonics.
- This concentration explains why the Fisher matrix properties do not depend on the width of the network.
- The result provides an explicit basis for approximating the network using only quadratic and lower spherical harmonics.
- It links the Fisher information spectrum directly to the eigenfunctions in the Mercer expansion of the neural tangent kernel.
Where Pith is reading between the lines
- This suggests that optimization or sampling in these networks primarily operates in a low-order harmonic subspace, which could lead to better initialization strategies.
- Similar spectral analysis might apply to other activation functions or network depths, potentially revealing analogous low-order structures.
- One testable extension is to verify the concentration percentage for networks with non-random weights or different random distributions.
Load-bearing premise
The hidden-layer weights are drawn randomly from a fixed distribution and the network consists of exactly two layers with ReLU activations.
What would settle it
For a concrete two-layer ReLU network with randomly chosen hidden weights, compute its Fisher information matrix, extract the leading eigenvectors, and test whether they align with the spherical harmonics of order at most 2 while summing to about 97.7 percent of the matrix trace.
Figures
read the original abstract
Properties of Fisher information matrices of 2-layer neural ReLU networks with random hidden weights are studied. For these networks, it is known that the eigenvalue distribution highly concentrates on several eigenspaces approximately. In particular, the eigenvalues for the first three eigenspaces account for 97.7% of the trace of the Fisher information matrix, independently of the number of parameters. In this paper, we identify the function spaces which correspond to those major eigenspaces. This function space consists of the spherical harmonic functions whose orders are not greater than 2. This result relates to the Mercer decomposition of the neural tangent kernels.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the Fisher information matrix of two-layer ReLU networks whose hidden-layer weights are drawn from a fixed random distribution. It establishes that the eigenvalue spectrum concentrates on the first three eigenspaces, which together account for 97.7% of the trace independently of the total number of parameters, and identifies the corresponding function spaces with spherical harmonics of order at most 2. The identification is obtained through the Mercer decomposition of the neural tangent kernel induced by the random ReLU network.
Significance. If the stated concentration and harmonic identification hold under the paper's assumptions, the result supplies an explicit, low-dimensional characterization of the dominant directions in the Fisher metric for this simple architecture. This could support reduced-order approximations or analyses of curvature in the random-weight regime and strengthens the link between NTK spectral theory and information geometry for ReLU networks.
major comments (2)
- [§3] §3 (main theorem on trace concentration): the claim that the 97.7% trace fraction is independent of the number of parameters is stated without an explicit limit statement. The derivation appears to rely on averaging over the random hidden weights or the infinite-width regime; a finite-width counter-example or a precise statement of the asymptotic regime is needed to support the parameter-count independence asserted in the abstract.
- [§4] §4 (identification with spherical harmonics): the mapping of the dominant eigenspaces to harmonics of order ≤2 is obtained via the Mercer kernel of the NTK under random Gaussian or spherical hidden weights. The proof sketch should explicitly verify that the ReLU-induced kernel eigenfunctions coincide with the low-order spherical harmonics on the sphere; without this step the identification remains formal rather than constructive.
minor comments (2)
- [Notation] Notation for the Fisher matrix and the NTK should be unified across sections; currently the same symbol appears to be overloaded for the finite-sample and population versions.
- [Figure 2] Figure 2 (eigenvalue histogram) would benefit from an inset showing the cumulative trace fraction up to the third eigenspace for several widths to illustrate the claimed independence.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and have revised the manuscript to incorporate clarifications on the asymptotic regime and to expand the explicit verification in the harmonic identification.
read point-by-point responses
-
Referee: [§3] §3 (main theorem on trace concentration): the claim that the 97.7% trace fraction is independent of the number of parameters is stated without an explicit limit statement. The derivation appears to rely on averaging over the random hidden weights or the infinite-width regime; a finite-width counter-example or a precise statement of the asymptotic regime is needed to support the parameter-count independence asserted in the abstract.
Authors: We appreciate the referee highlighting the need for precision here. The 97.7% trace concentration is derived exactly in the infinite-width limit m → ∞ (with input dimension d fixed), where the Fisher information reduces to the NTK and higher-degree contributions vanish by orthogonality of spherical harmonics. We have added an explicit limit statement to Theorem 1 and Section 3 in the revision, clarifying that the parameter-count independence holds asymptotically in this regime. Finite-m numerical results in the manuscript already show the fraction remains close to 97.7% for moderate widths, consistent with the limit. revision: yes
-
Referee: [§4] §4 (identification with spherical harmonics): the mapping of the dominant eigenspaces to harmonics of order ≤2 is obtained via the Mercer kernel of the NTK under random Gaussian or spherical hidden weights. The proof sketch should explicitly verify that the ReLU-induced kernel eigenfunctions coincide with the low-order spherical harmonics on the sphere; without this step the identification remains formal rather than constructive.
Authors: We agree that an explicit verification improves clarity. The NTK induced by random ReLU weights is a zonal kernel on the sphere, admitting a Mercer expansion in Legendre polynomials P_k(cos θ), whose associated eigenfunctions are the spherical harmonics of degree k. For the ReLU NTK, the coefficients for k ≥ 3 are identically zero in the relevant inner-product computation on the unit sphere. The revised Section 4 now includes the full expansion and direct verification that the dominant eigenspaces are spanned exactly by harmonics of degree ≤ 2. revision: yes
Circularity Check
No circularity: derivation relies on standard Mercer decomposition of NTK under explicit random-weight assumptions
full rationale
The paper states that the 97.7% trace concentration is already known for random-hidden-weight two-layer ReLU networks and then identifies the corresponding eigenspaces with spherical harmonics of order ≤2 via the Mercer decomposition of the induced neural tangent kernel. No equation is shown to be equivalent to its own input by construction, no fitted parameter is relabeled as a prediction, and no load-bearing uniqueness theorem or ansatz is imported solely via self-citation. The central identification is a direct consequence of the kernel's eigenfunction properties under the stated randomness and architecture; the result remains falsifiable by direct computation on finite networks and does not reduce to a tautology or self-referential fit.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Hidden-layer weights are drawn independently from a rotationally invariant distribution (implicitly standard normal or uniform on the sphere).
- domain assumption The network is exactly two layers with ReLU activations and the output is linear in the final weights.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the eigenvalues for the first three eigenspaces account for 97.7% of the trace... spherical harmonic functions whose orders are not greater than 2
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
J := E[X(x)X^T(x)] ... Mercer decomposition of the neural tangent kernels
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Deep learning: a statistical viewpoint,
P. L. Bartlett, A. Montanari, and A. Rakhlin, “Deep learning: a statistical viewpoint,”Acta Numerica, vol. 30, pp. 87–201, 2021
work page 2021
-
[2]
C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
work page 2006
-
[3]
A review on neural networks with random weights,
Weipeng Cao, Xizhao Wang, Zhong Ming, Jinzhu Gao, “A review on neural networks with random weights,” Neurocomputing, V olume 275, 2018, Pages 278-287
work page 2018
-
[4]
Neural tangent kernel: Convergence and generalization in neural networks.CoRR, abs/1806.07572, 2018
A. Jacot, F. Gabriel, and C. Hongler, “Neural tangent kernel: Convergence and generalization in neural net- works,” Presented at the 32nd Conference on Neural Information Processing Systems, arXiv:1806.07572v3, 2018
-
[5]
Approximate Spectral Decomposition of Fisher Information Matrix for Simple ReLU Networks,
Y . Takeishi, M. Iida, and J. Takeuchi, “Approximate Spectral Decomposition of Fisher Information Matrix for Simple ReLU Networks,” arXiv:2111.15256, 2021. 8
-
[6]
Approximate Spectral Decomposition of Fisher Information Matrix for Simple ReLU Networks,
Y . Takeishi, M. Iida, and J. Takeuchi, “Approximate Spectral Decomposition of Fisher Information Matrix for Simple ReLU Networks,”Neural Networks, vol. 164, pp. 691-706, July, 2023
work page 2023
-
[7]
Y . Takeishi and J. Takeuchi, “Risk Bounds on MDL Estimators for Linear Regression Models with Application to Simple ReLU Neural Networks,” arXiv:2407.03854, 2024
-
[8]
S. Mei and A. Montanari, "The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve," Communications on Pure and Applied Mathematics, vol. 75, pp. 667-766, April, 2022
work page 2022
-
[9]
Random Features for Large-Scale Kernel Machines,
A. Rahimi and B. Recht, "Random Features for Large-Scale Kernel Machines," Presented at the 20th Conference on Neural Information Processing Systems, 2007 9 Appendix A Proof of Theorem 3.1 Let v(0) := ∥W (1)∥/ √ d, ..., ∥W (m)∥/ √ d , with W (i) ∈ Rd for i = 1, ..., m. First note that for any positive constant a > 0, σ(ax) = aσ(x). By rewriting and applyi...
work page 2007
-
[10]
, where B(·, ·) is the beta function. Hence, E h σ(x⊤ ˆZ) i = E h σ ∥x∥ ˆZ1 i = ∥x∥E h σ( ˆZ1) i = ∥x∥ Z 1 −1 σ(u)(1 − u2) d−1 2 −1 B( d−1 2 , 1
-
[11]
du = ∥x∥ Z 1 0 u(1 − u2) d−1 2 −1 B( d−1 2 , 1
-
[12]
du = ∥x∥ " −(1 − u2) d−1 2 (d − 1)B( d−1 2 , 1 2) #1 0 = ∥x∥ (d − 1)B( d−1 2 , 1 2) = 1 2π B d 2 , 1 2 ∥x∥, where σ ∥x∥ ˆZ1 = ∥x∥σ( ˆZ1) follows from the non-negativity of ∥x∥. The second term E ∥Z∥2 √ d = √ d is straightforward since ∥Z∥2 is χ-squared distributed with d degrees of freedom. Combining gives X ⊤v(0) p − → √ d 2π B d 2 , 1 2 ∥x∥. 10 B Proof ...
-
[13]
du = d √ d + 2|xγ| B( d−1 2 , 1 2) ( u2(1 − u2)(d−1)/2 −(d − 1) 1 0 + Z 1 0 2u(1 − u2)(d−1)/2 d − 1 du ) = d √ d + 2|xγ| B( d−1 2 , 1 2) −2(1 − u2)(d+1)/2 (d − 1)(d + 1) 1 0 = 2d √ d + 2|xγ| (d2 − 1)B( d−1 2 , 1 2) = d √ d + 2 (d + 1)π B( d 2 , 1 2)|xγ|. Using (4) and (6), we obtain 13 X ⊤˜v(γ) p − →1√ 2 d √ d + 2 π(d + 1)B( d 2 , 1 2)|xγ| − r d + 2 d √ d...
-
[14]
du = Kγ Z 1 0 u(1 − u2) (d−2) 2 −1 B( d−2 2 , 1
-
[15]
du = Kγ " −(1 − u2) d−2 2 (d − 2)B( d−2 2 , 1 2) #1 0 = Kγ (d − 2)B( d−2 2 , 1
-
[16]
The original expression then becomes d √ d + 2E ˆZγ h E n σ x⊤ −γ ˆZ−γ ˆZ 2 γ | ˆZγ oi = d √ d + 2∥x−γ∥ (d − 2)B( d−2 2 , 1
-
[17]
E ˆZ 2 γ q 1 − ˆZ 2γ = d √ d + 2∥x−γ∥ (d − 2)B( d−2 2 , 1 2) B( d 2 , 3 2) B( d−1 2 , 1 2) = d √ d + 2 2π(d + 1)B( d 2 , 1 2)∥x−γ∥. Finally, using (4) and (6), we obtain X ⊤˜v(γ) p − →1√ 2 d √ d + 2 2π(d + 1)B( d 2 , 1 2)∥x−γ∥ − r d + 2 d √ d 2π B( d 2 , 1 2)∥x∥ ! = √ d + 2 2π √ 2 B( d 2 , 1 2)∥x∥ d d + 1 − 1 = − √ d + 2 2π(d + 1) √ 2 B( d 2 , 1 2)∥x∥ as ...
-
[18]
Denoting Cγ := ( |xγ|∥ ˆZγ∥)/(∥x−γ∥∥ ˆZ−γ∥), we then take expectation with respect to cos(ϕ) conditioned on ∥ ˆZ−γ∥. If 0 ≤ Cγ < 1, then E h σ (Cγ + cos(ϕ)) + σ (−Cγ + cos(ϕ)) |∥ ˆZ−γ∥ i = Z Cγ −Cγ (Cγ + u)(1 − u2) d−4 2 B( d−2 2 , 1
-
[19]
du + 2 Z 1 Cγ u(1 − u2) d−4 2 B( d−2 2 , 1
-
[20]
du =Cγ Z Cγ −Cγ (1 − u2) d−4 2 B( d−2 2 , 1
-
[21]
−(1 − u2) d−2 2 (d − 2)B( d−2 2 , 1 2) #Cγ −Cγ + 2
du + " −(1 − u2) d−2 2 (d − 2)B( d−2 2 , 1 2) #Cγ −Cγ + 2 " −(1 − u2) d−2 2 (d − 2)B( d−2 2 , 1 2) #1 Cγ =Cγ Z Cγ −Cγ (1 − u2) d−4 2 B( d−2 2 , 1
-
[22]
du + 2 (1 − C 2 γ) d−2 2 (d − 2)B( d−2 2 , 1
-
[23]
ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ + Rγ 1 | ˆZγ| < ∥x−γ∥ ∥x∥ # + E
By Taylor expansion of the second term and the integrand of the first term, we have 1 B( d−2 2 , 1 2) Cγ Z Cγ −Cγ (1 − u2) d−4 2 du + 2(1 − C 2 γ) d−2 2 (d − 2) ! = 1 B( d−2 2 , 1 2) 2 d − 2 + C 2 γ + Rγ, where Rγ = R(Cγ) = O(C 4 γ) (as Cγ tends to 0) is the remainder term. It is important that Rγ is bounded over Cγ ∈ [0, 1), because both LHS and the firs...
-
[24]
ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ + Rγ 1 | ˆZγ| < ∥x−γ∥ ∥x∥ # + E
Cγ1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # =E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ + Rγ 1 | ˆZγ| < ∥x−γ∥ ∥x∥ # + E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1
-
[25]
ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ 1 | ˆZγ| < ∥x−γ∥ ∥x∥ # =E
Cγ1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # . 15 Also, as E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ 1 | ˆZγ| < ∥x−γ∥ ∥x∥ # =E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ 1 − 1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = ∥x−γ∥ 2B( d−2 2 , 1 2)B( d−1 2 , 1 2) 2 d − 2 B( d 2 , 3
-
[26]
+ x2 γ ∥x−γ∥2 B( d − 2 2 , 5 2) ! − E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ 1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = ∥x−γ∥B( d−2 2 , 5 2) 2B( d−2 2 , 1 2)B( d−1 2 , 1 2) 2 3 + x2 γ ∥x−γ∥2 ! − E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) 2 d − 2 + C 2 γ 1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = ∥x−γ∥B( d 2 , 1 2) 2π(d + 1) 1 + 3x2 γ 2∥x−γ∥2 ! − E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2...
-
[27]
Rγ1 | ˆZγ| < ∥x−γ∥ ∥x∥ # + E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) − 2 d − 2 + Cγ − C 2 γ 1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = O(∥x∥r2 γ). (13) Here, it is useful to note that by denoting rγ := |xγ|2/∥x∥2, then ∥x−γ∥ = ∥x∥(1−rγ)1/2 = ∥x∥(1− rγ 2 )+O(∥x∥r2 γ) and x2 γ ∥x−γ∥2 = rγ(1−rγ)−1 = rγ+O(r2 γ). (14) First Term: Since Rγ is bounded on Cγ ∈ [0, 1), we get by dir...
-
[28]
Rγ1 | ˆZγ| < ∥x−γ∥ ∥x∥ # = O(∥x∥r2 γ) Second Term: E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1 2) − 2 d − 2 1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = − 2∥x−γ∥ (d − 2)B( d−2 2 , 1 2) Z 1 ∥x−γ ∥/∥x∥ u2 p 1 − u2 (1 − u2)(d−3)/2 B( d−1 2 , 1
-
[29]
du = − ∥x−γ∥ (d − 2)B( d−2 2 , 1 2) Z x2 γ /∥x∥2 0 √ 1 − t t(d−2)/2 B( d−1 2 , 1
-
[30]
16 Third Term: E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1
dt =O(∥x∥rd/2 γ ). 16 Third Term: E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1
-
[31]
Cγ1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = |xγ| B( d−2 2 , 1 2) Z 1 ∥x−γ ∥/∥x∥ u2 p 1 − u2 u√ 1 − u2 (1 − u2)(d−3)/2 B( d−1 2 , 1
-
[32]
du = |xγ| 2B( d−2 2 , 1 2) Z x2 γ /∥x∥2 0 (1 − t) t(d−3)/2 B( d−1 2 , 1
-
[33]
Fourth Term: E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1
dt =O(∥x∥rd/2 γ ). Fourth Term: E " ˆZ 2 γ ∥x−γ∥∥ ˆZ−γ∥ 2B( d−2 2 , 1
-
[34]
(−C 2 γ)1 | ˆZγ| ≥ ∥x−γ∥ ∥x∥ # = − |xγ|2 ∥x−γ∥B( d−2 2 , 1 2) Z 1 ∥x−γ ∥/∥x∥ u2 p 1 − u2 u2 1 − u2 (1 − u2)(d−3)/2 B( d−1 2 , 1
-
[35]
du = − |xγ|2 2B( d−2 2 , 1 2)∥x−γ∥ Z x2 γ /∥x∥2 0 (1 − t)3/2 t(d−4)/2 B( d−1 2 , 1
-
[36]
Therefore (13) holds for d ≥ 2
dt =O(∥x∥rd/2 γ ). Therefore (13) holds for d ≥ 2. By combining all the results, we obtain X ⊤v(γ,γ ) p − →d √ d + 2 2π(d + 1)∥x−γ∥B( d 2 , 1 2) 1 + 3x2 γ 2∥x−γ∥2 ! + √ 2hγ (∥x∥, rγ) , where hγ (∥x∥, rγ) = O(∥x∥r2 γ). By (4) and (6), X ⊤˜v(γ) p − →1√ 2 d √ d + 2 2π(d + 1)∥x−γ∥B( d 2 , 1 2) 1 + 3x2 γ 2∥x−γ∥2 ! − r d + 2 d √ d 2π B( d 2 , 1 2)∥x∥ ! + hγ ∥x∥...
-
[37]
· 2 3 xα cos3(ϕ) + xβ sin3(ϕ) = d − 2 6π B( d − 2 2 , 5 2) xαx3 β ∥x∥3 + x3 αxβ ∥x∥3 ! = d − 2 6π B( d − 2 2 , 5
-
[38]
xαxβ ∥x∥ = 1 2π(d + 1)B( d 2 , 1
-
[39]
Therefore, X ⊤v(α,β) p − →d √ d + 2 2π(d + 1)B( d 2 , 1
xαxβ ∥x∥ . Therefore, X ⊤v(α,β) p − →d √ d + 2 2π(d + 1)B( d 2 , 1
-
[40]
xαxβ ∥x∥ . General case ( xα, xβ, xαβ ̸= 0): Since the angle θ between xαβ and ˆZαβ is uniformly distributed on [−π, π), by considering the expectation conditioned on ˆZ−αβ, we obtain E[σ(x⊤ αβ ˆZαβ + x⊤ −αβ ˆZ−αβ) ˆZα ˆZβ| ˆZ−αβ] =∥xαβ∥E " σ ∥ ˆZαβ∥ cos(θ) + x⊤ −αβ ˆZ−αβ ∥xαβ∥ ! ˆZα ˆZβ| ˆZ−αβ # =∥xαβ∥E σ ∥ ˆZαβ∥ cos(θ) + x⊤ −αβ ˆZ−αβ ∥xαβ∥ ! ∥ ˆZαβ∥2 ∥x...
-
[41]
When C ≥ 1, we instead have Z 1 −1 1 − C −2u2 3/2 (1 − u2)(d−5)/2du := I(C)
du When 0 ≤ C < 1, via a Taylor expansion, we have Z C −C 1 − C −2u2 3/2 (1 − u2)(d−5)/2du = 3πC 8 1 − (d − 5)C 2 12 + Rαβ, where Rαβ = R(C) = O(C 4). When C ≥ 1, we instead have Z 1 −1 1 − C −2u2 3/2 (1 − u2)(d−5)/2du := I(C). Note that 0 < C < 1 is equivalent to ∥ ˆZαβ∥2 < 1 − ∥xαβ∥2/∥x∥2 := 1 − rαβ. Also, ∥x−αβ∥ = ∥x∥(1 − rαβ)1/2 = ∥x∥(1 − rαβ 2 ) + O(...
-
[42]
E ∥xαβ∥ ∥x−αβ∥ ∥ ˆZαβ∥4 q 1 − ∥ ˆZαβ∥2 1 − d − 5 12 ∥xαβ∥2 ∥x−αβ∥2 ∥ ˆZαβ∥2 1 − ∥ ˆZαβ∥2 ! − xαxβ 3π∥xαβ∥B( d−3 2 , 1
-
[43]
E 3π 8 ∥ ˆZαβ∥3C 1 − (d − 5)C 2 12 1{∥ ˆZαβ∥2 ≥ 1 − rαβ} + O xαxβ ∥xαβ∥ r2 αβ = xαxβ 8∥x−αβ∥B( d−3 2 , 1 2) d − 2 2 B( d − 3 2 , 3) − d − 5 12 B( d − 5 2 , 4)rαβ + O xαxβ ∥xαβ∥ r2 αβ = d − 2 16B( d−3 2 , 1
-
[44]
B( d − 3 2 , 3) xαxβ ∥x−αβ∥ 1 − 1 2 rαβ + O xαxβ ∥xαβ∥ r2 αβ = 1 2(d + 1)π B( d 2 , 1
-
[45]
xαxβ ∥x−αβ∥ 1 − 1 2 rαβ + O xαxβ ∥xαβ∥ r2 αβ = 1 2(d + 1)π B( d 2 , 1
-
[46]
xαxβ ∥x∥ 1 − 1 2 rαβ 1 + 1 2 rαβ + O xαxβ ∥xαβ∥ r2 αβ = 1 2(d + 1)π B( d 2 , 1
-
[47]
xαxβ ∥x∥ + O xαxβ ∥xαβ∥ r2 αβ . where E 3π 8 ∥ ˆZαβ∥3C 1 − (d − 5)C 2 12 + Rαβ 1{∥ ˆZαβ∥2 ≥ 1 − rαβ} = O(r2 αβ) follows from directly integrating, similar to the proof of Theorem 3.3 for d ≥ 6. Finally, combining these results gives X ⊤v(α,β) p − →d √ d + 2 2(d + 1)π B( d 2 , 1
-
[48]
xαxβ ∥x∥ + hαβ xαxβ ∥xαβ∥ , rαβ , where hαβ xαxβ ∥xαβ ∥ , rαβ = O xαxβ ∥xαβ ∥ r2 αβ . 21
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.