Perturbative methods for non-parametric instrumental variable

Arthur Gretton; Wei Bu

arxiv: 2606.00322 · v1 · pith:CUE6UQRZnew · submitted 2026-05-29 · 💻 cs.LG · stat.ML

Wei Bu, Arthur Gretton

Pith reviewed 2026-06-28 23:14 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords nonparametric instrumental variables · perturbation theory · kernel ridge regression · high-dimensional estimation · ill-defined operators · expectation integral operator · eigenmode mixing

The pith

Perturbative corrections reduce prediction error by up to 99% for nonparametric instrumental variables in high dimensions

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a perturbative approach to nonparametric instrumental variable estimation that extends kernel ridge regression with higher-order corrections from perturbation theory. These corrections introduce mixing between eigenmodes of the expectation integral operator, which targets ill-defined cases caused by high dimensionality. Dimensionality is parameterized by β, defined through d = n^β for n samples in d dimensions. According to the paper, first-order corrections reduce prediction error by up to 99% relative to standard ridge regression when β exceeds 0.7, and the improvement persists as the dimension grows, which makes the work relevant to readers interested in estimation under the curse of dimensionality.
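To make the β parameterization concrete, here is a minimal sketch of the relation d = n^β and its inverse. The function names and the rounding of d to the nearest integer are assumptions made for illustration; the abstract only states the relation itself.

```python
import math

def dimension_from_beta(n: int, beta: float) -> int:
    """Return d = n**beta, rounded to the nearest integer (rounding is an assumption)."""
    return max(1, round(n ** beta))

def beta_from_dimension(n: int, d: int) -> float:
    """Invert d = n**beta, giving beta = log(d) / log(n)."""
    return math.log(d) / math.log(n)

# How the dimension grows with beta at a fixed sample size of n = 1000.
for beta in (0.3, 0.5, 0.7, 0.9):
    print(beta, dimension_from_beta(1000, beta))
```

With n = 1000, the threshold β = 0.7 mentioned in the abstract corresponds to a dimension in the low hundreds, so the regime of interest is one where d is a substantial fraction of n.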

Core claim

We introduce a perturbative approach for nonparametric instrumental variable estimation. By drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher perturbation order corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source for such ill-definedness can be the curse of dimensionality. Our method performs across various dimensionality regimes, particularly when the dimensionality parameter β which is defined through the number of s

What carries the argument

The perturbative corrections that introduce mixing between different eigenmodes of the expectation integral operator in the kernel ridge estimator.
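For readers who want the baseline in front of them, the following is a rough sketch of plain kernel ridge regression with an RBF kernel, the building block that the perturbative corrections are said to extend. It is ordinary regression, not the instrumental variable estimator from the paper, and the function names, bandwidth, and regularization value are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def fit_kernel_ridge(X, y, lam=1e-3, bandwidth=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha

# Toy data (an assumption for illustration): a noiseless sine curve.
X = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])
alpha = fit_kernel_ridge(X, y)
mse = float(np.mean((predict(X, alpha, X) - y) ** 2))
print("training MSE:", mse)
```

The ridge term `lam` keeps the linear solve stable even when the kernel matrix has very small eigenvalues, which is the same role regularization plays in the ill-posed instrumental variable setting.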

Load-bearing premise

The assumption that perturbation theory can be systematically extended to the expectation integral operator in NPIV such that higher-order corrections remain stable and unbiased when the operator is ill-defined due to high dimensionality.

What would settle it

An experiment in which first-order perturbative corrections fail to reduce prediction error, or increase it, relative to standard ridge regression in NPIV settings with β > 0.7 would falsify the claim.
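The test described above reduces to comparing two error numbers, which can be sketched as follows. The function names are hypothetical, and the example values are placeholders rather than results from the paper.

```python
def relative_error_reduction(mse_baseline: float, mse_method: float) -> float:
    """Fraction by which the method lowers the baseline error (negative if it is worse)."""
    if mse_baseline <= 0.0:
        raise ValueError("baseline error must be positive")
    return (mse_baseline - mse_method) / mse_baseline

def contradicts_claim(mse_order0: float, mse_order1: float) -> bool:
    """True when the first-order correction does not lower the order-0 error."""
    return relative_error_reduction(mse_order0, mse_order1) <= 0.0

# Placeholder numbers: a 99% reduction means the error falls to 1% of the baseline.
print(relative_error_reduction(1.0, 0.01))
print(contradicts_claim(1.0, 1.2))
```

A single run is not enough to settle anything; in practice the comparison would be repeated over seeds and reported with a spread.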

Figures

Figures reproduced from arXiv: 2606.00322 by Arthur Gretton, Wei Bu.

**Figure 1.** Causal diagram: Z is an instrument for X, while U represents unobserved confounding between X and Y. … that causally affects Y, which is the outcome of interest. Finally, U represents unobserved confounding that affects both X and Y. The key assumptions are: Z affects Y only through X (exclusion restriction); Z has a non-zero effect on X; Z is independent of the unobserved confounders U: U ⊥⊥ Z and E[U|Z]…

**Figure 3.** Example fractional Brownian kernel fitting. In addition to these, we further include an empirical analysis of the algorithm's performance on IV datasets with weak instruments in appendix F.5, a sensitivity analysis of the parameters used in the algorithm (regularization parameter γ, ridge regularization parameter λ, and maximum order of perturbation Nmax) in appendix F.6, and experiments with larger sampl…

**Figure 2.** Example RBF kernel fitting. In appendix F.4, we also include further standard NPIV datasets: Newey-Powell (Newey & Powell, 2003), weak/strong instrument datasets, a heteroscedastic dataset, a nonlinear instrument dataset, and a sparse signal dataset. The performance results are summarized here; note that we only focus on the improvement against regularized kernel IV (order 0) using fractional Brownian kernel, D…

**Figure 4.** MSE comparison across dimension and RBF kernel bandwidth.
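The instrument assumptions listed with the causal diagram in Figure 1 can be illustrated with a small simulation. This is a linear toy model chosen for clarity, not the nonparametric setting of the paper; the coefficients, the sample size, and the variable names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Z is the instrument, U the unobserved confounder (drawn independently of Z).
Z = rng.normal(size=n)
U = rng.normal(size=n)
X = Z + U + rng.normal(size=n)        # Z has a non-zero effect on X
Y = 2.0 * X + U + rng.normal(size=n)  # Z reaches Y only through X; true slope is 2

def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return float(np.cov(a, b)[0, 1])

# Ordinary least squares is biased because U drives both X and Y.
ols = cov(X, Y) / cov(X, X)
# The instrumental variable (Wald) estimate uses only the variation in X caused by Z.
iv = cov(Z, Y) / cov(Z, X)

print("OLS slope:", ols)
print("IV slope:", iv)
```

In this setup the regression of Y on X overshoots the true slope, while the instrument-based ratio lands close to it, which is the basic motivation for instrumental variable methods.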

Original abstract

We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. By drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher perturbation order corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source for such ill-definedness can be the curse of dimensionality. Our method performs across various dimensionality regimes, particularly when the dimensionality parameter $\beta$ which is defined through the number of samples $n$ and dimension $d$ as $n^\beta = d$, becomes large. Experimental results show that our first-order perturbative corrections can reduce prediction error by up to 99\% in high-dimensional ill-defined cases ($\beta > 0.7$) compared to standard ridge regression approaches. The performance improvement is maintained across a wide range of dimensions, with the advantage becoming more pronounced as dimensionality increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Perturbative NPIV corrections claim big error drops in high dimensions but lack any shown derivation that first-order terms stay valid when the operator is severely ill-posed.

The main takeaway is that this paper tries to fix a real pain point in nonparametric IV estimation by borrowing perturbation expansions from physics and adding them to kernel ridge regression, but the key assumption that those corrections remain accurate for large β is not justified.

What is new is the explicit use of eigenmode mixing induced by the perturbation on the expectation integral operator T. The authors argue this helps when the inverse problem becomes ill-defined because of the curse of dimensionality, and they position the method as giving systematic higher-order improvements over plain ridge. That framing is a step beyond routine kernel applications.
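As a point of reference for what "eigenmode mixing" means, here is a sketch of the textbook first-order (Rayleigh–Schrödinger) eigenvector correction for a symmetric matrix with distinct eigenvalues. It is a finite-dimensional analogue written for illustration, not the paper's estimator; the matrices `A` and `B` and the coupling `gamma` are assumptions.

```python
import numpy as np

def first_order_eigvecs(A, B, gamma):
    """Unperturbed and first-order corrected eigenvectors of A + gamma * B.

    Assumes A is symmetric with distinct eigenvalues and B is symmetric.
    """
    mu, V = np.linalg.eigh(A)
    W = V.T @ B @ V  # matrix elements of B between eigenmodes of A
    V1 = V.copy()
    for i in range(len(mu)):
        for j in range(len(mu)):
            if j != i:
                # Mode i picks up a component along mode j.
                V1[:, i] += gamma * W[j, i] / (mu[i] - mu[j]) * V[:, j]
    return V, V1 / np.linalg.norm(V1, axis=0)

def alignment_error(approx, exact):
    """Frobenius distance between eigenvector sets after fixing sign ambiguity."""
    signs = np.sign(np.sum(approx * exact, axis=0))
    return float(np.linalg.norm(approx * signs - exact))

A = np.diag([1.0, 2.0, 3.0])
B = np.ones((3, 3)) - np.eye(3)  # couples every pair of modes
gamma = 0.01

V0, V1 = first_order_eigvecs(A, B, gamma)
_, V_exact = np.linalg.eigh(A + gamma * B)
err0 = alignment_error(V0, V_exact)
err1 = alignment_error(V1, V_exact)
print("order 0 error:", err0, "order 1 error:", err1)
```

The denominators `mu[i] - mu[j]` show why closely spaced eigenvalues are delicate: the mixing coefficients grow as the spectral gaps shrink.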

The work does identify why standard NPIV methods degrade as d grows with n^β = d and offers a concrete way to extend them. If the analysis holds, it could matter for causal pipelines that need to handle high-dimensional instruments.

The soft spots are in the justification and evidence. The stress-test concern lands: perturbation theory normally expands around an operator whose spectrum is bounded away from zero, yet here the base operator T is already compact and ill-conditioned for β > 0.7. No derivation appears showing that the remainder after the first-order term is o(1) uniformly or that the resulting estimator stays unbiased for the structural function. The abstract reports up to 99% error reduction but supplies no experimental protocol, baselines, or variance numbers, so the performance claim cannot be assessed. The full text is referenced but the provided material does not close this gap.
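The stress-test concern can be seen in a toy matrix analogue. The sketch below compares the exact inverse of A + γB with its zeroth-order and first-order expansions, once when A is well conditioned and once when it is not. The matrices, eigenvalues, and parameter values are assumptions picked for illustration, and the outcome says nothing about the paper's actual operator or experiments.

```python
import numpy as np

def compare_orders(eigs, lam, gamma):
    """Errors of the order-0 and order-1 expansions of inv(A + gamma * B).

    A = diag(eigs) + lam * I plays the role of a ridge-regularized operator and
    B is a fixed symmetric coupling; both are toy choices.
    """
    n = len(eigs)
    A = np.diag(eigs) + lam * np.eye(n)
    B = np.ones((n, n)) - np.eye(n)
    A_inv = np.linalg.inv(A)
    exact = np.linalg.inv(A + gamma * B)
    # First-order Neumann expansion: inv(A + gamma B) ~ inv(A) - gamma inv(A) B inv(A).
    first = A_inv - gamma * A_inv @ B @ A_inv
    # The series converges when this quantity is below 1.
    size = gamma * np.linalg.norm(A_inv @ B, 2)
    err0 = np.linalg.norm(exact - A_inv, 2)
    err1 = np.linalg.norm(exact - first, 2)
    return float(err0), float(err1), float(size)

# Spectrum kept away from zero by a sizeable ridge term.
err0_good, err1_good, size_good = compare_orders([1.0, 0.5, 0.1], lam=0.1, gamma=0.01)
# Rapidly decaying spectrum with a tiny ridge term.
err0_bad, err1_bad, size_bad = compare_orders([1.0, 1e-3, 1e-6], lam=1e-4, gamma=0.01)

print("well conditioned:", err0_good, err1_good, size_good)
print("ill conditioned: ", err0_bad, err1_bad, size_bad)
```

In the well-conditioned case the expansion parameter is small and the first-order term helps; in the ill-conditioned case it exceeds 1 and the truncated series is worse than no correction. Whether the paper's construction avoids this failure mode is exactly what a remainder bound would need to show.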

This is aimed at researchers working on kernel methods for causal inference or high-dimensional IV problems. A reader already familiar with NPIV and spectral regularization might extract an idea worth testing, but only if the perturbation analysis is filled in.

It deserves peer review so that referees can check whether the stability argument can be made rigorous and whether the experiments actually support the claims.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a perturbative approach for nonparametric instrumental variable (NPIV) estimation. It extends standard kernel ridge methods by adding systematic higher-order perturbation corrections that introduce mixing between eigenmodes of the expectation integral operator, with the goal of improving accuracy in ill-defined regimes. The central claim is that first-order corrections yield up to 99% reduction in prediction error relative to ridge regression when the dimensionality parameter β (defined via n^β = d) exceeds 0.7.

Significance. If the perturbative corrections can be shown to remain accurate and the reported gains can be reproduced under standard experimental controls, the approach could provide a practical route to mitigating the effects of rapid eigenvalue decay in high-dimensional NPIV. No machine-checked proofs, reproducible code, or parameter-free derivations are presented.

major comments (2)

[Abstract] The claim that first-order perturbative corrections reduce prediction error by up to 99% in the regime β > 0.7 supplies no experimental protocol, baseline specifications, error bars, or statistical tests. This absence renders the central performance claim impossible to assess and is load-bearing for the paper's main contribution.
[Abstract] The assertion that the perturbation introduces useful eigenmode mixing for the compact expectation integral operator T when its singular values decay rapidly (high β) is not accompanied by any derivation showing that the remainder after the first-order term is o(1) uniformly in this regime, nor that the resulting estimator remains unbiased for the structural function. Standard perturbation theory requires the unperturbed operator to have spectrum bounded away from zero; here the unperturbed operator is already severely ill-conditioned.

minor comments (1)

[Abstract] The definition of the dimensionality parameter β via n^β = d is introduced without reference to prior literature on effective dimension in nonparametric estimation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and outline planned revisions to the manuscript.

Referee: [Abstract] The claim that first-order perturbative corrections reduce prediction error by up to 99% in the regime β > 0.7 supplies no experimental protocol, baseline specifications, error bars, or statistical tests. This absence renders the central performance claim impossible to assess and is load-bearing for the paper's main contribution.

Authors: The experimental protocol, baselines (standard kernel ridge regression), error bars from repeated trials, and statistical comparisons are presented in Section 4 of the manuscript. To address the concern that the abstract renders the claim difficult to assess, we will revise the abstract to include a brief reference to the experimental setup, the definition of the β regime, and the observed maximum reduction.

Revision: yes

Referee: [Abstract] The assertion that the perturbation introduces useful eigenmode mixing for the compact expectation integral operator T when its singular values decay rapidly (high β) is not accompanied by any derivation showing that the remainder after the first-order term is o(1) uniformly in this regime, nor that the resulting estimator remains unbiased for the structural function. Standard perturbation theory requires the unperturbed operator to have spectrum bounded away from zero; here the unperturbed operator is already severely ill-conditioned.

Authors: We agree that the standard conditions for perturbation expansions are violated when the spectrum of T decays rapidly. The manuscript does not contain a derivation establishing that the first-order remainder is o(1) uniformly or that the estimator is unbiased. In the revision we will add a discussion subsection clarifying that the approach is motivated by eigenmode mixing and supported by empirical results rather than by a full perturbative error analysis under the classical assumptions.

Revision: partial

Circularity Check

0 steps flagged

No circularity; claims rest on external experimental evaluation

The abstract presents a perturbative extension to kernel ridge regression for NPIV estimation, with performance claims grounded in experimental error reductions (up to 99% for β > 0.7) rather than any closed-form derivation that reduces to fitted parameters or self-citations. No equations, ansatzes, or uniqueness theorems are exhibited that would trigger self-definitional, fitted-input, or self-citation patterns. The dimensionality parameter β is introduced as a simple definition (n^β = d) without circular reuse, and the method is positioned as an extension inspired by external physics concepts. This leaves the central results self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The perturbation orders and the definition of beta may implicitly require choices not detailed here.

Reference graph

Works this paper leans on

13 extracted references · 5 canonical work pages

[1]

org/CorpusID:207063850

URL https://api.semanticscholar.org/CorpusID:207063850. Carrasco, M., Florens, J.-P., and Renault, E. Chapter 77 linear inverse problems in structural econometrics estimation based on spectral decomposition and regularization. Volume 6 of Handbook of Econometrics, pp. 5633–5751. Elsevier, 2007. doi: https://doi.org/10.1016/S1573-4412(07)06077-1. URL htt...

work page doi:10.1016/s1573-4412(07)06077-1 2007
[2]

Dorigoni, D

URL https://proceedings.mlr.press/v139/donhauser21a.html. Dorigoni, D. An introduction to resurgence, trans-series and alien calculus. Annals of Physics, 409:167914, 2019. doi: 10.1016/j.aop.2019.167914. Accessible survey for physicists covering resurgent analysis and alien calculus. Dyson, F. J. Divergence of perturbation theory in quantum electrodynamic...

work page doi:10.1016/j.aop.2019.167914 2019
[3]

URL http://dx.doi.org/10.1002/prop.201400005

doi: 10.1002/prop.201400005. URL http://dx.doi.org/10.1002/prop.201400005. Meunier, D., Moulin, A., Wornbard, J., Kostic, V. R., and Gretton, A. Demystifying spectral feature learning for instrumental variable regression, 2025. URL https://arxiv.org/abs/2506.10899. Miao, W., Geng, Z., and Tchetgen, E. T. Identifying causal effects with proxy variables ...

work page doi:10.1002/prop.201400005 2025
[4]

Peskin and Daniel V

ISBN 978-0-201-50397-5, 978-0-429-50355-9, 978-0-429-49417-8. doi: 10.1201/9780429503559. Rizzo, M. L. and Székely, G. J. Energy distance. WIREs Comput. Stat., 8(1):27–38, January 2016. ISSN 1939-5108. Schölkopf, B., Herbrich, R., and Smola, A. J. A generalized representer theorem. In Helmbold, D. and Williamson, B. (eds.), Computational Learning Theory...

work page doi:10.1201/9780429503559 2016
[5]

URL http://www

ISSN 00129682, 14680262. URL http://www.jstor.org/stable/2171753. Steinwart, I. and Christmann, A. Support vector machines. Information science and statistics. Springer, New York, NY, 2008. ISBN 978-0-387-77241-7 and 978-1-4899-8963-5 and 978-6-611-92704-2 and 978-0-387-77242-4. Stock, J. H., Wright, J. H., and Yogo, M. A survey of weak instruments and w...

work page doi:10.1103/physrevd.81.105008 2008
[6]

For all $x \in \mathcal{X}$, the function $K(\cdot, x)$ belongs to $\mathcal{H}$
[7]

Definition 5 (Conditional Expectation Operators). Let $X$ and $Z$ be random variables with joint distribution $P_{X,Z}$

For all $x \in \mathcal{X}$ and all $f \in \mathcal{H}$, the reproducing property holds: $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}$. Definition 5 (Conditional Expectation Operators). Let $X$ and $Z$ be random variables with joint distribution $P_{X,Z}$. We define:
[8]

The conditional expectation operator $T : \mathcal{H} \to L^2(P_Z)$ as $(Tf)(z) = \mathbb{E}[f(X) \mid Z = z]$
[9]

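The adjoint operator $T^* : L^2(P_Z) \to \mathcal{H}$ satisfies $\langle Tf, g \rangle_{L^2(P_Z)} = \langle f, T^* g \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$ and $g \in L^2(P_Z)$. We consider the nonparametric instrumental variable (NPIV) problem with a cubic interaction term: $S[f] = S_0[f] + \gamma S_1[f]$ (93) $= \mathbb{E}_Z\big[(\mathbb{E}[Y \mid Z] - \mathbb{E}[f(X) \mid Z])^2\big] + \lambda \|f\|_{\mathcal{H}}^2 + \gamma S_1[f]$, (94) where $S_1[f]$ represents the non-independent three-point interaction: $S_1[f] = \frac{2}{3}\, \mathbb{E}_Z$ ...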
[10]

Couple $(l_1, l_2) = (1, 5) \implies$ intermediate angular momentum $l_{12} \in \{4, 5, 6\}$
[11]

Let's target $l_{12} = 4$

We now seek an even $L$ that allows coupling $(l_3, L) = (1, L)$ to one of these $l_{12}$ values. Let's target $l_{12} = 4$
[12]

The coupling rule requires $|l_3 - L| \le l_{12} \le l_3 + L$, which for our values becomes $|1 - L| \le 4 \le 1 + L$
[13]

“Order 0” is standard kernel ridge IV; “Best Pert

The inequality $4 \le 1 + L$ implies $L \ge 3$. The inequality $|1 - L| \le 4$ implies $-3 \le L \le 5$. The conditions require $L$ to be in the range $[3, 5]$. We can choose the even value $L = 4$. The expansion of $g(\hat{\omega})$ contains a non-zero $C_{4,M}$ term. Therefore, a coupling pathway exists via the $L = 4$ channel, and the integral can be non-zero. This demonstrates that the triangle inequality on ($l_1$,...

2021
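The angular momentum coupling argument in the extracted fragments above can be checked mechanically. The sketch below enumerates the values allowed by the triangle inequality; the function names and the search bound `L_max` are assumptions made for illustration.

```python
def intermediate_momenta(l1: int, l2: int) -> list[int]:
    """Values of l12 allowed when coupling l1 and l2: |l1 - l2| <= l12 <= l1 + l2."""
    return list(range(abs(l1 - l2), l1 + l2 + 1))

def allowed_L(l3: int, l12: int, L_max: int = 10, even_only: bool = True) -> list[int]:
    """Values of L up to L_max with |l3 - L| <= l12 <= l3 + L, optionally even only."""
    return [
        L
        for L in range(L_max + 1)
        if abs(l3 - L) <= l12 <= l3 + L and (not even_only or L % 2 == 0)
    ]

print(intermediate_momenta(1, 5))           # candidate l12 values for (l1, l2) = (1, 5)
print(allowed_L(1, 4, even_only=False))     # every L compatible with l3 = 1, l12 = 4
print(allowed_L(1, 4))                      # the even L among them
```

For (l1, l2) = (1, 5) and the target l12 = 4, the only even L in range is 4, matching the channel chosen in the fragment.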
