Second-Order Least Squares as a Special Case of the Polynomial Maximization Method

Serhii Zabolotnii

arXiv: 2606.11421 · v1 · pith:MVFIYOPFnew · submitted 2026-06-09 · stat.ME · math.ST · stat.CO · stat.TH


Pith reviewed 2026-06-27 12:09 UTC · model grok-4.3

classification: stat.ME · math.ST · stat.CO · stat.TH

keywords: second-order least squares · polynomial maximization method · linear regression · estimating equations · asymptotic variance · non-Gaussian errors · conditional homoskedasticity


The pith

Optimally weighted second-order least squares equals degree-two polynomial maximization for linear regression under conditional homoskedasticity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that for linear regression with errors having constant conditional variance but non-Gaussian distribution, optimally weighted second-order least squares and the degree-two generalized polynomial maximization method produce the identical population estimating equation. Both select the same optimal linear combination of the first two centered residual moments, solve the same normal equation, share the same influence function, and achieve the asymptotic variance c2 g2 / N with g2 depending on error skewness and excess kurtosis. Feasible plug-in versions of the two methods are therefore first-order equivalent. The equivalence is sharp and fails under heteroskedasticity, where the methods separate and PMM can lose consistency for asymmetric errors.

Core claim

Optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) are the same population estimating equation for linear regression with conditionally homoskedastic non-Gaussian errors: they choose the same optimal linear combination of the first two centered residual moments, solve one population normal system, share one influence function, and attain the common asymptotic variance c2 g2 / N, that is, the ordinary-least-squares slope-variance factor c2 scaled by the PMM variance-reduction coefficient g2 = 1 - γ3²/(2 + γ4), where γ3 and γ4 are the error skewness and excess kurtosis.
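To make the coefficient concrete, the short sketch below evaluates g2 = 1 - γ3²/(2 + γ4) for a few textbook error laws. The function name `g2` and the choice of example distributions are illustrative assumptions, not taken from the paper; the skewness and excess-kurtosis values are the standard population values for those laws.

```python
import math


def g2(gamma3: float, gamma4: float) -> float:
    """Degree-two variance-reduction coefficient 1 - gamma3^2 / (2 + gamma4).

    gamma3 is the error skewness, gamma4 the excess kurtosis.
    """
    denom = 2.0 + gamma4
    if denom <= 0.0:
        raise ValueError("2 + gamma4 must be positive")
    return 1.0 - gamma3 ** 2 / denom


# (skewness, excess kurtosis) for some standard distributions (illustrative picks).
examples = {
    "Gaussian": (0.0, 0.0),
    "uniform": (0.0, -1.2),
    "centered exponential": (2.0, 6.0),
    "centered chi-square, 4 d.f.": (math.sqrt(2.0), 3.0),
}

for name, (skew, exkurt) in examples.items():
    print(f"{name:28s} g2 = {g2(skew, exkurt):.3f}")
```

Any symmetric law gives g2 = 1, so the degree-two combination offers no gain over ordinary least squares for the slope there, while the exponential case gives g2 = 0.5, a halving of the asymptotic variance. Because kurtosis is always at least one plus squared skewness, 2 + γ4 ≥ γ3², and g2 lies between 0 and 1.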

What carries the argument

The shared population estimating equation formed by the optimal linear combination of the first two centered residual moments under conditional homoskedasticity.
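One way to see where the coefficient comes from is the classical optimal linear-plus-quadratic estimating function argument (in the spirit of Crowder and of Godambe and Thompson). The derivation below is a sketch under simplifying assumptions that are not taken from the paper: the design is treated as fixed, and σ², γ3 and γ4 are treated as known. It is not claimed to be the author's derivation.

```latex
\begin{align*}
\varepsilon_i &= y_i - x_i^\top\beta, \qquad
h_{1i} = \varepsilon_i, \qquad
h_{2i} = \varepsilon_i^2 - \sigma^2 - \gamma_3\sigma\,\varepsilon_i, \\
E[h_{1i}h_{2i}] &= \gamma_3\sigma^3 - \gamma_3\sigma\cdot\sigma^2 = 0, \qquad
E[h_{1i}^2] = \sigma^2, \qquad
E[h_{2i}^2] = \sigma^4\,(2+\gamma_4-\gamma_3^2), \\
E\!\left[\frac{\partial h_{1i}}{\partial\beta}\right] &= -x_i, \qquad
E\!\left[\frac{\partial h_{2i}}{\partial\beta}\right] = \gamma_3\sigma\,x_i, \\
\mathcal{I}(\beta) &= \sum_i x_i x_i^\top
\left[\frac{1}{\sigma^2} + \frac{\gamma_3^2\sigma^2}{\sigma^4(2+\gamma_4-\gamma_3^2)}\right]
= \frac{1}{\sigma^2}\,\frac{2+\gamma_4}{2+\gamma_4-\gamma_3^2}\,\sum_i x_i x_i^\top, \\
\mathcal{I}(\beta)^{-1} &= \sigma^2\Big(\sum_i x_i x_i^\top\Big)^{-1}
\left(1-\frac{\gamma_3^2}{2+\gamma_4}\right)
= \sigma^2\Big(\sum_i x_i x_i^\top\Big)^{-1} g_2 .
\end{align*}
```

Here h2 is the squared residual orthogonalized against the residual itself, so the information contributions of the two moments add. The ordinary-least-squares variance is multiplied by exactly g2, which matches the factor quoted above.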

If this is right

Feasible plug-in implementations of optimally weighted SLS and degree-two PMM are first-order equivalent.
Under heteroskedasticity the unconditional PMM and conditional SLS weighting separate, costing efficiency for symmetric errors and consistency for asymmetric errors.
Beyond degree two, PMM holds an efficiency reserve unreachable by SLS within its second-moment span.
For symmetric platykurtic errors SLS collapses to ordinary least squares for the slope while degree-three PMM exploits kurtosis information outside the SLS moment span.
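The first consequence can be checked numerically with a small simulation. The sketch below is not the paper's Monte Carlo study and does not use the EstemPMM package. It is an illustrative one-step update of the OLS slope along the squared-residual direction, with the population moments of centered Exp(1) errors plugged in as oracle weights. The coefficients, sample size and seed are hypothetical choices.

```python
import numpy as np


def variance_ratio(n=500, reps=2000, seed=0):
    """Monte Carlo ratio Var(one-step slope) / Var(OLS slope), homoskedastic errors."""
    rng = np.random.default_rng(seed)
    beta0, beta1 = 1.0, 2.0                    # hypothetical true coefficients
    sigma, gamma3, gamma4 = 1.0, 2.0, 6.0      # population moments of centered Exp(1)
    d = gamma3 / (sigma * (2.0 + gamma4))      # oracle weight on the squared residual

    x = rng.normal(size=(reps, n))
    eps = rng.exponential(1.0, size=(reps, n)) - 1.0
    y = beta0 + beta1 * x + eps

    # OLS fit in every replication (rows are independent data sets).
    xc = x - x.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    b1_ols = (xc * y).sum(axis=1) / sxx
    b0_ols = y.mean(axis=1) - b1_ols * x.mean(axis=1)
    resid = y - b0_ols[:, None] - b1_ols[:, None] * x

    # One Newton-type step from OLS using the linear-plus-quadratic moment combination.
    b1_step = b1_ols - d * (xc * resid ** 2).sum(axis=1) / sxx
    return b1_step.var() / b1_ols.var()


ratio = variance_ratio()
target = 1.0 - 2.0 ** 2 / (2.0 + 6.0)          # population g2 for these errors
print(f"simulated variance ratio: {ratio:.3f}, population g2: {target:.3f}")
```

The simulated ratio should sit near the population value g2 = 0.5, up to Monte Carlo noise and higher-order finite-sample terms. Replacing the oracle moments by sample moments of the residuals gives a feasible version that is expected to behave the same way to first order.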

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers applying either method to gain efficiency with non-normal errors are using equivalent procedures at the population level when conditional homoskedasticity holds.
The separation under heteroskedasticity suggests that tests for conditional variance constancy may be needed before relying on either estimator for efficiency gains.
Higher-degree PMM versions could be examined for further efficiency gains in settings where SLS remains restricted to second moments.

Load-bearing premise

The regression errors have constant conditional variance given the predictors.

What would settle it

An empirical or simulated dataset with heteroskedastic errors where the two methods produce statistically different slope estimates or where one is consistent and the other is not.
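A minimal version of such a simulated check is sketched below. It is an assumption-laden illustration rather than a reproduction of the paper's experiments: the estimator is a one-step update of OLS with unconditional plug-in weights (a stand-in for a degree-two polynomial estimator, not the EstemPMM implementation), the errors are centered Exp(1) draws, and the heteroskedastic scale exp(0.3 x) is a hypothetical choice.

```python
import numpy as np


def slope_bias(hetero, n=500, reps=2000, seed=1):
    """Return (OLS bias, one-step bias) for the slope, averaged over replications."""
    rng = np.random.default_rng(seed)
    beta0, beta1 = 1.0, 2.0                            # hypothetical true coefficients
    x = rng.normal(size=(reps, n))
    u = rng.exponential(1.0, size=(reps, n)) - 1.0     # asymmetric, mean-zero errors
    scale = np.exp(0.3 * x) if hetero else np.ones_like(x)
    y = beta0 + beta1 * x + scale * u

    xc = x - x.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    b1_ols = (xc * y).sum(axis=1) / sxx
    b0_ols = y.mean(axis=1) - b1_ols * x.mean(axis=1)
    resid = y - b0_ols[:, None] - b1_ols[:, None] * x

    # Unconditional plug-in moments of the pooled residuals (one set per replication).
    m2 = (resid ** 2).mean(axis=1)
    gamma3 = (resid ** 3).mean(axis=1) / m2 ** 1.5
    gamma4 = (resid ** 4).mean(axis=1) / m2 ** 2 - 3.0
    d = gamma3 / (np.sqrt(m2) * (2.0 + gamma4))

    # One-step update; its correction term has nonzero mean when Var(error | x) varies with x.
    b1_step = b1_ols - d * (xc * resid ** 2).sum(axis=1) / sxx
    return b1_ols.mean() - beta1, b1_step.mean() - beta1


for flag in (False, True):
    ols_bias, step_bias = slope_bias(hetero=flag)
    print(f"hetero={flag}: OLS bias {ols_bias:+.4f}, one-step bias {step_bias:+.4f}")
```

Under homoskedasticity both biases should be close to zero. With the x-dependent scale, the squared residuals correlate with the regressor, so the one-step slope is pulled away from the true value while OLS stays centered. This is consistent with the stated loss of consistency for asymmetric errors, though it only demonstrates the behaviour of this illustrative estimator.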

Figures

Figures reproduced from arXiv: 2606.11421 by Serhii Zabolotnii.

**Figure 2.** PMM3 captures a degree-3 efficiency reserve that SLS and GMM miss under symmetric…

**Figure 3.** Control. Under Gaussian errors PMM2 and SLS incur no asymptotic efficiency loss

Original abstract

We prove that optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) are the same population estimating equation for linear regression with conditionally homoskedastic non-Gaussian errors: they choose the same optimal linear combination of the first two centered residual moments, solve one population normal system, share one influence function, and attain the common asymptotic variance $c_2g_2/N$ -- the ordinary-least-squares slope-variance factor $c_2$ scaled by the PMM variance-reduction coefficient $g_2=1-\gamma_3^2/(2+\gamma_4)$ (with $\gamma_3,\gamma_4$ the error skewness and excess kurtosis). Feasible plug-in implementations are therefore first-order equivalent, with only higher-order finite-sample differences. The identity is sharp: under heteroskedasticity the unconditional PMM body and the conditional SLS weighting separate, costing efficiency for symmetric errors and consistency for asymmetric errors. Beyond degree two, PMM holds an efficiency reserve that SLS cannot reach within its second-moment span. For symmetric platykurtic errors SLS collapses to ordinary least squares for the slope, while degree-three PMM exploits kurtosis information outside the SLS moment span through a closed-form coefficient $g_3$; for canonical asymmetric laws this reserve is $30$--$50\%$ within the degree-three polynomial moment class. The Lean 4 development machine-checks the degree-specific algebraic core -- the closed forms for $g_2$ and $g_3$, the $g_2\le1$ result, the design cancellations, and the symmetric collapse -- while the general monotonicity $g_{S+1}\le g_S\le1$ is proved analytically by nesting. A Monte Carlo study illustrates the equivalence, the reserve, and the heteroskedastic boundary at finite samples.
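The abstract reports that the bound $g_2\le 1$ is among the machine-checked facts. The paper's Lean 4 development is not reproduced on this page, so the following is only a minimal independent sketch of how such a statement can be checked, assuming Mathlib is available. The names `g2` and `g2_le_one` and the explicit positivity hypothesis on $2+\gamma_4$ are assumptions made for this illustration.

```lean
import Mathlib

/-- Degree-two coefficient g₂ = 1 - γ₃² / (2 + γ₄) (illustrative definition). -/
noncomputable def g2 (γ3 γ4 : ℝ) : ℝ := 1 - γ3 ^ 2 / (2 + γ4)

/-- If 2 + γ₄ is positive, the subtracted ratio is nonnegative, hence g₂ ≤ 1. -/
theorem g2_le_one (γ3 γ4 : ℝ) (h : 0 < 2 + γ4) : g2 γ3 γ4 ≤ 1 := by
  unfold g2
  have hnonneg : 0 ≤ γ3 ^ 2 / (2 + γ4) := div_nonneg (sq_nonneg γ3) h.le
  linarith
```

The hypothesis `0 < 2 + γ4` holds for any non-degenerate error law with a finite fourth moment, since kurtosis is at least one.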

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows that optimally weighted SLS and degree-2 PMM coincide exactly as population estimating equations under conditional homoskedasticity, with Lean-verified closed forms for the efficiency factors.


The core result is that these two moment-based procedures for linear regression pick the same linear combination of the first two centered residual moments when errors are conditionally homoskedastic. They share the normal equations, the influence function, and the asymptotic variance c2 g2 / N, where g2 = 1 - γ3² / (2 + γ4). The paper also gives an explicit g3 for the degree-three case and proves the monotonicity g_{S+1} ≤ g_S ≤ 1 analytically.

What stands out is the direct algebraic derivation from the population moments, with no fitted parameters or self-reference. The Lean 4 checks cover the closed forms, the design cancellations, the g2 ≤ 1 bound, and the symmetric collapse. The heteroskedasticity caveat is stated up front, and the Monte Carlo simply shows the finite-sample behavior inside the stated scope.

The limitation is the narrow setting. Everything is conditional on homoskedasticity and stays inside linear regression; once that assumption drops, the unconditional PMM and conditional SLS weighting diverge. The efficiency reserve for higher-degree PMM is shown, but the paper does not test whether that reserve holds up under misspecification or in finite samples beyond the illustrations.

This is for people already working with moment estimators in regression who need the explicit coefficients and the verified identities. A reader comparing SLS to polynomial maximization methods will find the unification useful. The formal verification and clear scope boundaries make it worth a referee's time.

Referee Report

0 major / 1 minor

Summary. The manuscript proves that optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) coincide as population estimating equations for linear regression under conditional homoskedasticity with non-Gaussian errors. They select the same optimal linear combination of the first two centered residual moments, solve identical normal equations, share an influence function, and attain the common asymptotic variance c₂g₂/N where g₂ = 1 - γ₃²/(2 + γ₄). The equivalence is shown to be sharp: it breaks down under heteroskedasticity, where unconditional PMM and conditional SLS weighting diverge, while higher-degree PMM retains efficiency reserves beyond the SLS second-moment span. The algebraic core (closed forms for g₂ and g₃, the bound g₂ ≤ 1, design cancellations, symmetric collapse) is machine-checked in Lean 4; monotonicity g_{S+1} ≤ g_S ≤ 1 is proved analytically by nesting; a Monte Carlo study illustrates finite-sample behavior and the heteroskedastic boundary.

Significance. If the result holds, the paper unifies two moment-based estimators by exhibiting an explicit algebraic identity at the population level. The machine-checked Lean 4 proofs of the closed forms for g₂ and g₃, the g₂ ≤ 1 bound, the design cancellations, and the symmetric collapse are a verifiable strength. The analytic nesting proof for monotonicity and the Monte Carlo confirmation of the equivalence, reserve, and heteroskedasticity caveat further support the contribution. This clarifies when SLS is a special case of PMM and quantifies the efficiency gains available from higher polynomial moments for non-Gaussian errors.

minor comments (1)

§3, after Eq. (12): the notation for the feasible plug-in estimators could be clarified by explicitly distinguishing the population g₂ from its sample analogue ĝ₂ in the statement of first-order equivalence.
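The distinction the referee asks for can be made explicit in code: g₂ is a function of population moments, whereas ĝ₂ is the same formula evaluated at sample moments of the residuals. The helper below is a hypothetical illustration (the name `g2_hat` and the use of simple moment estimators without small-sample corrections are assumptions, not the paper's notation or implementation).

```python
import numpy as np


def g2_hat(residuals):
    """Plug-in estimate of g2 = 1 - gamma3^2 / (2 + gamma4) from residuals."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()                       # center, in case the fit had no intercept
    m2 = np.mean(e ** 2)
    m3 = np.mean(e ** 3)
    m4 = np.mean(e ** 4)
    gamma3_hat = m3 / m2 ** 1.5            # sample skewness
    gamma4_hat = m4 / m2 ** 2 - 3.0        # sample excess kurtosis
    return 1.0 - gamma3_hat ** 2 / (2.0 + gamma4_hat)


rng = np.random.default_rng(0)
normal_sample = rng.normal(size=200_000)
expo_sample = rng.exponential(1.0, size=200_000) - 1.0
print(f"g2_hat, Gaussian sample:    {g2_hat(normal_sample):.3f}  (population g2 = 1)")
print(f"g2_hat, exponential sample: {g2_hat(expo_sample):.3f}  (population g2 = 0.5)")
```

Sample skewness and especially sample kurtosis converge slowly for heavy-tailed errors, so ĝ₂ can differ noticeably from g₂ in small samples even though the two estimators agree to first order.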

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and for the positive assessment. We are pleased that the referee recommends acceptance.

Circularity Check

0 steps flagged

No significant circularity


The paper presents a direct population-level algebraic proof that optimally weighted SLS and degree-2 PMM coincide as estimating equations under conditional homoskedasticity, deriving the shared influence function, normal equations, and asymptotic variance c2 g2/N from the same linear combination of centered residual moments. The derivation relies on explicit moment conditions and design cancellations without fitted parameters renamed as predictions or self-referential definitions. The Lean 4 machine-check verifies the closed forms for g2, g3, the g2 ≤ 1 bound, and symmetric collapse, while the general monotonicity is proved by analytic nesting; these are independent verifications rather than load-bearing self-citations. The heteroskedasticity caveat is stated as a boundary condition with Monte Carlo confirmation. No step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the linear regression model with conditionally homoskedastic errors and the existence of error moments up to order four; no free parameters are introduced or fitted, and no new entities are postulated.

axioms (2)

domain assumption: Linear regression model Y = Xβ + ε with E(ε|X) = 0 and Var(ε|X) = σ² (conditional homoskedasticity).
Invoked throughout the abstract as the setting where the SLS-PMM identity holds; the abstract explicitly states that the identity is sharp and breaks down under heteroskedasticity.
domain assumption: Existence of third and fourth moments of the error distribution (skewness γ3 and excess kurtosis γ4).
Required to define the efficiency coefficient g2 = 1 - γ3²/(2 + γ4) and the higher-degree reserve g3.

pith-pipeline@v0.9.1-grok · 5882 in / 1686 out tokens · 31378 ms · 2026-06-27T12:09:44.471545+00:00 · methodology

