Combining pre-trained models via localized model averaging
Pith reviewed 2026-05-14 17:53 UTC · model grok-4.3
The pith
Modeling averaging weights as functions of covariates yields asymptotically optimal in-sample and out-of-sample risks when combining pre-trained models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce localized model averaging where the weights are modeled as functions of the covariates, allowing the procedure to capture varying relative advantages of pre-trained models across heterogeneous contexts. Under a general loss framework, we establish asymptotic optimality for both in-sample and out-of-sample risks together with consistency of the estimated weights.
What carries the argument
Localized weights expressed as functions of covariates and learned under a general loss.
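One way to write this down, as a sketch only: the page gives the weight notation w_k(x) but not the full estimator, so the symbols K, f_k, L, n, and the weight class \mathcal{W} below are assumed notation, and the risk R is a generic placeholder. The last line is the usual meaning of asymptotic optimality in the model averaging literature; the paper's exact statement may differ.

```latex
% Assumed notation: K pre-trained models f_1, ..., f_K, a loss L,
% and a class \mathcal{W} of weight functions w = (w_1, ..., w_K).
\hat f_w(x) = \sum_{k=1}^{K} w_k(x)\, f_k(x),
\qquad
\hat w = \arg\min_{w \in \mathcal{W}} \frac{1}{n} \sum_{i=1}^{n}
         L\bigl(y_i, \hat f_w(x_i)\bigr),
\qquad
\frac{R(\hat w)}{\inf_{w \in \mathcal{W}} R(w)} \xrightarrow{\;p\;} 1 .
```

Classical model averaging is the special case in which every w_k(x) is constant in x.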
If this is right
- The averaging procedure adapts automatically to changes in input context.
- Both in-sample and out-of-sample risks converge to the best attainable level.
- The estimated weights are consistent for the true optimal local weights.
- The same framework applies across a wide range of prediction tasks via the general loss.
- No fixed set of weights is required when model rankings shift with covariates.
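The contrast between fixed and covariate-dependent weights can be illustrated with a small sketch. It is not the paper's estimator: it assumes squared loss, a Gaussian kernel, unconstrained least-squares weights re-estimated at each query point, and a toy data-generating process in which one candidate is exact for x < 0 and the other for x >= 0. All function names, the bandwidth h, and the data are hypothetical.

```python
import numpy as np

def local_weights(x0, x, preds, y, h):
    """Kernel-weighted least-squares averaging weights at the point x0."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    s = np.sqrt(k)                           # sqrt so squared residuals get weight k
    w, *_ = np.linalg.lstsq(preds * s[:, None], y * s, rcond=None)
    return w

def localized_average(x_new, models, x, y, h):
    """Predict at x_new with weights re-estimated at every query point."""
    preds = np.column_stack([f(x) for f in models])
    out = np.empty(len(x_new))
    for i, x0 in enumerate(x_new):
        w = local_weights(x0, x, preds, y, h)
        out[i] = w @ np.array([f(x0) for f in models])
    return out

# Toy data (assumption): model 0 is exact for x < 0, model 1 for x >= 0.
models = [np.sin, np.square]
truth = lambda t: np.where(t < 0, np.sin(t), np.square(t))
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
y = truth(x) + 0.01 * rng.normal(size=x.size)

grid = np.linspace(-1.9, 1.9, 39)
preds = np.column_stack([f(x) for f in models])

# Baseline: one constant weight vector for the whole covariate range.
w_global, *_ = np.linalg.lstsq(preds, y, rcond=None)
global_fit = np.column_stack([f(grid) for f in models]) @ w_global
local_fit = localized_average(grid, models, x, y, h=0.2)

mse_global = np.mean((global_fit - truth(grid)) ** 2)
mse_local = np.mean((local_fit - truth(grid)) ** 2)
w_left = local_weights(-1.0, x, preds, y, h=0.2)    # should favor model 0
w_right = local_weights(1.0, x, preds, y, h=0.2)    # should favor model 1
print(mse_global, mse_local, w_left, w_right)
```

In this toy setting the model ranking flips at x = 0, so no single weight vector can match both halves of the covariate range, while the local weights can switch between candidates. A simplex constraint on the weights, common in model averaging, is deliberately omitted here to keep the sketch short.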
Where Pith is reading between the lines
- The same localized-weight idea could be tested on ensembles of fine-tuned models rather than only off-the-shelf pre-trained ones.
- Implementation would require only that the weight functions be parameterized flexibly enough to capture the relevant covariate effects.
- If the consistency result holds, practitioners could replace manual model selection with a single fitted weight surface.
- Extensions to streaming or non-stationary data would need to check whether the same asymptotic arguments still apply.
Load-bearing premise
The data conditions permit consistent estimation of the covariate-dependent local weights under the chosen general loss.
What would settle it
A dataset or simulation in which the estimated weights fail to converge to the optimal local weights or the achieved risk stays a fixed amount above the oracle risk as sample size grows.
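A check of this kind could be sketched as follows. Everything here is an assumption made for illustration: squared loss, a kernel-weighted least-squares weight estimator, a bandwidth rule h = n^(-1/5), and a toy setting where the best local combination reproduces the true regression function, so that the excess risk over the oracle is simply the mean squared distance to the truth.

```python
import numpy as np

models = [np.sin, np.square]                      # hypothetical candidate models
truth = lambda t: np.where(t < 0, np.sin(t), np.square(t))

def local_fit(grid, x, y, h):
    """Localized averaging prediction via kernel-weighted least squares."""
    preds = np.column_stack([f(x) for f in models])
    out = np.empty(grid.size)
    for i, x0 in enumerate(grid):
        s = np.exp(-0.25 * ((x - x0) / h) ** 2)   # sqrt of a Gaussian kernel
        w, *_ = np.linalg.lstsq(preds * s[:, None], y * s, rcond=None)
        out[i] = w @ np.array([f(x0) for f in models])
    return out

def excess_risk(n, seed, sigma=0.1):
    """Squared-error gap to the oracle predictor on a fixed evaluation grid."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n)
    y = truth(x) + sigma * rng.normal(size=n)
    h = n ** (-1 / 5)                             # assumed bandwidth rule
    grid = np.linspace(-1.9, 1.9, 77)
    return np.mean((local_fit(grid, x, y, h) - truth(grid)) ** 2)

sizes = [100, 400, 1600, 6400]
excess = [np.mean([excess_risk(n, seed) for seed in range(5)]) for n in sizes]
for n, e in zip(sizes, excess):
    print(n, e)
```

If the printed excess risk levelled off at a positive value as n grows, rather than shrinking, that would be the kind of evidence described above, at least for this simplified estimator.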
Original abstract
Many pre-trained models (PTMs) are available in modern applications. Because different PTMs are often trained on different datasets, their performances can vary substantially for different new tasks, and the ranking of the candidates may depend heavily on the input. Motivated by this, we propose a localized model averaging method with weights modeled as functions of the covariates, making it substantially more versatile than existing model averaging methods. This formulation allows the model averaging procedure to adaptively capture the varying relative advantages of different PTMs across heterogeneous contexts. Specifically, we learn flexible local weights under a general loss framework that accommodates a broad class of prediction tasks. We further establish the asymptotic optimality of the proposed method for both in-sample and out-of-sample risks, as well as the consistency of the estimated weights. Extensive numerical experiments further demonstrate the effectiveness of the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a localized model averaging procedure for combining pre-trained models, in which the averaging weights are modeled as flexible functions of the covariates rather than global constants. Under a general loss framework, the authors claim to establish asymptotic optimality of the resulting estimator for both in-sample and out-of-sample risks together with consistency of the estimated local weights, and they support these claims with numerical experiments on synthetic and real data.
Significance. If the asymptotic results are rigorously established, the work would provide a statistically grounded method for adaptive combination of pre-trained models that respects heterogeneity in covariate space, extending classical model averaging to settings where relative model performance varies locally. The general-loss formulation and out-of-sample optimality claim would be particularly useful for modern prediction pipelines.
major comments (2)
- [§3.2, Theorem 3.2] The out-of-sample asymptotic optimality result requires uniform convergence of the nonparametric local-weight estimators over the entire covariate support, yet the stated regularity conditions do not explicitly include the Hölder smoothness order of the weight functions or the precise bandwidth rates needed to guarantee the uniform rate; without these, the oracle-risk property may fail in regions of low design density.
- [Assumption 2.3 and the proof of consistency] The conditions allowing consistent estimation of the local weights under a general loss are given, but it is not shown that these conditions are sufficient to control the remainder term when the loss is non-smooth or when the covariate density is unbounded, which is load-bearing for the claimed out-of-sample optimality.
minor comments (2)
- [§2] The notation for the local weight functions w_k(x) is introduced without an explicit statement of the dimension of x or the support of the covariate distribution, which affects readability of the subsequent convergence arguments.
- [§5] In the numerical experiments, the tables reporting risk values do not include standard errors or the number of Monte Carlo replications, making it difficult to assess the statistical significance of the reported improvements.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments, which help strengthen the rigor of our asymptotic results. We address each major comment below and will revise the manuscript to incorporate the necessary clarifications and additions.
- Referee: [§3.2, Theorem 3.2] The out-of-sample asymptotic optimality result requires uniform convergence of the nonparametric local-weight estimators over the entire covariate support, yet the stated regularity conditions do not explicitly include the Hölder smoothness order of the weight functions or the precise bandwidth rates needed to guarantee the uniform rate; without these, the oracle-risk property may fail in regions of low design density.
  Authors: We agree that the regularity conditions in the manuscript are incomplete for guaranteeing uniform convergence over the full covariate support. In the revised version, we will explicitly augment the assumptions to include the Hölder smoothness order α of the weight functions and specify the bandwidth rates (e.g., h_n = O(n^{-1/(2α + d)}) with n h_n^d → ∞) required for the uniform rate. We will add a supporting lemma establishing sup-norm convergence of the local-weight estimators, incorporating standard trimming or boundary corrections to handle low-density regions, thereby ensuring the oracle-risk property holds uniformly.
  revision: yes
- Referee: [Assumption 2.3 and the proof of consistency] The conditions allowing consistent estimation of the local weights under a general loss are given, but it is not shown that these conditions are sufficient to control the remainder term when the loss is non-smooth or when the covariate density is unbounded, which is load-bearing for the claimed out-of-sample optimality.
  Authors: The referee correctly notes that the current proof does not explicitly bound the remainder term under non-smooth losses or unbounded densities. We will revise the proof of consistency under Assumption 2.3 to include these controls: we will add the assumption that the loss is uniformly Lipschitz continuous (standard for general losses and sufficient to handle non-smoothness) and restrict attention to compact sets where the covariate density is bounded away from zero and infinity, with a brief discussion of tail truncation for unbounded cases. These additions will make the out-of-sample optimality claim rigorous.
  revision: yes
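As a quick worked check of the bandwidth rate mentioned in the first response, suppose the bandwidth is taken at exactly the stated order rather than merely bounded by it (an assumption; the response writes O(·)). The effective local sample size then grows polynomially:

```latex
% Assumption: h_n \asymp n^{-1/(2\alpha + d)} with \alpha > 0 and covariate dimension d.
n h_n^{d} \;\asymp\; n \cdot n^{-d/(2\alpha + d)} \;=\; n^{2\alpha/(2\alpha + d)} \;\longrightarrow\; \infty .
% Example: \alpha = 2,\ d = 1 gives h_n \asymp n^{-1/5} and n h_n \asymp n^{4/5}.
```

If h_n is only bounded above by this rate, it could shrink faster, which is presumably why n h_n^d → ∞ is listed as a separate condition.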
Circularity Check
No significant circularity in derivation chain
The paper proposes localized model averaging with covariate-dependent weights under a general loss, then claims to establish asymptotic optimality for in-sample/out-of-sample risks plus weight consistency via theoretical analysis. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the optimality follows from standard consistency arguments under stated data conditions rather than renaming or smuggling ansatzes. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: General loss framework accommodates a broad class of prediction tasks
- domain assumption: Data distribution permits consistent estimation of local weights