Generalized optimal parameter-transfer learning through Mallows-type model averaging
Pith reviewed 2026-05-19 04:49 UTC · model grok-4.3
The pith
A Mallows-type criterion produces asymptotically optimal weights for combining source parameter estimates with a target model even when sources are misspecified.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed Mallows-type model averaging obtains weights from a criterion that is unbiased for the target prediction risk up to a weight-independent term. These weights are asymptotically optimal for the target prediction risk when the target model is misspecified. When the target model is correctly specified, the weights asymptotically allocate positive weight only to informative source models. These results hold without any requirement that source models are correctly specified. The framework extends to semiparametric and panel data settings.
What carries the argument
The Mallows-type criterion for determining combination weights in the parameter-transfer framework, which extends the classical Mallows criterion while remaining unbiased for the target prediction risk up to a weight-independent term.
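A minimal sketch of how such a criterion could be minimized, under assumptions that are not taken from the paper: the target model is a linear regression with homoskedastic noise of variance sigma2, the target candidate uses dof fitted parameters, and the source estimates are independent of the target sample, so they carry no penalty. The names mallows_criterion, mallows_weights, dof, and sigma2 are hypothetical, and the Frank-Wolfe solver is an illustrative choice rather than the authors' procedure.

```python
# Illustrative sketch only, not the paper's criterion.
# Assumption: candidate m supplies a prediction vector preds[m] on the target
# sample; dof[m] is its penalty (parameter count for the target fit, 0 for
# source estimates computed independently of the target sample).

def weighted_fit(w, preds):
    """Weighted average of the candidate prediction vectors."""
    n = len(preds[0])
    return [sum(w[m] * preds[m][i] for m in range(len(w))) for i in range(n)]


def mallows_criterion(w, y, preds, dof, sigma2):
    """Residual sum of squares plus 2 * sigma2 * weighted degrees of freedom."""
    fit = weighted_fit(w, preds)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    return rss + 2.0 * sigma2 * sum(wm * dm for wm, dm in zip(w, dof))


def mallows_weights(y, preds, dof, sigma2, iters=1000, tol=1e-12):
    """Minimize the criterion over the unit simplex with Frank-Wolfe steps."""
    M = len(preds)
    w = [1.0 / M] * M  # start from equal weights
    for _ in range(iters):
        fit = weighted_fit(w, preds)
        resid = [yi - fi for yi, fi in zip(y, fit)]
        # Gradient of the criterion with respect to each weight.
        grad = [
            -2.0 * sum(r * p for r, p in zip(resid, preds[m])) + 2.0 * sigma2 * dof[m]
            for m in range(M)
        ]
        j = min(range(M), key=grad.__getitem__)  # best vertex of the simplex
        slope = grad[j] - sum(g * wm for g, wm in zip(grad, w))
        if slope >= -tol:  # no descent direction left
            break
        delta = [pj - fi for pj, fi in zip(preds[j], fit)]
        curv = 2.0 * sum(d * d for d in delta)
        # Exact line search for a quadratic objective, capped at a full step.
        step = 1.0 if curv == 0.0 else min(1.0, -slope / curv)
        w = [(1.0 - step) * wm for wm in w]
        w[j] += step
    return w


# Toy usage with made-up numbers: one target fit and two source-based fits.
y = [1.0, 2.0, 3.0, 4.0]
preds = [
    [1.1, 1.9, 3.2, 3.8],  # target-model fit, penalized with dof = 2
    [0.9, 2.1, 2.9, 4.2],  # prediction built from source 1 estimates
    [3.0, 3.0, 3.0, 3.0],  # prediction built from an uninformative source
]
dof = [2.0, 0.0, 0.0]
weights = mallows_weights(y, preds, dof, sigma2=0.05)
```

Only prediction vectors built from shared source estimates enter the computation, which mirrors the point that raw source data need not be exchanged. A production implementation would more likely hand the same quadratic program to a dedicated solver.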
If this is right
- When the target model is misspecified, the weighted estimator achieves the lowest possible asymptotic prediction risk among all weight choices.
- When the target model is correctly specified, non-informative sources receive zero asymptotic weight.
- The approach applies to semiparametric models and panel data without requiring correct specification of any source.
- Only source parameter estimates need to be shared rather than raw datasets.
Where Pith is reading between the lines
- This weighting rule could reduce data-sharing requirements in privacy-sensitive applications.
- Similar unbiased criteria might be constructed for other loss functions or model classes.
- Finite-sample behavior in high-dimensional or small-target-sample regimes remains open for direct verification.
Load-bearing premise
The Mallows-type criterion provides an unbiased estimate of the target prediction risk up to a term that does not depend on the weights.
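For orientation, the classical version of this property can be written out. The block below is the standard least-squares identity, not the paper's parameter-transfer criterion. It assumes y = \mu + \varepsilon with mean-zero, homoskedastic, uncorrelated errors of variance \sigma^2, and a weighted fit \hat\mu(w) = P(w) y whose matrix P(w) does not depend on \varepsilon. The symbols are chosen here for illustration.

```latex
\begin{align*}
C_n(w) &= \lVert y - \hat\mu(w) \rVert^2 + 2\sigma^2 \operatorname{tr} P(w),
\qquad
L_n(w) = \lVert \mu - \hat\mu(w) \rVert^2, \\
\lVert y - \hat\mu(w) \rVert^2
 &= L_n(w) + \lVert \varepsilon \rVert^2
    + 2\,\varepsilon^{\top}\bigl(\mu - P(w)\,y\bigr), \\
\mathbb{E}\bigl[\varepsilon^{\top} P(w)\, y\bigr]
 &= \sigma^2 \operatorname{tr} P(w), \\
\mathbb{E}\, C_n(w) &= \mathbb{E}\, L_n(w) + n\sigma^2 .
\end{align*}
```

The final term n\sigma^2 is the weight-independent part, so minimizing C_n(w) targets the same weights as minimizing the expected loss. The premise above is that an analogous identity survives once source estimates are plugged in.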
What would settle it
In large samples with a misspecified target model, if the prediction risk of the Mallows-weighted estimator stays above the risk of the single best source estimator by a margin that does not shrink as the sample sizes grow, the asymptotic optimality claim would be false.
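Stated formally, in the sense usual in the model averaging literature (the paper's exact statement may differ), asymptotic optimality and the falsifying event can be sketched as follows. The notation is assumed here: R(w) is the target prediction risk, \hat w the selected weights, \mathcal{W} the weight set, and e_m the weight vector that puts all mass on candidate m, which is assumed to belong to \mathcal{W}.

```latex
\begin{align*}
\text{Asymptotic optimality:}\quad
 & \frac{R(\hat w)}{\inf_{w \in \mathcal{W}} R(w)} \xrightarrow{\;p\;} 1 . \\
\text{Since } e_m \in \mathcal{W}:\quad
 & \inf_{w \in \mathcal{W}} R(w) \le \min_{m} R(e_m) . \\
\text{Falsifying event:}\quad
 & \exists\, \delta > 0 \ \text{such that}\
   \liminf_{n \to \infty}
   \Pr\!\Bigl( R(\hat w) \ge (1+\delta) \min_{m} R(e_m) \Bigr) > 0 .
\end{align*}
```

If the falsifying event holds, the ratio in the first line cannot converge to 1, because its denominator is no larger than the best single-candidate risk.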
Original abstract
In many economic applications, multiple source datasets are available, but their effective combination is challenging due to heterogeneity across datasets. To address this problem, we study a parameter-transfer framework that shares only source-side estimates and propose a Mallows-type model averaging method for combining target and source models in the parametric setting. The weights are obtained from a Mallows-type criterion that is unbiased for the target prediction risk up to a weight-independent term, extending the classical Mallows criterion to the parameter-transfer framework. We establish that the proposed weights are asymptotically optimal when the target model is misspecified, and asymptotically allocate weights only to informative sources when the target model is correctly specified. These guarantees do not require any source model to be correctly specified. We also consider extensions of the framework to semiparametric and panel data settings. Simulation studies and a house price application further demonstrate the effectiveness of our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Mallows-type model averaging procedure for parameter-transfer learning in the parametric setting. Weights are obtained from a criterion that is unbiased for target prediction risk up to a weight-independent term; the authors prove that these weights are asymptotically optimal when the target model is misspecified and that they asymptotically assign positive weight only to informative sources when the target is correctly specified. These guarantees are claimed to hold without any source model being correctly specified. Extensions to semiparametric and panel-data settings are developed, and the method is illustrated with simulations and a house-price application.
Significance. If the unbiasedness property and the asymptotic optimality results hold, the work supplies a theoretically grounded method for combining source estimates with a target sample under heterogeneity, without requiring correct specification of any source model. This is a useful contribution for economic applications that routinely face multiple heterogeneous datasets. The explicit asymptotic statements and the provision of reproducible simulation code would be strengths if delivered.
major comments (1)
- [Section 3 (or the section deriving the Mallows-type criterion)] The central unbiasedness claim (abstract and the derivation of the Mallows-type criterion) states that the criterion is unbiased for target prediction risk up to a weight-independent term after source estimators are plugged into the transfer map. When the transfer map is nonlinear or source estimators retain O(1/n_s) bias under misspecification, the cross term E[residual × transferred deviation] need not vanish or remain weight-independent; the manuscript must supply the precise conditions or bias-correction steps that guarantee this property, because the claim that the extension works without any source being correct rests on this step.
minor comments (3)
- [Asymptotic theorems] Ensure that all asymptotic statements explicitly list the rate conditions on the source sample sizes n_s relative to the target sample size n.
- [Simulation studies] In the simulation section, report the exact form of the transfer map used and confirm that the same map appears in both the theoretical derivations and the Monte Carlo design.
- [Empirical application] The house-price application would benefit from a brief table showing the estimated weights and the out-of-sample prediction error relative to the target-only estimator.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and are prepared to revise the paper accordingly to improve clarity on the key technical points.
-
Referee: [Section 3 (or the section deriving the Mallows-type criterion)] The central unbiasedness claim (abstract and the derivation of the Mallows-type criterion) states that the criterion is unbiased for target prediction risk up to a weight-independent term after source estimators are plugged into the transfer map. When the transfer map is nonlinear or source estimators retain O(1/n_s) bias under misspecification, the cross term E[residual × transferred deviation] need not vanish or remain weight-independent; the manuscript must supply the precise conditions or bias-correction steps that guarantee this property, because the claim that the extension works without any source being correct rests on this step.
Authors: We appreciate the referee highlighting this subtlety in the unbiasedness argument. In Section 3, the Mallows-type criterion is derived under the parametric setting with the transfer map being a smooth (continuously differentiable) function of the source parameter estimates. Source estimators are assumed to be sqrt(n_s)-consistent for their pseudo-true values even under misspecification, which is standard in parametric M-estimation. The cross term E[residual × transferred deviation] is weight-independent because the transferred deviation is a linear (or first-order) function of the source estimation errors; by the orthogonality of target residuals to the target design and the law of large numbers applied to the target sample, its conditional expectation does not depend on the weights. For nonlinear transfer maps, a first-order Taylor expansion is used around the pseudo-true source parameters, with the remainder controlled uniformly in the weights by the o_p(1) terms under our regularity conditions (Assumptions 1-3 and Lemma 2). We agree that these conditions could be stated more explicitly. We will revise Section 3 to add a dedicated remark clarifying the precise conditions (including handling of O(1/n_s) bias terms, which are absorbed into the weight-independent component) under which unbiasedness holds without requiring any source model to be correct. This revision strengthens the exposition while preserving the main results.
revision: yes
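The expansion the authors invoke can be sketched as below. The notation is assumed for illustration and does not come from the manuscript: g is the transfer map, \hat\theta_s the estimator from source s, \theta_s^{*} its pseudo-true value, and G_s the Jacobian of g at \theta_s^{*}.

```latex
\begin{align*}
g(\hat\theta_s)
 &= g(\theta_s^{*}) + G_s\,(\hat\theta_s - \theta_s^{*}) + r_s,
 \qquad
 G_s = \left.\frac{\partial g(\theta)}{\partial \theta^{\top}}\right|_{\theta = \theta_s^{*}}, \\
\hat\theta_s - \theta_s^{*} &= O_p\bigl(n_s^{-1/2}\bigr)
 \;\Longrightarrow\;
 r_s = o_p\bigl(n_s^{-1/2}\bigr)
 \quad \text{if } g \text{ is continuously differentiable}, \\
 r_s &= O_p\bigl(n_s^{-1}\bigr)
 \quad \text{if } g \text{ is twice continuously differentiable}.
\end{align*}
```

The referee's concern is whether the remainder r_s, together with any O(1/n_s) estimator bias, stays negligible uniformly in the weights once it enters the cross term. The rebuttal asserts that it does, under the stated regularity conditions.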
Circularity Check
No significant circularity; asymptotic claims derive independently from extended Mallows unbiasedness
The paper derives a Mallows-type criterion shown to be unbiased for target prediction risk (up to a weight-independent term) via direct extension of the classical criterion to the parameter-transfer map. Asymptotic optimality when the target is misspecified, and selective weight allocation to informative sources when the target is correct, are then established as separate theoretical results. These do not reduce by construction to fitted inputs or self-citations; the unbiasedness step is proven rather than assumed, and the guarantees explicitly hold without requiring correct source models. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Mallows-type criterion is unbiased for target prediction risk up to a weight-independent term
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
The weights are obtained from a Mallows-type criterion that is unbiased for the target prediction risk up to a weight-independent term, extending the classical Mallows criterion to the parameter-transfer framework.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
We establish that the proposed weights are asymptotically optimal when the target model is misspecified, and asymptotically allocate weights only to informative sources when the target model is correctly specified.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.