pith.

arxiv: 2507.14453 · v2 · submitted 2025-07-19 · 📊 stat.ME

Generalized optimal parameter-transfer learning through Mallows-type model averaging

Pith reviewed 2026-05-19 04:49 UTC · model grok-4.3

classification 📊 stat.ME
keywords model averaging · parameter transfer · Mallows criterion · asymptotic optimality · misspecified models · prediction risk · source weighting

The pith

A Mallows-type criterion produces asymptotically optimal weights for combining source parameter estimates with a target model even when sources are misspecified.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method that combines a target dataset with multiple source datasets while sharing only the source parameter estimates. It extends the classical Mallows criterion to a weight-selection criterion that is unbiased for the target prediction risk up to a weight-independent term. The authors prove two results about the resulting weights. When the target model is misspecified, the weights achieve the lowest possible asymptotic prediction risk. When the target model is correctly specified, they asymptotically assign weight only to informative sources. This matters in settings such as economics, where datasets are heterogeneous and full data sharing is often impractical. The optimality results hold without assuming that any source model is correctly specified.

Core claim

The proposed Mallows-type model averaging method chooses its weights by minimizing a criterion that is unbiased for the target prediction risk up to a weight-independent term. These weights are asymptotically optimal for the target prediction risk when the target model is misspecified. When the target model is correctly specified, the weights asymptotically allocate positive weight only to informative source models. These results do not require any source model to be correctly specified. The framework extends to semiparametric and panel data settings.

What carries the argument

The Mallows-type criterion that determines the combination weights in the parameter-transfer framework. It extends the classical Mallows criterion while remaining unbiased for the target prediction risk up to a weight-independent term.
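The "unbiased up to a weight-independent term" property can be checked by hand in the simplest linear case. The following sketch rests on assumptions that are ours rather than the paper's:

  • Target data y = μ + e, with E[e] = 0 and Var(e) = σ²I.
  • An n × p target design X with hat matrix P = X(XᵀX)⁻¹Xᵀ, and a target-only fit μ̂₀ = Py.
  • Transferred source fits μ̂ₛ = Xβ̂ₛ (s = 1, …, S), built from shared estimates β̂ₛ that are independent of e.
  • Fixed weights w on the unit simplex, with μ̂(w) = Σₖ wₖμ̂ₖ and risk R(w) = E‖μ − μ̂(w)‖².
  • Criterion C(w) = ‖y − μ̂(w)‖² + 2σ²p·w₀.

```latex
\begin{aligned}
\|y-\hat\mu(w)\|^2 &= \|\mu-\hat\mu(w)\|^2 + \|e\|^2 + 2e^\top\mu - 2\sum_{k=0}^{S} w_k\, e^\top\hat\mu_k,\\
\mathbb{E}\,[e^\top\hat\mu_0] &= \mathbb{E}\,[e^\top P y] = \mathbb{E}\,[e^\top P e] = \sigma^2\operatorname{tr}(P) = \sigma^2 p,
\qquad \mathbb{E}\,[e^\top\hat\mu_s] = \mathbb{E}[e]^\top X\,\mathbb{E}[\hat\beta_s] = 0 \quad (s \ge 1),\\
\mathbb{E}\,C(w) &= R(w) + n\sigma^2 - 2\sigma^2 p\,w_0 + 2\sigma^2 p\,w_0 = R(w) + n\sigma^2 .
\end{aligned}
```

In this toy case only the target fit reuses y, so only w₀ carries a degrees-of-freedom penalty. The constant nσ² is the weight-independent term.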

If this is right

  • When the target model is misspecified, the weighted estimator achieves the lowest possible asymptotic prediction risk among all weight choices.
  • When the target model is correctly specified, non-informative sources receive zero asymptotic weight.
  • The approach applies to semiparametric models and panel data without requiring correct specification of any source.
  • Only source parameter estimates need to be shared rather than raw datasets.
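The weight computation can be sketched numerically under simplifying assumptions. The sketch assumes a linear target model, an identity transfer map, and a known error variance sigma2. Only the source coefficient vectors are passed in, mirroring the "share estimates, not data" setup.

The function names `mallows_transfer_weights`, `simplex_qp` and `criterion` are hypothetical and do not come from the paper. The criterion ||y − M w||² + 2·sigma2·p·w[0] is minimized exactly over the simplex by enumerating supports, which is only practical for a handful of candidate models.

```python
import itertools

import numpy as np


def simplex_qp(G, b, c):
    """Minimize w'Gw - 2b'w + c'w over {w >= 0, sum(w) = 1} by enumerating supports."""
    K = len(b)
    best_w, best_val = None, np.inf
    for size in range(1, K + 1):
        for support in itertools.combinations(range(K), size):
            A = list(support)
            # KKT system for the equality-constrained problem on this face of the simplex.
            kkt = np.zeros((size + 1, size + 1))
            kkt[:size, :size] = 2.0 * G[np.ix_(A, A)]
            kkt[:size, size] = 1.0
            kkt[size, :size] = 1.0
            rhs = np.append(2.0 * b[A] - c[A], 1.0)
            sol = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
            w = np.zeros(K)
            w[A] = sol[:size]
            if np.any(w < -1e-10) or w.sum() <= 0:
                continue  # stationary point lies outside this face
            w = np.clip(w, 0.0, None)
            w /= w.sum()
            val = w @ G @ w - 2.0 * b @ w + c @ w
            if val < best_val:
                best_w, best_val = w, val
    return best_w


def mallows_transfer_weights(X, y, source_betas, sigma2):
    """Hypothetical Mallows-type weights for [target OLS fit, X @ beta_1, ..., X @ beta_S]."""
    p = X.shape[1]
    beta_target = np.linalg.lstsq(X, y, rcond=None)[0]
    M = np.column_stack([X @ beta_target] + [X @ b for b in source_betas])
    c = np.zeros(M.shape[1])
    c[0] = 2.0 * sigma2 * p  # only the target fit reuses y, so only it is penalized
    return simplex_qp(M.T @ M, M.T @ y, c), M


def criterion(w, M, y, sigma2, p):
    """Mallows-type criterion ||y - M w||^2 + 2 sigma2 p w[0]."""
    return np.sum((y - M @ w) ** 2) + 2.0 * sigma2 * p * w[0]


rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.5])
y = X @ beta + rng.standard_normal(n)
sources = [beta + 0.05, np.array([5.0, 5.0, 5.0])]  # illustrative shared estimates
w, M = mallows_transfer_weights(X, y, sources, sigma2=1.0)
print(np.round(w, 3))  # the far-off second source should get (near-)zero weight
```

Exact enumeration was chosen over a generic iterative solver because the target fit and a good source fit are often nearly collinear. That makes gradient methods slow to settle on the weights.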

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This weighting rule could reduce data-sharing requirements in privacy-sensitive applications.
  • Similar unbiased criteria might be constructed for other loss functions or model classes.
  • Finite-sample behavior in high-dimensional or small-target-sample regimes remains open for direct verification.

Load-bearing premise

The Mallows-type criterion provides an unbiased estimate of the target prediction risk up to a term that does not depend on the weights.
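The premise can also be probed with a toy Monte Carlo. The settings are illustrative and not taken from the paper: a linear target model, known σ², transferred source fits held fixed across replications, and fixed (not data-driven) weights.

Averaged over replications, the Mallows-type criterion minus the realized loss should sit near the same constant nσ² for every weight vector. The unpenalized residual sum of squares instead drifts by −2σ²p·w₀.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2, reps = 50, 3, 1.0, 4000
X = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.5])
mu = X @ beta
P = X @ np.linalg.solve(X.T @ X, X.T)  # target hat matrix
source_fits = [X @ (beta + 0.3), X @ (beta - 0.2)]  # hypothetical transferred fits, fixed across reps
weights = [np.array([1.0, 0.0, 0.0]), np.array([0.2, 0.5, 0.3]), np.array([0.0, 0.6, 0.4])]

gaps_mallows = np.zeros(len(weights))  # average of C(w) - loss(w)
gaps_naive = np.zeros(len(weights))    # average of RSS(w) - loss(w)
for _ in range(reps):
    y = mu + np.sqrt(sigma2) * rng.standard_normal(n)
    fits = np.column_stack([P @ y] + source_fits)
    for j, w in enumerate(weights):
        mu_hat = fits @ w
        loss = np.sum((mu - mu_hat) ** 2)
        rss = np.sum((y - mu_hat) ** 2)
        gaps_naive[j] += (rss - loss) / reps
        gaps_mallows[j] += (rss + 2.0 * sigma2 * p * w[0] - loss) / reps

print(np.round(gaps_mallows, 2))  # each entry should be close to n * sigma2 = 50
print(np.round(gaps_naive, 2))    # entries shift by -2 * sigma2 * p * w[0]
```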

What would settle it

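Consider large samples with a misspecified target model. The asymptotic optimality claim would be false if the ratio of the Mallows-weighted estimator's prediction risk to that of the best single source estimator stayed bounded above 1 as the sample size grows. This is because placing all weight on one candidate is itself an admissible weight choice.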
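A small simulation harness for this check is sketched below. The assumptions are ours:

  • one shared source estimate;
  • a target model that omits a quadratic term, so both candidates are misspecified;
  • σ² estimated from the target fit's residuals, as in classical Mallows averaging.

The harness reports the ratio of the Mallows-weighted estimator's average loss to two benchmarks: the infeasible oracle weight, and the better of the two single candidates. Asymptotic optimality predicts that the first ratio tends to 1. A second ratio that stays clearly above 1 as n grows would be the warning sign.

```python
import numpy as np


def one_replication(n, rng):
    # Truth adds 0.5 * (x1^2 - 1), so the linear target model is misspecified (assumption).
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    mu = X @ np.array([1.0, -1.0, 0.5]) + 0.5 * (X[:, 1] ** 2 - 1.0)
    y = mu + rng.standard_normal(n)
    p = X.shape[1]
    fit_t = X @ np.linalg.lstsq(X, y, rcond=None)[0]  # target-only fit
    fit_s = X @ np.array([1.1, -0.9, 0.4])            # hypothetical shared source estimate
    sigma2_hat = np.sum((y - fit_t) ** 2) / (n - p)
    d = fit_t - fit_s  # prediction = fit_s + w * d, with w the weight on the target fit

    def loss(w):
        return np.sum((mu - fit_s - w * d) ** 2)

    # Minimizer of ||y - fit_s - w d||^2 + 2 sigma2_hat p w over w in [0, 1].
    w_hat = np.clip((d @ (y - fit_s) - sigma2_hat * p) / (d @ d), 0.0, 1.0)
    # Infeasible oracle: minimizer of the realized loss over w in [0, 1].
    w_orc = np.clip(d @ (mu - fit_s) / (d @ d), 0.0, 1.0)
    return loss(w_hat), loss(w_orc), loss(1.0), loss(0.0)


rng = np.random.default_rng(2)
ratios = {}
for n in (100, 400, 1600):
    L = np.array([one_replication(n, rng) for _ in range(200)])
    mallows, oracle, target_only, source_only = L.mean(axis=0)
    ratios[n] = (mallows / oracle, mallows / min(target_only, source_only))
    print(n, np.round(ratios[n], 4))
```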

Figures

Figures reproduced from arXiv: 2507.14453 by Fen Jiang, Wenhui Li, Xinyu Zhang.

Figure 1: The boxplots of MSEβ and MSEµ of different methods when all candidate models are correct.
(Adjacent simulation text: "…size n to vary in {100, 225, 400, 625, 900, 1024}. Other settings remain the same as in the previous simulations. Then, we conduct 100 replications for each simulation…")
Figure 2: The boxplots of MSEβ and MSEµ of different methods with the correct target model and partially misspecified source models.
(Adjacent text, Section 5, "Prediction of infection counts in Africa": "In this section, we apply the proposed method to predict infection counts for infectious diseases in Africa. Studies indicate that Africa is a highly affected region for infectious diseases (Seebaluck-Sandoram and Mahomoodally, 2017). The inf…")
Figure 3: The boxplot of MSEµ of different methods with the misspecified target model and partially misspecified source models.
Original abstract

In many economic applications, multiple source datasets are available, but their effective combination is challenging due to heterogeneity across datasets. To address this problem, we study a parameter-transfer framework that shares only source-side estimates and propose a Mallows-type model averaging method for combining target and source models in the parametric setting. The weights are obtained from a Mallows-type criterion that is unbiased for the target prediction risk up to a weight-independent term, extending the classical Mallows criterion to the parameter-transfer framework. We establish that the proposed weights are asymptotically optimal when the target model is misspecified, and asymptotically allocate weights only to informative sources when the target model is correctly specified. These guarantees do not require any source model to be correctly specified. We also consider extensions of the framework to semiparametric and panel data settings. Simulation studies and a house price application further demonstrate the effectiveness of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

1 major / 3 minor

Summary. The paper proposes a Mallows-type model averaging procedure for parameter-transfer learning in the parametric setting. Weights are obtained from a criterion that is unbiased for target prediction risk up to a weight-independent term; the authors prove that these weights are asymptotically optimal when the target model is misspecified and that they asymptotically assign positive weight only to informative sources when the target is correctly specified. These guarantees are claimed to hold without any source model being correctly specified. Extensions to semiparametric and panel-data settings are developed, and the method is illustrated with simulations and a house-price application.

Significance. If the unbiasedness property and the asymptotic optimality results hold, the work supplies a theoretically grounded method for combining source estimates with a target sample under heterogeneity, without requiring correct specification of any source model. This is a useful contribution for economic applications that routinely face multiple heterogeneous datasets. The explicit asymptotic statements and the provision of reproducible simulation code would be strengths if delivered.

major comments (1)
  1. [Section 3 (or the section deriving the Mallows-type criterion)] The central unbiasedness claim (abstract and the derivation of the Mallows-type criterion) states that the criterion is unbiased for target prediction risk up to a weight-independent term after source estimators are plugged into the transfer map. When the transfer map is nonlinear or source estimators retain O(1/n_s) bias under misspecification, the cross term E[residual × transferred deviation] need not vanish or remain weight-independent; the manuscript must supply the precise conditions or bias-correction steps that guarantee this property, because the claim that the extension works without any source being correct rests on this step.
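A toy Monte Carlo, not taken from the manuscript, can make the condition in this comment concrete. It assumes a linear target model and uses g = tanh as a hypothetical nonlinear transfer map.

If a source estimate β̂ₛ is computed from data independent of the target errors e, then E[eᵀX g(β̂ₛ)] = E[e]ᵀ E[X g(β̂ₛ)] = 0. This holds for any transfer map with finite moments and any bias in β̂ₛ, so the source weights need no penalty. If the "source" estimate also uses target data (here, a pooled fit), the cross term is nonzero. The penalty would then have to depend on the source weights.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_s, reps = 50, 50, 20000
beta = np.array([1.0, 0.5])
X = rng.standard_normal((n, 2))      # target design, held fixed
X_s = rng.standard_normal((n_s, 2))  # source design, held fixed
g = np.tanh                          # hypothetical nonlinear transfer map

E = rng.standard_normal((reps, n))      # target errors, one row per replication
E_s = rng.standard_normal((reps, n_s))  # source errors, independent of E
Y = X @ beta + E
Y_s = X_s @ beta + E_s

# Source-only OLS: depends on E_s only, hence independent of the target errors.
b_indep = (Y_s @ X_s) @ np.linalg.inv(X_s.T @ X_s)
# Pooled OLS on target + source data: depends on E.
b_pool = (Y @ X + Y_s @ X_s) @ np.linalg.inv(X.T @ X + X_s.T @ X_s)


def mean_cross_term(b_hat):
    # Average of e'X(g(b_hat) - g(beta)). Subtracting the fixed g(beta) only removes
    # the mean-zero term e'Xg(beta), which reduces Monte Carlo noise.
    return np.mean(np.sum((E @ X) * (g(b_hat) - g(beta)), axis=1))


print(round(mean_cross_term(b_indep), 3))  # near 0: no source penalty needed
print(round(mean_cross_term(b_pool), 3))   # clearly positive: weight-dependent bias
```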
minor comments (3)
  1. [Asymptotic theorems] Ensure that all asymptotic statements explicitly list the rate conditions on the source sample sizes n_s relative to the target sample size n.
  2. [Simulation studies] In the simulation section, report the exact form of the transfer map used and confirm that the same map appears in both the theoretical derivations and the Monte Carlo design.
  3. [Empirical application] The house-price application would benefit from a brief table showing the estimated weights and the out-of-sample prediction error relative to the target-only estimator.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and are prepared to revise the paper accordingly to improve clarity on the key technical points.

Point-by-point responses
  1. Referee: [Section 3 (or the section deriving the Mallows-type criterion)] The central unbiasedness claim (abstract and the derivation of the Mallows-type criterion) states that the criterion is unbiased for target prediction risk up to a weight-independent term after source estimators are plugged into the transfer map. When the transfer map is nonlinear or source estimators retain O(1/n_s) bias under misspecification, the cross term E[residual × transferred deviation] need not vanish or remain weight-independent; the manuscript must supply the precise conditions or bias-correction steps that guarantee this property, because the claim that the extension works without any source being correct rests on this step.

    Authors: We appreciate the referee highlighting this subtlety in the unbiasedness argument. In Section 3, the Mallows-type criterion is derived under the parametric setting with the transfer map being a smooth (continuously differentiable) function of the source parameter estimates. Source estimators are assumed to be sqrt(n_s)-consistent for their pseudo-true values even under misspecification, which is standard in parametric M-estimation. The cross term E[residual × transferred deviation] is weight-independent because the transferred deviation is a linear (or first-order) function of the source estimation errors; by the orthogonality of target residuals to the target design and the law of large numbers applied to the target sample, its conditional expectation does not depend on the weights. For nonlinear transfer maps, a first-order Taylor expansion is used around the pseudo-true source parameters, with the remainder controlled uniformly in the weights by the o_p(1) terms under our regularity conditions (Assumptions 1-3 and Lemma 2). We agree that these conditions could be stated more explicitly. We will revise Section 3 to add a dedicated remark clarifying the precise conditions (including handling of O(1/n_s) bias terms, which are absorbed into the weight-independent component) under which unbiasedness holds without requiring any source model to be correct. This revision strengthens the exposition while preserving the main results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; asymptotic claims derive independently from extended Mallows unbiasedness

full rationale

The paper derives a Mallows-type criterion shown to be unbiased for target prediction risk (up to a weight-independent term) via direct extension of the classical criterion to the parameter-transfer map. Asymptotic optimality when the target is misspecified, and selective weight allocation to informative sources when the target is correct, are then established as separate theoretical results. These do not reduce by construction to fitted inputs or self-citations; the unbiasedness step is proven rather than assumed, and the guarantees explicitly hold without requiring correct source models. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review conducted from abstract only; full derivations and assumptions not available. No explicit free parameters or invented entities are described. The core assumption is the unbiasedness property of the extended Mallows criterion.

axioms (1)
  • domain assumption: the Mallows-type criterion is unbiased for the target prediction risk up to a weight-independent term
    Stated as the basis for obtaining the weights in the parameter-transfer framework.

pith-pipeline@v0.9.0 · 5677 in / 1314 out tokens · 41406 ms · 2026-05-19T04:49:01.553335+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings:
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  2. Anselin, L. (2010). Thirty years of spatial econometrics. Papers in Regional Science 89, 3–26.
  3. Bastani, H., Simchi-Levi, D., Zhu, R. (2022). Meta dynamic pricing: Transfer learning across experiments. Management Science 68, 1865–1881.
  4. Cao, H., Gu, H., Guo, X., Rosenbaum, M. (2023). Risk of transfer learning and its applications in finance. arXiv:2311.03283.
  5. Chen, X., Yu, D., Zhang, X. (2024). Optimal weighted random forests. Journal of Machine Learning Research 25, 1–81.
  6. Cliff, A.D., Ord, J.K. (1973). Spatial autocorrelation. Pion, London.
  7. Elhorst, J.P., et al. (2014). Spatial econometrics: From cross-sectional data to spatial panels. Volume 479. Springer.
  8. Fu, J., Yuan, F., Song, Y., Yuan, Z., Cheng, M., Cheng, S., Zhang, J., Wang, J., Pan, Y. (2024). Exploring adapter-based transfer learning for recommender systems: Empirical studies and practical insights. In: Proceedings of the 17th ACM international conference ...
  9. Hansen, B.E. (2007). Least squares model averaging. Econometrica 75, 1175–1189.
  10. Hansen, B.E., Racine, J.S. (2012). Jackknife model averaging. Journal of Econometrics 167, 38–46.
  11. Hu, X., Zhang, X. (2023). Optimal parameter-transfer learning by semiparametric model averaging. Journal of Machine Learning Research 24, 1–53.
  12. Kelejian, H.H., Prucha, I.R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics 17, 99–121.
  13. Lebichot, B., Le Borgne, Y.A., He-Guelton, L., Oblé, F., Bontempi, G. (2020). Deep-learning domain adaptation techniques for credit cards fraud detection. In: Recent advances in big data and deep learning: proceedings of the INNS big data and deep learning conference, held at Sestri Levante...
  14. Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899–1925.
  15. Li, F., Zhou, T. (2019). Effects of urban form on air quality in China: An analysis based on the spatial autoregressive model. Cities 89, 130–140.
  16. Li, S., Cai, T.T., Li, H. (2022). Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality. Journal of the Royal Statistical Society Series B: Statistical Methodology 84, 149–173.
  17. Li, Y., Kanbur, R., Lin, C. (2019). Minimum wage competition between local governments in China. The Journal of Development Studies 55, 2479–2494.
  18. Liao, J., Zou, G., Gao, Y., Zhang, X. (2021). Model averaging prediction for time series models with a diverging number of parameters. Journal of Econometrics 223, 190–221.
  19. Lin, X., Lee, L.-F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Journal of Econometrics 157, 34–52.
  20. Liu, Q., Okui, R., Yoshimura, A. (2016). Generalized least squares model averaging. Econometric Reviews 35, 1692–1752.
  21. Nguyen, T.T., Yoon, S. (2019). A novel approach to short-term stock price movement prediction using transfer learning. Applied Sciences 9, 4745.
  22. Ord, K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120–126.
  23. Pan, S.J., Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 1345–1359.
  24. Qiu, Y., Wang, W., Xie, T., Yu, J., Zhang, X. (2023). Boosting store sales through machine learning-informed promotional decisions. Available at SSRN 4605803.
  25. Racine, J.S., Li, Q., Yu, D., Zheng, L. (2023). Optimal model averaging of mixed-data kernel-weighted spline regressions. Journal of Business & Economic Statistics 41, 1251–1261.
  26. Saputro, D., Sulistyaningsih, S., Widyaningsih, P. (2021). Spatial autoregressive (SAR) model with ensemble learning-multiplicative noise with lognormal distribution (case on poverty data in East Java). Media Statistika 14, 89–97.
  27. Seebaluck-Sandoram, R., Mahomoodally, F. (2017). Management of infectious diseases in Africa. In: Medicinal spices and vegetables from Africa. Elsevier, pp. 133–151.
  28. Tian, Y., Feng, Y. (2023). Transfer learning under high-dimensional generalized linear models. Journal of the American Statistical Association 118, 2684–2697.
  29. Wan, A.T., Zhang, X., Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics 156, 277–283.
  30. White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
  31. Wu, D., Wang, X., Wu, S. (2022). Jointly modeling transfer learning of industrial chain information and deep learning for stock prediction. Expert Systems with Applications 191, 116257.
  32. Yu, H., Li, J., Bardin, S., Gu, H., Fan, C. (2021). Spatiotemporal dynamic of COVID-19 diffusion in China: A dynamic spatial autoregressive model analysis. ISPRS International Journal of Geo-Information 10, 510.
  33. Yuan, F., Yao, L., Benatallah, B. (2019). DARec: Deep domain adaptation for cross-domain recommendation via transferring rating patterns. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, pp. 4227–4233.
  34. Zeng, H., Zhong, W., Xu, X. (2024). Transfer learning for spatial autoregressive models with application to US presidential election prediction. arXiv:2405.15600.
  35. Zhang, X. (2021). A new study on asymptotic optimality of least squares model averaging. Econometric Theory 37, 388–407.
  36. Zhang, X., Liu, C.A. (2023). Model averaging prediction by K-fold cross-validation. Journal of Econometrics 235, 280–301.
  37. Zhang, X., Liu, H., Wei, Y., Ma, Y. (2024). Prediction using many samples with models possibly containing partially shared parameters. Journal of Business & Economic Statistics 42, 187–196.
  38. Zhang, X., Yu, J. (2018). Spatial weights matrix selection and model averaging for spatial autoregressive models. Journal of Econometrics 203, 1–18.
  39. Zhang, X., Zou, G., Carroll, R.J. (2015). Model averaging based on Kullback-Leibler distance. Statistica Sinica 25, 1583–1598.
  40. Zhang, X., Zou, G., Liang, H., Carroll, R.J. (2020). Parsimonious model averaging with a diverging number of parameters. Journal of the American Statistical Association 115, 972–984.
  41. Zhu, R., Wan, A.T., Zhang, X., Zou, G. (2019). A Mallows-type model averaging estimator for the varying-coefficient partially linear model. Journal of the American Statistical Association 114, 882–892.
  42. Zou, H., Zhang, H.H. (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics 37, 1733–1751.