
arxiv: 2606.05258 · v1 · pith:DGF3VOVJnew · submitted 2026-06-03 · 📊 stat.ML · cs.LG · stat.AP

Harnessing Source Heterogeneity for Cluster-Structured Transfer Learning

Pith reviewed 2026-06-28 04:00 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.AP
keywords transfer learning · cluster-structured learning · generalized linear models · source heterogeneity · non-asymptotic bounds · multi-source fusion · hospital data analysis

The pith

Trans-GLMC recovers latent source clusters via coefficient distances to adapt fusion in generalized linear models and tighten non-asymptotic error bounds when clusters align with the target.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Trans-GLMC, a transfer learning procedure for generalized linear models that first builds a coefficient-based distance to identify latent clusters among auxiliary sources. It then performs global fusion, within-cluster refinement, and target debiasing to produce an estimator that adapts to the recovered structure. A non-asymptotic error bound is established that improves over unclustered transfer learning whenever a meaningful target cluster exists and otherwise matches the unclustered rate up to constants. The approach is illustrated on a large multi-hospital dataset for suicide risk prediction, where per-facility data is sparse but sources may share group-level risk profiles. In both simulations and the real study, the method improves facility-specific predictions and identifies interpretable communities of mutually transferable hospitals.

Core claim

Trans-GLMC constructs a coefficient-based distance among the target and candidate sources to recover latent source clusters, then combines global fusion, within-cluster refinement, and target debiasing; the resulting estimator satisfies a non-asymptotic error bound that improves over its unclustered counterpart whenever a meaningful target cluster exists and matches the unclustered rate up to constants otherwise.
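
As a rough illustration of how the three stages could chain together, the Python sketch below works on a scalar Gaussian-mean model (a GLM with identity link), where an L1-penalized correction reduces to soft-thresholding. Everything in it is an assumption made for illustration: the function names, the sample-size weighting, the penalties `lam_cluster` and `lam_target`, and the toy numbers are hypothetical and are not taken from the paper, whose actual losses and tuning are not given in the abstract.

```python
def soft_threshold(x, lam):
    """Closed-form minimizer of 0.5 * (x - d) ** 2 + lam * abs(d) over d."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def weighted_mean(means, sizes):
    """Sample-size-weighted average of per-site estimates."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


def three_stage_estimate(means, sizes, target_cluster, lam_cluster, lam_target):
    """Toy fuse-refine-debias chain; index 0 is the target site.

    Illustrative only: not the estimator defined in the paper.
    """
    # Stage 1 (global fusion): pool the target with every source.
    global_fit = weighted_mean(means, sizes)
    # Stage 2 (within-cluster refinement): sparse correction toward the
    # pooled estimate of the target's own cluster.
    idx = [0] + list(target_cluster)
    cluster_mean = weighted_mean([means[k] for k in idx], [sizes[k] for k in idx])
    cluster_fit = global_fit + soft_threshold(cluster_mean - global_fit, lam_cluster)
    # Stage 3 (target debiasing): sparse correction toward the target alone.
    final = cluster_fit + soft_threshold(means[0] - cluster_fit, lam_target)
    return global_fit, cluster_fit, final


# Hypothetical per-site estimates: target, two similar sources, two dissimilar ones.
means = [1.0, 1.25, 0.75, 4.0, 4.0]
sizes = [20, 200, 200, 200, 200]
global_fit, cluster_fit, final = three_stage_estimate(
    means, sizes, target_cluster=[1, 2], lam_cluster=0.1, lam_target=0.2
)
print(global_fit, cluster_fit, final)
```

In this toy the global fit is dragged toward the two dissimilar sources (about 2.46), while the refined estimate ends up within `lam_cluster` of the target-cluster mean of 1.0. The small target sample then leaves it unchanged because the remaining gap is below `lam_target`.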

What carries the argument

The coefficient-based distance among target and candidate sources, which recovers latent source clusters and enables the subsequent adaptive fusion steps.
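
A minimal sketch of the clustering idea, assuming each site already has a fitted coefficient vector: compute pairwise distances, link sites whose distance falls below a threshold, and read off which sources share the target's group. The Euclidean distance, the threshold `tau`, the connected-components rule, and the example coefficients are all hypothetical choices for illustration; the abstract does not state which distance or clustering rule the paper uses.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(coefs):
    """Pairwise distances; index 0 is the target, 1..K are the sources."""
    n = len(coefs)
    return [[l2_distance(coefs[i], coefs[j]) for j in range(n)] for i in range(n)]


def threshold_clusters(dist, tau):
    """Connected components of the graph that links i and j when dist[i][j] <= tau."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= tau:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical fitted coefficients; row 0 is the target facility.
coefs = [
    [1.0, 0.0],    # target
    [1.1, 0.1],    # source 1, close to the target
    [0.9, -0.1],   # source 2, close to the target
    [-2.0, 3.0],   # source 3, a different group
    [-2.1, 2.9],   # source 4, same group as source 3
]
labels = threshold_clusters(distance_matrix(coefs), tau=0.5)
target_cluster = [k for k in range(1, len(coefs)) if labels[k] == labels[0]]
print(labels)
print(target_cluster)
```

Here sources 1 and 2 land in the target's cluster and sources 3 and 4 form a separate group. In practice the coefficient vectors are noisy estimates, so the choice of threshold matters far more than in this clean example.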

If this is right

  • The non-asymptotic error bound improves over unclustered transfer learning whenever a meaningful target cluster exists.
  • The bound matches the unclustered rate up to constants when no such cluster is present.
  • In multi-hospital settings with rare events, the procedure improves facility-specific prediction accuracy.
  • It identifies interpretable communities of hospitals that share mutual transferability and recovers clinically coherent risk factors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coefficient-distance clustering step could be tested in other multi-site sparse-data problems, such as regional modeling of rare outcomes.
  • If the distance proves stable under moderate misspecification, the adaptive bound may reduce reliance on manual source screening in high-dimensional transfer tasks.
  • Extensions to non-GLM losses or time-varying clusters would be natural next checks on whether the improvement mechanism generalizes.

Load-bearing premise

The coefficient-based distance among target and candidate sources accurately recovers the latent source clusters.

What would settle it

A simulation in which known source clusters exist but the coefficient-based distance fails to recover them, producing error rates that show no improvement over the unclustered baseline.
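
One way to start such a check is to measure how often a distance-based rule recovers known clusters as estimation noise grows. The sketch below is a deliberately simplified stand-in, not the paper's simulation design: coefficients are scalars, noise is added directly to the true values instead of fitting GLMs, and the gap-based clustering rule, the threshold `tau`, and the function names are hypothetical. A full test would pair each recovery outcome with the resulting estimation error against the unclustered baseline.

```python
import random


def gap_clusters(values, tau):
    """1-D clustering: sort the values and start a new cluster at each gap > tau."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > tau:
            current += 1
        labels[nxt] = current
    return labels


def same_partition(a, b):
    """True when two label lists induce the same grouping of the items."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))


def recovery_rate(noise_sd, n_reps=200, seed=0):
    """Fraction of replications in which the true two-cluster split is recovered."""
    rng = random.Random(seed)
    truth = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0]   # hypothetical true coefficients
    true_labels = [0, 0, 0, 1, 1, 1]
    hits = 0
    for _ in range(n_reps):
        # Noisy "estimated" coefficients stand in for per-site GLM fits.
        est = [b + rng.gauss(0.0, noise_sd) for b in truth]
        hits += same_partition(gap_clusters(est, tau=1.0), true_labels)
    return hits / n_reps


print(recovery_rate(0.01))  # noise far below the cluster separation
print(recovery_rate(3.0))   # noise larger than the cluster separation
```

When the noise is small relative to the separation between cluster centers, recovery is essentially perfect; once the noise exceeds the separation, the rule rarely finds the true split, which is the regime where the claimed improvement would be put to the test.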

Figures

Figures reproduced from arXiv: 2606.05258 by Jun Jin, Kun Chen, Robert H. Aseltine, Shane J. Sacco, Xiaohui Yin.

Figure 1: Schematic of Trans-GLMC. Step 1 produces an estimate …
Figure 2 summarizes the results; full tables, broken down by |A|, are reported in Tables S.1 and S.2 of the Supplement. [Plot: panels d2 = 1 and d2 = 3; x-axis: number of informative sources; y-axis: MSE on β; methods: Target-only, Trans-GLM, Trans-GLM-Q, Trans-GLM-IDW, Trans-GLM-SPH, Trans-GLMC]
Figure 3: Facility-specific differences for Trans-GLMC relative to Trans-GLM and to the Target …
Figure 4: Transferability communities in the CHIME study. Panel (a) shows pairwise transferability …
Figure 5: Trans-GLMC coefficients for the top 20 risk factors across 27 facilities.
Original abstract

Transfer learning is a natural strategy when a target population has limited data but multiple related auxiliary sources are available. A central difficulty is source heterogeneity: auxiliary sources may not be equally useful, and their usefulness may vary in a structured, cluster-like fashion. Existing transfer-learning methods often reduce source selection to a binary informative/non-informative decision, overlooking subgroups of sources with differential transferability. Motivated by a suicide-risk study using data from the Connecticut Hospital Information Management Exchange (CHIME), comprising 636,758 patients across 27 hospitals, we propose Trans-GLMC, a cluster-structured transfer-learning procedure for generalized linear models. The CHIME setting illustrates the core challenge: hospital-specific risk models are unstable because suicide attempts are rare at any single facility, whereas indiscriminate pooling across hospitals can obscure facility-level differences in patient mix and risk profiles. Trans-GLMC first constructs a coefficient-based distance among the target and candidate sources to recover latent source clusters. It then combines global fusion, within-cluster refinement, and target debiasing to produce an estimator that adapts to the detected structure. We establish a non-asymptotic error bound that improves over its unclustered counterpart whenever a meaningful target cluster exists and matches the unclustered rate up to constants otherwise. In simulations and in the CHIME study, Trans-GLMC improves facility-specific prediction, identifies interpretable communities of hospitals with mutual transferability, and recovers clinically coherent suicide-risk factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes Trans-GLMC, a cluster-structured transfer learning procedure for generalized linear models. It first uses a coefficient-based distance to recover latent clusters among target and source populations, then applies global fusion, within-cluster refinement, and target debiasing. The central theoretical claim is a non-asymptotic error bound that improves over the unclustered estimator when a meaningful target cluster exists and matches the unclustered rate up to constants otherwise. The method is illustrated on simulations and the CHIME hospital data for suicide-risk prediction.

Significance. If the non-asymptotic bound can be established with a rigorous high-probability guarantee on the initial clustering step, the work would provide a useful adaptive procedure for exploiting structured source heterogeneity in transfer learning for GLMs, particularly in rare-event settings such as hospital-level medical data. The CHIME application demonstrates potential for identifying interpretable groups of transferable sources.

major comments (1)
  1. [Abstract / theoretical analysis] Abstract and theoretical analysis section: The claimed improvement in the non-asymptotic error bound over the unclustered estimator holds only after the coefficient-based distance recovers the latent source clusters. No high-probability guarantee on successful cluster recovery is stated, so the bound's advantage is conditional on an unverified premise about the first algorithmic stage; if recovery fails, the procedure reverts to the unclustered rate without the advertised improvement.
minor comments (2)
  1. [Empirical section] The manuscript should supply explicit details on data-exclusion rules, hyperparameter tuning, and cross-validation procedures used in the CHIME experiments to allow verification of the reported prediction improvements.
  2. [Method description] Notation for the coefficient-based distance and the subsequent fusion steps could be clarified with a short algorithmic pseudocode box.

Simulated Authors' Rebuttal

1 response · 0 unresolved

Thank you for your detailed review. We address the major comment on the theoretical guarantee for cluster recovery below.

Point-by-point responses
  1. Referee: [Abstract / theoretical analysis] Abstract and theoretical analysis section: The claimed improvement in the non-asymptotic error bound over the unclustered estimator holds only after the coefficient-based distance recovers the latent source clusters. No high-probability guarantee on successful cluster recovery is stated, so the bound's advantage is conditional on an unverified premise about the first algorithmic stage; if recovery fails, the procedure reverts to the unclustered rate without the advertised improvement.

    Authors: We appreciate the referee highlighting this point on presentation. The non-asymptotic bound in the main theorem is derived conditional on correct recovery of the latent clusters by the coefficient-based distance. The appendix establishes that, under the minimum separation condition between cluster centers (Assumption 3), the probability of misclustering decays exponentially in the sample size, so the improved rate holds with high probability. To address the concern directly, we will revise the abstract and theoretical analysis section to state this high-probability guarantee explicitly and to present the overall bound as holding with probability at least 1-δ under the stated assumptions. This makes the adaptive improvement rigorous without altering the results. revision: yes
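
The step from a bound that is conditional on correct clustering to one that holds with probability at least 1-δ is a union bound. The derivation below spells it out; the symbols E, B, δ₁, δ₂, C, and c are illustrative notation introduced here, not the paper's, and the exponential form of δ₂ simply restates the rebuttal's claim that the misclustering probability decays exponentially in the sample size n.

```latex
% E : event that the coefficient-based distance recovers the true clusters
% B : event that the improved error bound holds
% \delta_1, \delta_2, C, c : illustrative symbols, not the paper's notation
\begin{align*}
\Pr(B^c) &= \Pr(B^c \cap E) + \Pr(B^c \cap E^c) \\
         &\le \Pr(B^c \mid E)\,\Pr(E) + \Pr(E^c) \\
         &\le \delta_1 + \delta_2,
\qquad \text{where } \Pr(B^c \mid E) \le \delta_1, \quad \Pr(E^c) \le \delta_2 = C e^{-c n}.
\end{align*}
```

Setting δ = δ₁ + δ₂ gives the unconditional statement. The guarantee is only as strong as the separation condition that controls δ₂.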

Circularity Check

0 steps flagged

No circularity: bound derived conditionally on clustering success without reducing to fitted inputs by construction

Full rationale

The paper's central result is a non-asymptotic error bound for the Trans-GLMC estimator that improves over the unclustered rate precisely when a meaningful target cluster is recovered by the coefficient-based distance step. This is presented as a conditional theoretical guarantee rather than a quantity defined in terms of the fitted parameters themselves. No equations or steps in the abstract reduce the bound to a self-referential fit, and the provided text contains no self-citations that serve as load-bearing premises for the uniqueness or form of the result. The derivation chain therefore remains self-contained against external statistical benchmarks for transfer learning bounds.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review supplies limited detail; the method rests on the domain assumption that sources exhibit recoverable cluster structure via coefficient distances and on standard GLM regularity conditions. No explicit free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption: Coefficient-based distance recovers latent source clusters
    Central modeling choice stated in the abstract as the first step of Trans-GLMC.
  • standard math: Generalized linear model regularity conditions hold for the error bound
    Implicit in any non-asymptotic GLM analysis.


