Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates

Brian C.-H. Chiu; Parveen Bhatti; Saurabh Bhandari; Yuan Ji

arXiv: 2605.20143 · v1 · pith:USMGG3SM · submitted 2026-05-19 · 📊 stat.AP · stat.CO · stat.ML


Pith reviewed 2026-05-20 03:04 UTC · model grok-4.3

classification 📊 stat.AP · stat.CO · stat.ML

keywords: semi-parametric BART · Bayesian additive regression trees · epigenetic signatures · variable selection · risk prediction · multiple myeloma · high-dimensional data · 5-hydroxymethylcytosine


The pith

A semi-parametric BART model places low-dimensional covariates in a parametric component with interpretable coefficients while modeling high-dimensional epigenetic predictors through the tree ensemble.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces spBART to overcome a limitation of standard BART models. Standard BART treats every predictor the same way inside the tree ensemble. This obscures the separate contributions of a few important covariates and makes variable selection difficult when the predictor set is high-dimensional. spBART therefore adds an explicit parametric regression term for the low-dimensional covariates, so that their coefficients remain directly interpretable. The high-dimensional epigenetic signatures are still handled by the usual BART sum of trees, which captures nonlinear effects and interactions. The authors also supply a cross-validation routine that pools posterior inclusion probabilities across folds and applies Bayesian false-discovery-rate control to produce a stable, parsimonious set of selected loci. The model was fitted to pooled 5-hydroxymethylcytosine profiles from circulating cell-free DNA in two multiple-myeloma studies (N = 869). It reaches an AUC of 0.96 on held-out data while returning only a small number of candidate loci.

Core claim

spBART augments the standard Bayesian additive regression tree ensemble with a parametric linear component for low-dimensional covariates. This separation yields directly interpretable regression coefficients for the covariates, while the tree ensemble retains its flexibility for complex, nonlinear associations among the high-dimensional epigenetic predictors. A cross-validation procedure aggregates posterior inclusion probabilities across folds and imposes Bayesian false-discovery-rate control to perform stable variable selection. The model was applied to genome-wide 5-hydroxymethylcytosine profiles from 869 participants in two multiple-myeloma case-control studies. It identifies a parsimonious set of candidate loci and achieves an AUC of 0.96 in a held-out validation set.
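The Python sketch below shows one way the fold-wise aggregation and Bayesian FDR control could be implemented. It rests on three assumptions:

- A locus's PIP within a fold is the fraction of posterior draws in which that locus is used in at least one tree split.
- Fold-wise PIPs are combined by simple averaging.
- The Bayesian FDR of a selected set is the posterior expected FDR, i.e. the mean of (1 - PIP) over the selected loci.

The function names are hypothetical. The paper's exact aggregation rule and threshold may differ.

```python
import numpy as np

def pip_from_draws(used):
    """PIP per locus: fraction of posterior draws in which the locus is used
    in at least one split. `used` has shape (n_draws, n_loci)."""
    return np.asarray(used, dtype=float).mean(axis=0)

def aggregate_pips(fold_pips):
    """Hypothetical aggregation rule: average the fold-wise PIP vectors."""
    return np.mean(np.vstack(fold_pips), axis=0)

def bfdr_select(pip, alpha=0.05):
    """Return indices of the largest top-PIP set whose posterior expected
    FDR, mean(1 - PIP) over the selected loci, is at most alpha."""
    pip = np.asarray(pip, dtype=float)
    order = np.argsort(-pip)                      # loci by decreasing PIP
    cum_fdr = np.cumsum(1.0 - pip[order]) / np.arange(1, pip.size + 1)
    admissible = np.nonzero(cum_fdr <= alpha)[0]
    if admissible.size == 0:
        return np.array([], dtype=int)            # nothing passes the threshold
    k = admissible[-1] + 1                        # size of the largest admissible set
    return np.sort(order[:k])

# Example with two folds and four loci (made-up PIP values).
folds = [np.array([0.99, 0.97, 0.40, 0.05]), np.array([0.97, 0.95, 0.60, 0.10])]
selected = bfdr_select(aggregate_pips(folds), alpha=0.05)
```

The loci are ranked by decreasing PIP, so the running mean of (1 - PIP) never decreases. The admissible sets therefore form a prefix of the ranking, and the rule keeps the longest one.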

What carries the argument

The semi-parametric BART (spBART) model, which augments the nonparametric BART tree ensemble with a separate parametric regression component for low-dimensional covariates.
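The decomposition can be written in standard sum-of-trees notation. This sketch makes two assumptions that the text above does not spell out:

- f is a sum of m regression trees g(·; T_j, M_j), with tree structures T_j and leaf parameters M_j, as in the usual BART formulation.
- The binary case-control outcome enters through a probit link, as is common for binary BART.

```latex
\begin{aligned}
Y_i &= x_i^\top \beta + f(z_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),\\
f(z_i) &= \sum_{j=1}^{m} g(z_i;\, T_j, M_j),\\
\Pr(Y_i = 1 \mid x_i, z_i) &= \Phi\big(x_i^\top \beta + f(z_i)\big) \quad \text{(binary outcome, probit link)}.
\end{aligned}
```

Here x_i holds the low-dimensional covariates, whose coefficients β are reported directly. z_i holds the high-dimensional 5hmC predictors, which enter the model only through the trees.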

If this is right

Interpretable coefficients are obtained for the effects of low-dimensional covariates such as clinical or demographic factors.
Stable variable selection remains feasible in high-dimensional epigenetic settings despite complex dependence among predictors.
Strong out-of-sample discrimination (AUC 0.96) is achieved in held-out data for multiple-myeloma risk prediction.
A unified modeling framework combines covariate adjustment with flexible tree-based prediction and controlled variable selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The parametric-nonparametric split may simplify adjustment for known confounders when the low-dimensional covariates include clinical variables that precede the epigenetic measurements.
The same separation could be tested on other paired data types such as gene-expression profiles together with basic demographic covariates.
Changing the number of cross-validation folds or the exact Bayesian FDR threshold would provide a direct check on the robustness of the selected loci.
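One way to run this robustness check is to refit under several fold counts and FDR thresholds and measure how much the selected sets overlap. The sketch below does this with the Jaccard index (intersection size over union size) on hypothetical locus sets. The setting labels, locus names and choice of metric are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two selections; two empty selections count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def stability_table(selections):
    """Pairwise Jaccard overlap between selected-locus sets keyed by setting."""
    return {(s, t): jaccard(selections[s], selections[t])
            for s, t in combinations(sorted(selections), 2)}

# Hypothetical selections: K = number of folds, a = Bayesian FDR level.
selections = {
    "K5_a05": ["locus_A", "locus_B", "locus_C"],
    "K10_a05": ["locus_A", "locus_B"],
    "K5_a10": ["locus_A", "locus_B", "locus_C", "locus_D"],
}
table = stability_table(selections)
```

Consistently high overlaps across settings would support the stability claim. Sharp drops when K or the threshold changes would suggest the selected set is fragile.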

Load-bearing premise

The cross-validation procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false-discovery-rate control produces stable variable selection even when the high-dimensional epigenetic predictors are dependent and interact with the parametric component.

What would settle it

Re-running the analysis on an independent cohort of comparable size, and checking two outcomes:

- whether the AUC falls substantially below 0.96;
- whether the selected loci fail to overlap with those reported in the original analysis.

Figures

Figures reproduced from arXiv: 2605.20143 by Brian C.-H. Chiu, Parveen Bhatti, Saurabh Bhandari, Yuan Ji.

Original abstract

In the era of precision medicine, genome-wide epigenetic modifications offer rich data that could inform risk prediction. However, these data are high-dimensional and exhibit complex dependence structures, which makes it difficult to jointly model them with low-dimensional covariates when the goal is to obtain interpretable effect estimates for covariate adjustment. Standard Bayesian additive regression trees (BART) provide strong predictive performance but treat all predictors uniformly within the tree ensemble, obscuring the contributions of significant covariates and complicating variable selection in high-dimensional settings. We propose a semi-parametric BART model (spBART) that addresses this limitation by modeling low-dimensional covariates through a parametric component with interpretable coefficients, while capturing complex nonlinear associations among high-dimensional predictors through the tree ensemble. To perform stable variable selection, we develop a cross-validation-based procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false discovery rate control. We apply the proposed method to a pooled case-control analysis of high-dimensional genome-wide 5-hydroxymethylcytosine profiles derived from circulating cell-free DNA in two multiple myeloma studies (N = 869). The approach identifies a parsimonious set of candidate loci and achieves strong out-of-sample discrimination (AUC = 0.96) in a held-out validation set. Overall, spBART provides a unified framework for combining interpretable covariate inference with flexible modeling and variable selection in high-dimensional biomedical studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

spBART gives a workable split between parametric modeling of the covariates and tree-based modeling of the high-dimensional predictors, with a CV aggregation step for selection. Dependence in the epigenetic data could still affect how stable the selected loci turn out to be.


The new piece is the semi-parametric BART. It keeps a parametric term for the low-dimensional covariates, so you get direct coefficients, while the trees handle the nonlinear patterns in the high-dimensional 5hmC signatures. The authors add a cross-validation procedure that pools posterior inclusion probabilities across folds and then applies Bayesian FDR control to pick a smaller set of loci. That combination is not standard in the BART literature, and it fits the mixed-data setting they describe.

On the application side, the pooled analysis of the two multiple myeloma studies (N = 869) produces a held-out AUC of 0.96 and a parsimonious list of candidate sites. This shows the method can deliver usable prediction while keeping the clinical covariate effects interpretable. The real-data example is concrete, and the performance number is reported on a held-out validation split.

The soft spot is the variable-selection step. Epigenetic profiles carry local and global correlations from chromatin structure and technical effects. The description does not make clear whether the CV aggregation and FDR control fully protect against that dependence, or against leakage between the parametric and tree components. The claimed parsimonious set could be less reliable than it appears if the paper lacks two kinds of explicit checks:

- selection stability under those correlations;
- sensitivity to hyperparameter choices.

The abstract also omits fitting details and robustness checks. That makes it harder to judge how much the 0.96 AUC depends on specific modeling and tuning decisions.

The paper is aimed at statisticians and bioinformaticians who need to combine low-dimensional clinical variables with high-dimensional biomarker panels in risk models. A reader working on similar mixed-dimensional problems in precision medicine would get a practical template. It is solid enough on the core idea and the data example to go to a serious referee, although revisions will likely focus on the selection diagnostics and the handling of dependence.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a semi-parametric Bayesian additive regression trees (spBART) model. It combines a parametric component for low-dimensional covariates, which yields interpretable coefficients, with a nonparametric tree ensemble that captures complex associations in high-dimensional epigenetic predictors. It introduces a cross-validation procedure that aggregates posterior inclusion probabilities (PIPs) across folds and applies Bayesian false discovery rate (BFDR) control for variable selection. The method is applied to pooled case-control data from two multiple myeloma studies with genome-wide 5-hydroxymethylcytosine (5hmC) profiles in circulating cell-free DNA (N = 869). It identifies a parsimonious set of candidate loci and reports an area under the curve (AUC) of 0.96 on a held-out validation set.

Significance. If the variable selection procedure is robust to the dependence structures inherent in epigenetic data, spBART could provide a useful framework for risk prediction in precision medicine by balancing interpretability of covariate effects with flexible modeling of high-dimensional predictors. The reported high out-of-sample AUC suggests strong predictive performance, and the focus on parsimony aids in identifying biologically relevant loci. However, the significance hinges on validation of the selection stability, which is not fully detailed.

major comments (2)

[§3] Variable selection procedure. The cross-validation aggregation of posterior inclusion probabilities followed by Bayesian FDR control implicitly assumes exchangeability or independence of predictors across folds. Epigenetic data exhibit substantial local and global correlations from chromatin domains, co-regulation, and batch effects. When this assumption fails, the selected loci may be unstable or exhibit inflated false discovery rates. This directly undermines the central claim of a reliable 'parsimonious set of candidate loci'. Supporting the reported results requires either a simulation study under realistic correlation structures or explicit sensitivity checks.
[§2] Model specification. The description of spBART does not detail how the parametric component for low-dimensional covariates interacts with the tree ensemble during fitting or selection. Unmodeled leakage of covariate effects into the nonparametric component could compromise both the interpretability of the parametric coefficients and the stability of high-dimensional variable selection. Both are load-bearing for the claimed separation of roles and the AUC = 0.96 result.

minor comments (2)

[Abstract] The abstract reports AUC = 0.96 on held-out data but provides no information on the exact splitting procedure, whether case-control sampling was accounted for in the AUC calculation, or any calibration checks.
[Application] In the application section, clarify the pooling of the two multiple myeloma studies and whether batch correction or normalization was applied to the 5hmC profiles prior to modeling, as technical artifacts could affect both selection and prediction.
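On the AUC and case-control points above: the AUC is a rank statistic, the probability that a randomly chosen case scores above a randomly chosen control. It depends only on the score distributions within cases and within controls, not on the case:control ratio. Absolute risks are different: predicted probabilities from a case-control sample need recalibration to the population prevalence.

The sketch below does two things:

- It computes the AUC by Mann-Whitney pairwise comparison.
- It applies the standard prior (intercept) correction for logistic-type models.

The prevalence value and function names are illustrative assumptions. The correction is exact only for a logistic link, not for the probit link commonly used with binary BART.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC as P(case score > control score), with ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]            # every case-control pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

def prior_correct(p_sample, sample_case_frac, population_prevalence):
    """Shift case-control predictions to a target prevalence via an intercept
    offset on the logit scale (exact for logistic models only)."""
    p = np.asarray(p_sample, dtype=float)
    ybar, tau = sample_case_frac, population_prevalence
    offset = np.log((1 - tau) / tau * ybar / (1 - ybar))
    logit = np.log(p / (1 - p)) - offset
    return 1.0 / (1.0 + np.exp(-logit))

scores = [0.9, 0.8, 0.3, 0.4, 0.2]
labels = [1, 1, 1, 0, 0]
auc = auc_mann_whitney(scores, labels)
# Duplicating every control changes the case:control ratio but not the AUC.
auc_more_controls = auc_mann_whitney(scores + [0.4, 0.2], labels + [0, 0])
```

The first function answers the discrimination question. The second, or a calibration plot on the recalibrated risks, addresses the calibration check the comment asks for.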

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. We have addressed each major comment point by point below, providing clarifications and indicating revisions made to strengthen the manuscript.


Referee: §3 (variable selection procedure): The cross-validation aggregation of posterior inclusion probabilities followed by Bayesian FDR control implicitly assumes exchangeability or independence of predictors across folds. Epigenetic data exhibit substantial local and global correlations from chromatin domains, co-regulation, and batch effects; when this assumption fails, the selected loci may be unstable or exhibit inflated false discovery rates. This directly undermines the central claim of a reliable 'parsimonious set of candidate loci' and requires either a simulation study under realistic correlation structures or explicit sensitivity checks to support the reported results.

Authors: We thank the referee for this important observation on dependence structures in epigenetic data. Our cross-validation aggregation of PIPs was developed specifically to improve selection stability across data partitions, which can partially buffer against correlation-induced variability. However, we agree that explicit checks under realistic correlation patterns would better support the robustness claims. In the revised manuscript, we have added a dedicated simulation study that generates synthetic data with correlation structures calibrated to epigenetic profiles (local chromatin-domain correlations, co-regulation blocks, and batch effects). Results confirm that the aggregated PIP + BFDR procedure maintains nominal FDR control and produces stable locus selections under moderate-to-high dependence. We have also included sensitivity analyses on the real multiple myeloma data examining selection stability across varying fold counts and correlation-adjusted priors. revision: yes
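A simulation of the kind described here could start from predictors with block-wise local correlation and study-level batch shifts. The sketch below is one hypothetical generator, not the authors' design:

- Loci are split into contiguous blocks, a crude stand-in for chromatin domains or co-regulated regions.
- Within a block, correlation decays as rho^|k - l| (an AR(1) pattern), and blocks are independent of each other.
- Each simulated batch adds its own per-locus mean shift.

All parameter names and defaults are illustrative.

```python
import numpy as np

def simulate_block_ar1(n, n_blocks, block_size, rho, n_batches=2,
                       batch_sd=0.5, seed=0):
    """Simulate an (n, n_blocks * block_size) predictor matrix with AR(1)
    correlation inside each block, independent blocks, and additive
    per-batch mean shifts. Returns the matrix and the batch labels."""
    rng = np.random.default_rng(seed)
    p = n_blocks * block_size
    Z = np.empty((n, p))
    innov_sd = np.sqrt(1.0 - rho**2)              # keeps every locus at unit variance
    for b in range(n_blocks):
        block = np.empty((n, block_size))
        block[:, 0] = rng.standard_normal(n)
        for k in range(1, block_size):
            block[:, k] = rho * block[:, k - 1] + innov_sd * rng.standard_normal(n)
        Z[:, b * block_size:(b + 1) * block_size] = block
    batch = rng.integers(0, n_batches, size=n)    # e.g. study or processing batch
    shifts = rng.normal(0.0, batch_sd, size=(n_batches, p))
    Z += shifts[batch]                            # batch-specific mean shift per locus
    return Z, batch

Z, batch = simulate_block_ar1(20000, n_blocks=3, block_size=5, rho=0.8,
                              batch_sd=0.0, seed=1)
```

One way to test the stability claim is as follows:

1. Designate a known set of truly associated loci.
2. Pass the simulated data through the fold-aggregated PIP and Bayesian FDR procedure.
3. Record the realised false discovery proportion and the overlap of selected sets across replicates.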
Referee: §2 (model specification): The description of spBART does not detail how the parametric component for low-dimensional covariates interacts with the tree ensemble during fitting or selection. Unmodeled leakage of covariate effects into the nonparametric component could compromise both the interpretability of the parametric coefficients and the stability of high-dimensional variable selection, which is load-bearing for the claimed separation of roles and the AUC = 0.96 result.

Authors: We agree that the original exposition of the fitting procedure was insufficiently detailed. The spBART model is defined as Y = Xβ + f(Z) + ε, with β estimated parametrically and f implemented via BART on the high-dimensional epigenetic predictors Z. In the MCMC sampler, the tree ensemble is updated on the residuals after subtracting the current parametric fit, while β is drawn from its full conditional given the current tree predictions; this alternating scheme explicitly prevents leakage of covariate effects into the nonparametric component. Variable selection (aggregated PIPs and BFDR) is performed exclusively on the predictors entering the tree ensemble. We have substantially expanded Section 2 with the complete set of full-conditional distributions, pseudocode for the sampler, and a diagram illustrating the separation of roles. These additions directly support the interpretability of the parametric coefficients and the stability of high-dimensional selection underlying the reported AUC. revision: yes
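The parametric half of one Gibbs sweep illustrates the alternating scheme. The sketch below makes these assumptions:

- The outcome is continuous with Gaussian errors, matching the model as written above.
- Each coefficient has an independent N(0, tau2) prior.
- For a binary case-control outcome, the same step would typically run on a latent probit variable (Albert and Chib data augmentation).

The tree-ensemble update itself is not shown: `f_hat` stands for the current sum-of-trees fit on Z. The helper names are hypothetical.

```python
import numpy as np

def draw_beta(X, y, f_hat, sigma2, tau2, rng):
    """One draw of beta from its full conditional in
    y = X beta + f(Z) + eps, eps ~ N(0, sigma2 I), beta ~ N(0, tau2 I),
    holding the current tree-ensemble fit f_hat fixed."""
    r = y - f_hat                                 # residual left for the parametric part
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0                     # symmetrize against round-off
    mean = cov @ (X.T @ r) / sigma2
    return rng.multivariate_normal(mean, cov)

def tree_residual(y, X, beta):
    """Partial residual handed to the tree-ensemble update: y - X beta."""
    return y - X @ beta

# Toy check: with f known, the draw should sit close to the true coefficients.
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 2))
z = rng.uniform(-2, 2, n)
f_true = np.sin(z)
beta_true = np.array([1.0, -2.0])
noise = 0.1 * rng.standard_normal(n)
y = X @ beta_true + f_true + noise
beta_draw = draw_beta(X, y, f_true, sigma2=0.01, tau2=100.0, rng=rng)
```

In a full sampler, these two steps alternate with the tree update, which fits the residual y - Xβ. The trees therefore only see variation that the current parametric term leaves unexplained.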

Circularity Check

0 steps flagged

No significant circularity in spBART proposal or variable selection


The paper proposes a new semi-parametric BART extension (spBART) that separates parametric modeling of low-dimensional covariates from tree-ensemble modeling of high-dimensional epigenetic predictors, then applies an independent cross-validation procedure to aggregate posterior inclusion probabilities and control Bayesian FDR for variable selection. The reported AUC of 0.96 is obtained on a held-out validation set after model fitting, providing external evaluation rather than any in-sample reduction. No equations or procedures in the abstract or described methodology equate a claimed prediction or result to a fitted input by construction, and no self-citations are invoked as load-bearing uniqueness theorems. The central claims rest on the model structure and out-of-sample performance rather than tautological re-use of fitted quantities.
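The held-out AUC counts as external evaluation only if every data-dependent step runs on the training rows alone, including the PIP aggregation and the Bayesian FDR selection. Below is a minimal sketch of a split that keeps both studies and both outcome classes represented in the validation set. It assumes per-participant study labels and case status are available. The function name and the 25% hold-out fraction are illustrative, since the actual split is not described here.

```python
import numpy as np

def stratified_holdout(study, case, frac=0.25, seed=0):
    """Split row indices into train/test, drawing the same fraction from
    every (study, case status) stratum."""
    rng = np.random.default_rng(seed)
    study, case = np.asarray(study), np.asarray(case)
    is_test = np.zeros(study.size, dtype=bool)
    for s in np.unique(study):
        for c in np.unique(case):
            idx = np.flatnonzero((study == s) & (case == c))
            if idx.size == 0:
                continue                          # empty stratum: nothing to hold out
            n_test = int(round(frac * idx.size))
            is_test[rng.choice(idx, size=n_test, replace=False)] = True
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)

study = np.array([0] * 40 + [1] * 40)            # two pooled studies
case = np.array([0, 1] * 40)                     # alternating controls and cases
train_idx, test_idx = stratified_holdout(study, case, frac=0.25, seed=3)
# Fit spBART, aggregate PIPs and run Bayesian FDR selection on train_idx only;
# touch test_idx once, to compute the final AUC.
```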

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; specific free parameters, axioms, and invented entities cannot be audited in detail. The approach rests on standard BART tree priors plus the added assumption that low-dimensional covariates can be isolated in a parametric component without loss of important interactions.

axioms (1)

Domain assumption: low-dimensional covariates exert effects that are adequately captured by a parametric (interpretable-coefficient) component, while high-dimensional epigenetic predictors require nonparametric tree modeling.
This separation is the core modeling choice stated in the abstract.

pith-pipeline@v0.9.0 · 5799 in / 1484 out tokens · 61905 ms · 2026-05-20T03:04:25.886867+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


We propose a semi-parametric BART model (spBART) that addresses this limitation by modeling low-dimensional covariates through a parametric component with interpretable coefficients, while capturing complex nonlinear associations among high-dimensional predictors through the tree ensemble... cross-validation-based procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false discovery rate control.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


The approach identifies a parsimonious set of candidate loci and achieves strong out-of-sample discrimination (AUC = 0.96)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page 2020
[70]

elife , volume=

Paradoxical resistance of multiple myeloma to proteasome inhibitors by decreased levels of 19S proteasomal subunits , author=. elife , volume=. 2015 , publisher=

work page 2015
[71]

BMC Medical Genomics , volume=

The prognostic significance of ubiquitination-related genes in multiple myeloma by bioinformatics analysis , author=. BMC Medical Genomics , volume=. 2024 , publisher=

work page 2024
[72]

Econometrica: journal of the Econometric Society , pages=

Root-N-consistent semiparametric regression , author=. Econometrica: journal of the Econometric Society , pages=. 1988 , publisher=

work page 1988
[73]

2003 , publisher=

Semiparametric regression , author=. 2003 , publisher=

work page 2003
[74]

2018 , publisher=

Double/debiased machine learning for treatment and structural parameters , author=. 2018 , publisher=

work page 2018
[75]

The Annals of Applied Statistics , volume=

A weakly informative default prior distribution for logistic and other regression models , author=. The Annals of Applied Statistics , volume=

work page
[76]

Electronic Journal of Statistics , volume=

The horseshoe estimator: Posterior concentration around nearly black vectors , author=. Electronic Journal of Statistics , volume=

work page
[77]

Nature biotechnology , volume=

Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine , author=. Nature biotechnology , volume=. 2011 , publisher=

work page 2011
[78]

Cell research , volume=

5-Hydroxymethylcytosine signatures in circulating cell-free DNA as diagnostic biomarkers for human cancers , author=. Cell research , volume=. 2017 , publisher=

work page 2017
[79]

Blood , volume=

Multiple myeloma , author=. Blood , volume=

work page
[80]

Blood, the Journal of the American Society of Hematology , volume=

Racial disparities in incidence and outcome in multiple myeloma: a population-based study , author=. Blood, the Journal of the American Society of Hematology , volume=. 2010 , publisher=

work page 2010

Showing first 80 references.
