arXiv: 2503.08606 · v1 · submitted 2025-03-11 · 📊 stat.ME · q-bio.QM · stat.AP

A Multi-Omics Framework for Survival Mediation Analysis of High-Dimensional Proteogenomic Data

Pith reviewed 2026-05-23 00:03 UTC · model grok-4.3

classification 📊 stat.ME · q-bio.QM · stat.AP
keywords survival mediation analysis · high-dimensional data · multi-omics · accelerated failure time model · proteogenomics · causal pathways · head and neck carcinoma

The pith

SMAHP provides a statistical framework for identifying how genes influence survival through protein mediators in high-dimensional multi-omics data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SMAHP, a survival mediation analysis method that handles high-dimensional exposures and mediators drawn from multiple omics layers. It links them to survival outcomes through the accelerated failure time (AFT) model within a causal mediation framework. In simulations across multiple scenarios, SMAHP attains high power while controlling the false discovery rate (FDR), compared with two other approaches. Applied to head-and-neck carcinoma proteogenomic data, it identifies a gene whose effect on survival is mediated by a protein. The framework is meant to uncover integrated biological pathways that affect time-to-event outcomes.

Core claim

SMAHP is a novel multi-omics causal mediation method based on the accelerated failure time model that simultaneously analyzes high-dimensional exposures and mediators to identify causal pathways affecting survival outcomes.

What carries the argument

SMAHP, a procedure that performs high-dimensional mediation analysis within the accelerated failure time model for multi-omics survival data.
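
For orientation, a generic single-exposure, single-mediator AFT mediation model can be written as below. This is a textbook-style sketch, not the paper's own equations, which the abstract does not show. The symbols $X_i$ (exposure), $M_i$ (mediator), $T_i$ (survival time), the coefficients $\alpha$, $\beta$, $\gamma$ and the error terms $e_i$, $\varepsilon_i$ are notation assumed here for illustration.

```latex
\begin{align}
M_i      &= \alpha_0 + \alpha X_i + e_i, \\
\log T_i &= \beta_0 + \gamma X_i + \beta M_i + \varepsilon_i .
\end{align}
```

Under this linear specification, with no exposure-mediator interaction and the usual no-unmeasured-confounding assumptions, the indirect (mediated) effect on the log-time scale is the product $\alpha\beta$ and the direct effect is $\gamma$. A high-dimensional method would have to estimate and test many such products at once, one per candidate exposure-mediator pair.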

If this is right

  • SMAHP achieves higher statistical power than compared methods while controlling FDR.
  • It can be applied to large proteogenomic datasets like those for head-and-neck carcinoma.
  • It detects specific mediated effects, such as a gene whose effect on survival time is mediated by a protein.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the AFT model fits well, SMAHP could extend to other time-to-event outcomes in genomics.
  • Integrating more omics layers might reveal additional mediated pathways not visible in single-omics analysis.
  • Validation in independent datasets would strengthen the detected mediation in cancer data.

Load-bearing premise

The accelerated failure time model and the high-dimensional mediation assumptions hold sufficiently well in the proteogenomic setting, and the simulation scenarios adequately capture the statistical challenges of real multi-omics data.

What would settle it

Observing that SMAHP fails to control the false discovery rate or loses power in simulations where the accelerated failure time assumption is violated would challenge the method's validity.
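
A stress test of this kind needs a data generator that can switch the AFT assumption on and off. The sketch below is one hypothetical way to do that, not the paper's simulation design: the function name `simulate_mediation_survival`, its parameters, and the choice of violation (an error scale that depends on the exposure, controlled by `hetero`) are all assumptions made for illustration.

```python
import numpy as np

def simulate_mediation_survival(n, alpha, beta, gamma, hetero=0.0,
                                censor_quantile=0.9, seed=0):
    """Simulate exposure -> mediator -> survival time with right censoring.

    hetero = 0 gives a log-normal AFT model (error independent of x).
    hetero != 0 makes the error scale depend on x, which breaks the
    AFT assumption. All names here are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # exposure
    m = alpha * x + rng.normal(size=n)          # mediator model
    noise_scale = np.exp(hetero * x)            # equals 1 everywhere when hetero == 0
    log_t = gamma * x + beta * m + noise_scale * rng.normal(size=n)
    t = np.exp(log_t)                           # latent event time
    # Independent exponential censoring; censor_quantile is a crude knob
    # for the censoring scale, not the achieved censoring rate.
    c = rng.exponential(scale=np.quantile(t, censor_quantile), size=n)
    time = np.minimum(t, c)                     # observed follow-up time
    event = (t <= c).astype(int)                # 1 = event observed, 0 = censored
    return x, m, time, event

# Same seed, with and without the violation: x and m match, survival times differ.
x, m, time, event = simulate_mediation_survival(500, alpha=0.5, beta=0.8, gamma=0.2)
x_v, m_v, time_v, event_v = simulate_mediation_survival(
    500, alpha=0.5, beta=0.8, gamma=0.2, hetero=1.0)
```

Running any mediation procedure on both data sets and comparing its empirical FDR and power would show how sensitive it is to this particular departure from the AFT model.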

Original abstract

Survival analysis plays a crucial role in understanding time-to-event (survival) outcomes such as disease progression. Despite recent advancements in causal mediation frameworks for survival analysis, existing methods are typically based on Cox regression and primarily focus on a single exposure or individual omics layers, often overlooking multi-omics interplay. This limitation hinders the full potential of integrated biological insights. In this paper, we propose SMAHP, a novel method for survival mediation analysis that simultaneously handles high-dimensional exposures and mediators, integrates multi-omics data, and offers a robust statistical framework for identifying causal pathways on survival outcomes. This is one of the first attempts to introduce the accelerated failure time (AFT) model within a multi-omics causal mediation framework for survival outcomes. Through simulations across multiple scenarios, we demonstrate that SMAHP achieves high statistical power, while effectively controlling false discovery rate (FDR), compared with two other approaches. We further apply SMAHP to the largest head-and-neck carcinoma proteogenomic data, detecting a gene mediated by a protein that influences survival time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes SMAHP, a novel survival mediation analysis method for high-dimensional proteogenomic data that integrates multiple omics layers, handles high-dimensional exposures and mediators, and employs the accelerated failure time (AFT) model. Simulations across multiple scenarios are claimed to show that SMAHP achieves high statistical power while controlling FDR better than two comparator approaches. The method is applied to the largest head-and-neck carcinoma proteogenomic dataset, identifying a gene mediated by a protein that influences survival time. This is presented as one of the first uses of AFT within a multi-omics causal mediation framework for survival outcomes.

Significance. If the central performance claims hold after verification of the derivations and assumptions, the work would address a clear gap: existing survival mediation methods are largely Cox-based and limited to single exposures or single omics layers. A properly validated AFT-based multi-omics framework could enable more flexible modeling of time-to-event outcomes and integrated biological pathway discovery. The real-data application provides a concrete demonstration of utility, but the absence of explicit model equations or high-dimensional handling details in the abstract limits immediate assessment of novelty and robustness.

major comments (2)
  1. [Abstract] The claim that SMAHP 'achieves high statistical power, while effectively controlling false discovery rate (FDR)' is presented without derivation details, model equations, or a description of how high dimensionality and multiple omics layers are handled, so readers cannot verify that the math supports the stated performance claims.
  2. [Abstract] The paper presents SMAHP as novel but does not indicate whether the performance metrics reduce to quantities defined by fitted parameters or rest on self-referential citations, leaving the circularity concern unaddressed.
minor comments (1)
  1. [Abstract] The weakest assumption flagged—that the AFT model and high-dimensional mediation assumptions hold sufficiently well in the proteogenomic setting and that simulation scenarios capture real multi-omics challenges—requires explicit discussion and sensitivity checks in the methods section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting areas where the abstract could be clarified. We address each major comment below. The full manuscript provides the requested details on model equations, high-dimensional methods, and simulation-based performance evaluation; we propose a targeted abstract revision for improved readability while maintaining the summary nature of that section.

Point-by-point responses
  1. Referee: [Abstract] The claim that SMAHP 'achieves high statistical power, while effectively controlling false discovery rate (FDR)' is presented without derivation details, model equations, or a description of how high dimensionality and multiple omics layers are handled, so readers cannot verify that the math supports the stated performance claims.

    Authors: The abstract is intentionally concise and summarizes results whose supporting derivations, model equations (AFT-based mediation model), and high-dimensional regularization procedures (for exposures and mediators across omics layers) appear in Sections 2–3 of the manuscript. The power and FDR claims are empirical, obtained from the simulation experiments in Section 4 that compare SMAHP against two existing approaches under multiple scenarios. We agree that a brief additional phrase in the abstract would help readers locate these elements and will revise the abstract accordingly.

    Revision: yes

  2. Referee: [Abstract] The paper presents SMAHP as novel but does not indicate whether the performance metrics reduce to quantities defined by fitted parameters or rest on self-referential citations, leaving the circularity concern unaddressed.

    Authors: The novelty statement refers specifically to the use of the AFT model inside a multi-omics causal mediation framework for survival outcomes, which has not appeared in the prior literature. The reported performance metrics are not analytic functions of fitted parameters; they are Monte Carlo estimates of power and FDR obtained by applying SMAHP and the comparator methods to simulated data generated under known truth. No self-referential citations are used anywhere in the manuscript; all comparisons cite published methods. We therefore do not identify a circularity issue.

    Revision: no
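
To make "Monte Carlo estimates of power and FDR under known truth" concrete, here is a minimal sketch. It does not run SMAHP: as a stand-in it draws toy z-statistics with a known set of true signals and applies the Benjamini-Hochberg step-up procedure. The function names, the effect size, and the numbers of tests and replicates are all assumptions chosen for illustration.

```python
from math import erfc, sqrt
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean rejection mask from the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    n_tests = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, n_tests + 1) / n_tests
    reject = np.zeros(n_tests, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank meeting its threshold
        reject[order[:k + 1]] = True      # reject everything up to that rank
    return reject

def fdp_and_tpp(reject, truth):
    """False discovery proportion and true positive proportion for one replicate."""
    reject = np.asarray(reject, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    fdp = (reject & ~truth).sum() / max(reject.sum(), 1)
    tpp = (reject & truth).sum() / max(truth.sum(), 1)
    return fdp, tpp

def monte_carlo_fdr_power(n_rep=200, n_tests=100, n_signal=10,
                          effect=4.0, q=0.05, seed=0):
    """Average FDP and TPP over replicates: empirical FDR and power."""
    rng = np.random.default_rng(seed)
    truth = np.zeros(n_tests, dtype=bool)
    truth[:n_signal] = True               # known truth: first n_signal tests are real
    fdps, tpps = [], []
    for _ in range(n_rep):
        z = rng.normal(size=n_tests) + effect * truth
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        pvals = np.array([erfc(abs(v) / sqrt(2)) for v in z])
        fdp, tpp = fdp_and_tpp(benjamini_hochberg(pvals, q), truth)
        fdps.append(fdp)
        tpps.append(tpp)
    return float(np.mean(fdps)), float(np.mean(tpps))

fdr_hat, power_hat = monte_carlo_fdr_power()
```

Because the truth vector is fixed by the simulation design rather than estimated from the data, the resulting FDR and power are not functions of any fitted parameter, which is the point the authors make in this response.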

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The provided abstract and context describe SMAHP as a proposed method integrating AFT models into multi-omics mediation for survival outcomes, validated via simulations and a real-data application. No equations, parameter-fitting steps, or derivations are shown that reduce by construction to the inputs (e.g., no fitted quantities renamed as predictions, no self-definitional loops, and no load-bearing self-citations). The performance claims rest on external simulation benchmarks and data analysis rather than tautological redefinitions. This is the common case of a self-contained methodological proposal without internal reduction to its own fitted values or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no information on free parameters, axioms, or invented entities, so the ledger is empty.


