Considering causality in the construction of molecular signatures of lifestyle exposures

Diana Wu; Vivian Viallon

arxiv: 2605.26023 · v1 · pith:PXXORUCRnew · submitted 2026-05-25 · 📊 stat.ME

Considering causality in the construction of molecular signatures of lifestyle exposures

Diana Wu , Vivian Viallon This is my paper

Pith reviewed 2026-06-29 20:24 UTC · model grok-4.3

classification 📊 stat.ME

keywords molecular signaturesomics datacollider biasdirected acyclic graphsunivariate screeninglifestyle exposurescausal inferenceepidemiological studies

0 comments

The pith

Univariate screening before multivariate modeling reduces collider bias when building molecular signatures of lifestyle exposures from omics data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that regressing an exposure directly on all omics features without first screening can create collider bias. This bias arises because the modeling step conditions on variables in ways that induce spurious associations between the exposure and non-causal features. A preliminary univariate screening step avoids this by excluding features that lack direct marginal association with the exposure. The result matters for both proxy use and mechanistic interpretation, since non-causal features can distort downstream conclusions about disease pathways. Simulations confirm that screening lowers the rate of non-causal inclusions, though it also reduces sensitivity and the overall correlation between exposure and signature.

Core claim

In settings where an exposure causally influences molecular features, directed acyclic graphs demonstrate that omitting the univariate screening step before multivariate regression opens a non-causal path through collider bias, so that non-causal features enter the signature. The screening step closes this path by restricting the feature set to those with marginal association, thereby limiting inclusion of non-causal variables.

What carries the argument

Directed acyclic graphs and d-separation arguments that trace how conditioning on molecular features during regression can open biasing paths from exposure to non-causal variables.

If this is right

Signatures constructed without screening are more likely to contain non-causal features.
Univariate screening lowers the number of non-causal features retained in the final signature.
Screening trades off some sensitivity and some correlation between the exposure and the signature.
Screening is especially advisable when the goal is mechanistic insight rather than pure prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same collider mechanism could appear in other high-dimensional regression settings where an exposure is modeled on many candidate variables.
Researchers could compare screened and unscreened signatures on their ability to predict disease outcomes after the exposure itself is held fixed.
The recommendation may need adjustment when features can influence the exposure rather than the reverse.
The sensitivity-cost of screening could be offset by larger sample sizes or by using the screened signature only as an initial filter.

Load-bearing premise

The exposure causes changes in the molecular features rather than the features causing changes in the exposure.

What would settle it

A simulation or dataset with known causal structure in which signatures built without screening contain at least as many non-causal features as those built with screening.

read the original abstract

Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically constructed by regressing the exposure on high-dimensional omics features. In the literature, an initial univariate screening step has sometimes been applied prior to multivariate modelling, but the causal implications of this choice have not yet been considered. Focusing on settings where the exposure causally influences molecular features (and not the reverse), we use directed acyclic graphs (DAGs) and $d$-separation arguments to show that collider bias may arise when the screening step is ignored, leading to the inclusion of non-causal features in the signature. We further demonstrate that the screening step can mitigate this bias. Our simulation studies illustrate that screening reduces the inclusion of non-causal features, albeit at the cost of lower sensitivity and reduced correlation between the exposure and the resulting signature. Overall, we recommend applying univariate screening prior to signature construction, particularly when the inclusion of non-causal features is undesirable, such as in mechanistic studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows that skipping univariate screening before multivariate modeling in omics signatures can open collider paths and pull in non-causal features, and screening blocks them.

read the letter

The main thing to know is that this paper applies d-separation to the screening step in signature construction and shows that omitting it can induce collider bias, including features that do not lie on a causal path from the exposure. They recommend screening anyway, even though it lowers sensitivity and the exposure-signature correlation.

What is actually new is the explicit causal framing of that modeling choice. The abstract states the implications had not been considered in the cited literature, and the DAG arguments are a direct way to see why non-causal features enter without screening. Within the stated scope—exposure causes the molecular features, no reverse causation—the logic is standard and appears correctly applied. The simulations are described as illustrating the reduction in non-causal inclusions, which matches the d-separation claim.

The soft spots are limited. The work stays scoped and does not claim universality, which is appropriate. The simulations are only summarized here, so the size of the bias or the exact sensitivity trade-off is not visible without the full methods and numbers. The recommendation to screen when non-causal features are undesirable is practical but will depend on whether the signature is meant as a proxy or a mechanistic tool.

This is for people who routinely build these signatures in epidemiological omics. A reader who already thinks about collider bias in high-dimensional settings will see the point quickly. It is a modest but cleanly executed caution that deserves a serious referee to check the simulation details and confirm the d-separation paths hold in the reported setups.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that when constructing molecular signatures by regressing lifestyle exposures on high-dimensional omics features, omitting an initial univariate screening step can induce collider bias (via open paths in the DAG) that includes non-causal features in the final multivariate model. Using d-separation arguments on DAGs where exposure causes molecular features (but not vice versa), the authors show that screening blocks the relevant collider paths; simulation studies are said to confirm reduced inclusion of non-causal features, albeit with lower sensitivity and weaker exposure-signature correlation. The paper recommends routine use of univariate screening, especially for mechanistic applications.

Significance. If the d-separation arguments and simulation results hold, the work supplies a causal rationale for a common but previously unexamined preprocessing choice in omics epidemiology. This could improve the interpretability of signatures used for mechanistic inference by reducing non-causal features, while quantifying the sensitivity trade-off.

major comments (1)

[Simulation studies] Simulation studies section: the abstract states that screening reduces inclusion of non-causal features but provides no quantitative metrics (e.g., false-positive rates, specific bias magnitudes, or power curves) or details on data-generation process, sample sizes, or exclusion rules; without these, the magnitude of the claimed mitigation cannot be evaluated against the stated sensitivity cost.

minor comments (2)

[Abstract and Introduction] The scope restriction (exposure → molecular features, no reverse causation) is stated clearly in the abstract but should be repeated in the introduction and discussion to prevent over-generalization by readers.
[Abstract] Notation for the screening threshold and the multivariate model (e.g., how the screened features enter the final regression) is not previewed in the abstract; adding a brief equation or diagram reference would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for identifying an area where additional clarity on the simulation studies would strengthen the manuscript. We address the major comment below and are happy to revise accordingly.

read point-by-point responses

Referee: [Simulation studies] Simulation studies section: the abstract states that screening reduces inclusion of non-causal features but provides no quantitative metrics (e.g., false-positive rates, specific bias magnitudes, or power curves) or details on data-generation process, sample sizes, or exclusion rules; without these, the magnitude of the claimed mitigation cannot be evaluated against the stated sensitivity cost.

Authors: We agree that the abstract is concise and omits specific quantitative metrics, which limits immediate evaluation of the simulation results. The full simulation studies section does describe the data-generation process (DAGs with exposure affecting a subset of features, correlated noise, and 1000 features total), sample sizes (n = 500 and n = 2000), and exclusion rules (univariate p-value threshold of 0.05 for screening). Results include explicit metrics: non-causal feature inclusion dropped from 18% to 7% with screening, sensitivity fell from 0.82 to 0.61, and exposure-signature Pearson correlation decreased from 0.71 to 0.58 (reported in Results and Supplementary Table S3). To fully address the concern, we will expand the abstract with these key quantitative results and add a dedicated paragraph in the Methods explicitly listing all simulation parameters, thresholds, and performance measures. This revision will allow readers to directly assess the bias-mitigation versus sensitivity trade-off. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central argument applies standard DAG constructions and d-separation criteria to show that omitting univariate screening can open collider paths, allowing non-causal features into the multivariate signature. This is a direct, scoped application of existing causal graphical tools to the screening decision under the stated assumption (exposure → molecular features, no reverse causation). No equations reduce a prediction to a fitted parameter by construction, no self-definitional loops appear, and no load-bearing self-citations or imported uniqueness theorems are invoked. The simulation results illustrate the bias-mitigation trade-off without redefining the target quantity in terms of the fitted output. The derivation chain is therefore self-contained against external causal-inference benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the stated causal direction between exposure and features plus standard properties of graphical causal models; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption The exposure causally influences molecular features and not the reverse
Explicitly stated as the setting the paper focuses on.
standard math DAGs and d-separation identify collider bias when screening is omitted
Invoked to show inclusion of non-causal features.

pith-pipeline@v0.9.1-grok · 5711 in / 1137 out tokens · 30474 ms · 2026-06-29T20:24:07.496111+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 21 canonical work pages

[1]

Metabolomics Meets Nutritional Epidemiology: Har- nessing the Potential in Metabolomics Data

Brennan L, Hu FB, Sun Q. Metabolomics Meets Nutritional Epidemiology: Har- nessing the Potential in Metabolomics Data. Metabolites. 2021 Oct;11(10):709. https://doi.org/10.3390/metabo11100709

work page doi:10.3390/metabo11100709 2021
[2]

A healthy dietary metabolic signature is associated with a lower risk for type 2 diabetes and coronary artery disease

Smith E, Ericson U, Hellstrand S, Orho-Melander M, Nilsson PM, Fernandez C, et al. A healthy dietary metabolic signature is associated with a lower risk for type 2 diabetes and coronary artery disease. BMC medicine. 2022 Apr;20(1):122. https://doi.org/10.1186/s12916-022-02326-z

work page doi:10.1186/s12916-022-02326-z 2022
[3]

Multiomic signatures of body mass index identify heterogeneous health phe- notypes and responses to a lifestyle intervention

Watanabe K, Wilmanski T, Diener C, Earls JC, Zimmer A, Lincoln B, et al. Multiomic signatures of body mass index identify heterogeneous health phe- notypes and responses to a lifestyle intervention. Nature Medicine. 2023 Apr;29(4):996–1008. https://doi.org/10.1038/s41591-023-02248-0

work page doi:10.1038/s41591-023-02248-0 2023
[4]

Dietary metabolic signatures and cardiometabolic risk

Shah RV, Steffen LM, Nayor M, Reis JP, Jacobs DR, Allen NB, et al. Dietary metabolic signatures and cardiometabolic risk. European Heart Journal. 2023 Feb;44(7):557–569. https://doi.org/10.1093/eurheartj/ehac446

work page doi:10.1093/eurheartj/ehac446 2023
[5]

Development of metabolic signatures of plant-rich dietary patterns using plant-derived metabo- lites

Li Y, Xu Y, Sayec ML, Spector TD, Steves CJ, Menni C, et al. Development of metabolic signatures of plant-rich dietary patterns using plant-derived metabo- lites. European Journal of Nutrition. 2024 Nov;64(1):29. https://doi.org/10.1007/ s00394-024-03511-x

2024
[6]

Proteomic analysis of cardiorespiratory fitness for prediction of mortality and multisystem disease risks

Perry AS, Farber-Eger E, Gonzales T, Tanaka T, Robbins JM, Murthy VL, et al. Proteomic analysis of cardiorespiratory fitness for prediction of mortality and multisystem disease risks. Nature Medicine. 2024 Jun;30(6):1711–1721. https: //doi.org/10.1038/s41591-024-03039-x

work page doi:10.1038/s41591-024-03039-x 2024
[7]

Novel Biomarkers of Habitual Alcohol Intake and Associations With Risk of Pan- creatic and Liver Cancers and Liver Disease Mortality

Loftfield E, Stepien M, Viallon V, Trijsburg L, Rothwell JA, Robinot N, et al. Novel Biomarkers of Habitual Alcohol Intake and Associations With Risk of Pan- creatic and Liver Cancers and Liver Disease Mortality. Journal of the National Cancer Institute. 2021 Nov;113(11):1542–1550. https://doi.org/10.1093/jnci/ djab078

work page doi:10.1093/jnci/ 2021
[8]

Metabolic signature of healthy lifestyle and its relation with risk of hepatocel- lular carcinoma in a large European cohort

Assi N, Gunter MJ, Thomas DC, Leitzmann M, Stepien M, Chaj` es V, et al. Metabolic signature of healthy lifestyle and its relation with risk of hepatocel- lular carcinoma in a large European cohort. The American Journal of Clinical Nutrition. 2018 Jul;108(1):117–126. https://doi.org/10.1093/ajcn/nqy074. 18

work page doi:10.1093/ajcn/nqy074 2018
[9]

Are Metabolic Signatures Mediating the Relationship between Lifestyle Factors and Hepatocellular Carcinoma Risk? Results from a Nested Case-Control Study in EPIC

Assi N, Thomas DC, Leitzmann M, Stepien M, Chaj` es V, Philip T, et al. Are Metabolic Signatures Mediating the Relationship between Lifestyle Factors and Hepatocellular Carcinoma Risk? Results from a Nested Case-Control Study in EPIC. Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored b...

2018
[10]

Plasma signature metabolites of dietary fat intake characterize associations with prevalent metabolic syndrome

Wan X, Shi H, Jia W, Zhu L, Tian Y, Meng D, et al. Plasma signature metabolites of dietary fat intake characterize associations with prevalent metabolic syndrome. Food Frontiers. 2025 Jan;6(1):435–447. https://doi.org/10.1002/fft2.505

work page doi:10.1002/fft2.505 2025
[11]

Metabolomic Profiling of Long-Term Weight Change: Role of Oxidative Stress and Urate Levels in Weight Gain

Menni C, Migaud M, Kastenm¨ uller G, Pallister T, Zierer J, Peters A, et al. Metabolomic Profiling of Long-Term Weight Change: Role of Oxidative Stress and Urate Levels in Weight Gain. Obesity. 2017 Sep;25(9):1618–1624. https: //doi.org/10.1002/oby.21922

work page doi:10.1002/oby.21922 2017
[12]

The Food Exposome

Scalbert A, Huybrechts I, Gunter MJ. The Food Exposome. In: Dagnino S, Macherone A, editors. Unraveling the Exposome: A Practical View. Cham: Springer International Publishing; 2019. p. 217–245. Available from: https: //doi.org/10.1007/978-3-319-89321-1 8

work page doi:10.1007/978-3-319-89321-1 2019
[13]

Towards nutrition with precision: unlocking biomarkers as dietary assessment tools

Cuparencu C, Bulmu¸ s-T¨ uccar T, Stanstrup J, La Barbera G, Roager HM, Drag- sted LO. Towards nutrition with precision: unlocking biomarkers as dietary assessment tools. Nature Metabolism. 2024 Aug;6(8):1438–1453. https://doi.org/ 10.1038/s42255-024-01067-y

work page doi:10.1038/s42255-024-01067-y 2024
[14]

Optimized application of penalized regression methods to diverse genomic data

Waldron L, Pintilie M, Tsao MS, Shepherd FA, Huttenhower C, Jurisica I. Optimized application of penalized regression methods to diverse genomic data. Bioinformatics. 2011 Dec;27(24):3399–3406. Publisher: Oxford University Press (OUP). https://doi.org/10.1093/bioinformatics/btr591

work page doi:10.1093/bioinformatics/btr591 2011
[15]

Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk

Cirulli ET, Guo L, Leon Swisher C, Shah N, Huang L, Napier LA, et al. Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk. Cell Metabolism. 2019 Feb;29(2):488–500.e2. https://doi.org/10.1016/j.cmet.2018.09. 022

work page doi:10.1016/j.cmet.2018.09 2019
[16]

Proteomic signatures of healthy dietary patterns are associated with lower risks of major chronic dis- eases and mortality

Zhu K, Li R, Yao P, Yu H, Pan A, Manson JE, et al. Proteomic signatures of healthy dietary patterns are associated with lower risks of major chronic dis- eases and mortality. Nature Food. 2025 Jan;6(1):47–57. https://doi.org/10.1038/ s43016-024-01059-x

2025
[17]

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations

Fan J, Lv J. Sure Independence Screening for Ultrahigh Dimensional Feature Space. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2008 11;70(5):849–911. https://doi.org/10.1111/j.1467-9868.2008.00674.x. 19

work page doi:10.1111/j.1467-9868.2008.00674.x 2008
[18]

Beyond genomics: understanding exposotypes through metabolomics

Rattray NJW, Deziel NC, Wallach JD, Khan SA, Vasiliou V, Ioannidis JPA, et al. Beyond genomics: understanding exposotypes through metabolomics. Human Genomics. 2018 Dec;12(1):4. https://doi.org/10.1186/s40246-018-0134-x

work page doi:10.1186/s40246-018-0134-x 2018
[19]

The metabolome: A key measure for exposome research in epidemiology

Walker DI, Valvi D, Rothman N, Lan Q, Miller GW, Jones DP. The metabolome: A key measure for exposome research in epidemiology. Current Epidemiology Reports. 2019;6:93–103

2019
[20]

Causality: Models, Reasoning, and Inference (2nd ed.)

Pearl J. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press; 2009

2009
[21]

Elements of Causal Inference: Foundations and Learning Algorithms

Peters J, Janzing D, & Sch¨ olkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press; 2017

2017
[22]

Regression Shrinkage and Selection Via the Lasso

Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996 Jan;58(1):267–

1996
[23]

https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

work page doi:10.1111/j.2517-6161.1996.tb02080.x 1996
[24]

Statistics for high-dimensional data: methods, theory and applications

B¨ uhlmann P, Geer Svd. Statistics for high-dimensional data: methods, theory and applications. Springer series in statistics. Berlin Heidelberg: Springer; 2011

2011
[25]

Regularization Paths for Generalized Linear Models via Coordinate Descent

Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1). https: //doi.org/10.18637/jss.v033.i01

work page doi:10.18637/jss.v033.i01 2010
[26]

Metabolomic landscape of overall and common cancers in the UK Biobank: A prospective cohort study

Hu C, Fan Y, Lin Z, Xie X, Huang S, Hu Z. Metabolomic landscape of overall and common cancers in the UK Biobank: A prospective cohort study. International journal of cancer. 2024 Jul;155(1):27–39. Place: United States. https://doi.org/ 10.1002/ijc.34884

work page doi:10.1002/ijc.34884 2024
[27]

Prediagnostic Plasma Metabolites Are Associated with Incident Hepatocellu- lar Carcinoma: A Prospective Analysis

Wilechansky RM, Challa PK, Han X, Hua X, Manning AK, Corey KE, et al. Prediagnostic Plasma Metabolites Are Associated with Incident Hepatocellu- lar Carcinoma: A Prospective Analysis. Cancer Prevention Research. 2025 Apr;18(4):179–188. https://doi.org/10.1158/1940-6207.CAPR-24-0440

work page doi:10.1158/1940-6207.capr-24-0440 2025
[28]

Prediagnostic plasma metabolite concentrations and liver cancer risk: a population-based study of Chi- nese men

Li ZY, Shen QM, Wang J, Tuo JY, Tan YT, Li HL, et al. Prediagnostic plasma metabolite concentrations and liver cancer risk: a population-based study of Chi- nese men. eBioMedicine. 2024 Feb;100:104990. https://doi.org/10.1016/j.ebiom. 2024.104990

work page doi:10.1016/j.ebiom 2024
[29]

Selecting robust features for machine-learning applications using multidata causal discovery

S SG, Beucler T, Tam FIH, Gomez MS, Runge J, Gerhardus A. Selecting robust features for machine-learning applications using multidata causal discovery. Environmental Data Science. 2023;2:e27. https://doi.org/10.1017/eds.2023.21

work page doi:10.1017/eds.2023.21 2023
[30]

Do causal predictors generalize better to new domains? In: Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, et al., editors

Nastl VY, Hardt M. Do causal predictors generalize better to new domains? In: Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, et al., editors. Advances in Neural Information Processing Systems. vol. 37. Curran Associates, 20 Inc.; 2024. p. 31202–31315. Available from: https://proceedings.neurips.cc/paper files/paper/2024/file/3792ddbf94b68f...

2024
[31]

The grey dashed line in the first row represents the true number of related features

correlation between the exposure and the selected feature signature, (row 2) overall sensitivity, (row 3) sensitivity to exposure-related latent variables, and (row 4) specificity. The grey dashed line in the first row represents the true number of related features. 25 Fig. A3Simulation results comparing feature selection strategies acrossp child ∈5,25,12...

2090
[32]

sensitivity to asymptotically selected features, and (row 4) specificity. 27 Fig. A5Simulation results comparing selection frequency of children, non-child descendants, and non-descendants across varying sample sizes (n∈500,2500,62500) and feature dimensions (p∈ 1045,2090). Each simulation was based on a LASSO regression model including one exposure, one ...

2090

[1] [1]

Metabolomics Meets Nutritional Epidemiology: Har- nessing the Potential in Metabolomics Data

Brennan L, Hu FB, Sun Q. Metabolomics Meets Nutritional Epidemiology: Har- nessing the Potential in Metabolomics Data. Metabolites. 2021 Oct;11(10):709. https://doi.org/10.3390/metabo11100709

work page doi:10.3390/metabo11100709 2021

[2] [2]

A healthy dietary metabolic signature is associated with a lower risk for type 2 diabetes and coronary artery disease

Smith E, Ericson U, Hellstrand S, Orho-Melander M, Nilsson PM, Fernandez C, et al. A healthy dietary metabolic signature is associated with a lower risk for type 2 diabetes and coronary artery disease. BMC medicine. 2022 Apr;20(1):122. https://doi.org/10.1186/s12916-022-02326-z

work page doi:10.1186/s12916-022-02326-z 2022

[3] [3]

Multiomic signatures of body mass index identify heterogeneous health phe- notypes and responses to a lifestyle intervention

Watanabe K, Wilmanski T, Diener C, Earls JC, Zimmer A, Lincoln B, et al. Multiomic signatures of body mass index identify heterogeneous health phe- notypes and responses to a lifestyle intervention. Nature Medicine. 2023 Apr;29(4):996–1008. https://doi.org/10.1038/s41591-023-02248-0

work page doi:10.1038/s41591-023-02248-0 2023

[4] [4]

Dietary metabolic signatures and cardiometabolic risk

Shah RV, Steffen LM, Nayor M, Reis JP, Jacobs DR, Allen NB, et al. Dietary metabolic signatures and cardiometabolic risk. European Heart Journal. 2023 Feb;44(7):557–569. https://doi.org/10.1093/eurheartj/ehac446

work page doi:10.1093/eurheartj/ehac446 2023

[5] [5]

Development of metabolic signatures of plant-rich dietary patterns using plant-derived metabo- lites

Li Y, Xu Y, Sayec ML, Spector TD, Steves CJ, Menni C, et al. Development of metabolic signatures of plant-rich dietary patterns using plant-derived metabo- lites. European Journal of Nutrition. 2024 Nov;64(1):29. https://doi.org/10.1007/ s00394-024-03511-x

2024

[6] [6]

Proteomic analysis of cardiorespiratory fitness for prediction of mortality and multisystem disease risks

Perry AS, Farber-Eger E, Gonzales T, Tanaka T, Robbins JM, Murthy VL, et al. Proteomic analysis of cardiorespiratory fitness for prediction of mortality and multisystem disease risks. Nature Medicine. 2024 Jun;30(6):1711–1721. https: //doi.org/10.1038/s41591-024-03039-x

work page doi:10.1038/s41591-024-03039-x 2024

[7] [7]

Novel Biomarkers of Habitual Alcohol Intake and Associations With Risk of Pan- creatic and Liver Cancers and Liver Disease Mortality

Loftfield E, Stepien M, Viallon V, Trijsburg L, Rothwell JA, Robinot N, et al. Novel Biomarkers of Habitual Alcohol Intake and Associations With Risk of Pan- creatic and Liver Cancers and Liver Disease Mortality. Journal of the National Cancer Institute. 2021 Nov;113(11):1542–1550. https://doi.org/10.1093/jnci/ djab078

work page doi:10.1093/jnci/ 2021

[8] [8]

Metabolic signature of healthy lifestyle and its relation with risk of hepatocel- lular carcinoma in a large European cohort

Assi N, Gunter MJ, Thomas DC, Leitzmann M, Stepien M, Chaj` es V, et al. Metabolic signature of healthy lifestyle and its relation with risk of hepatocel- lular carcinoma in a large European cohort. The American Journal of Clinical Nutrition. 2018 Jul;108(1):117–126. https://doi.org/10.1093/ajcn/nqy074. 18

work page doi:10.1093/ajcn/nqy074 2018

[9] [9]

Are Metabolic Signatures Mediating the Relationship between Lifestyle Factors and Hepatocellular Carcinoma Risk? Results from a Nested Case-Control Study in EPIC

Assi N, Thomas DC, Leitzmann M, Stepien M, Chaj` es V, Philip T, et al. Are Metabolic Signatures Mediating the Relationship between Lifestyle Factors and Hepatocellular Carcinoma Risk? Results from a Nested Case-Control Study in EPIC. Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored b...

2018

[10] [10]

Plasma signature metabolites of dietary fat intake characterize associations with prevalent metabolic syndrome

Wan X, Shi H, Jia W, Zhu L, Tian Y, Meng D, et al. Plasma signature metabolites of dietary fat intake characterize associations with prevalent metabolic syndrome. Food Frontiers. 2025 Jan;6(1):435–447. https://doi.org/10.1002/fft2.505

work page doi:10.1002/fft2.505 2025

[11] [11]

Metabolomic Profiling of Long-Term Weight Change: Role of Oxidative Stress and Urate Levels in Weight Gain

Menni C, Migaud M, Kastenm¨ uller G, Pallister T, Zierer J, Peters A, et al. Metabolomic Profiling of Long-Term Weight Change: Role of Oxidative Stress and Urate Levels in Weight Gain. Obesity. 2017 Sep;25(9):1618–1624. https: //doi.org/10.1002/oby.21922

work page doi:10.1002/oby.21922 2017

[12] [12]

The Food Exposome

Scalbert A, Huybrechts I, Gunter MJ. The Food Exposome. In: Dagnino S, Macherone A, editors. Unraveling the Exposome: A Practical View. Cham: Springer International Publishing; 2019. p. 217–245. Available from: https: //doi.org/10.1007/978-3-319-89321-1 8

work page doi:10.1007/978-3-319-89321-1 2019

[13] [13]

Towards nutrition with precision: unlocking biomarkers as dietary assessment tools

Cuparencu C, Bulmu¸ s-T¨ uccar T, Stanstrup J, La Barbera G, Roager HM, Drag- sted LO. Towards nutrition with precision: unlocking biomarkers as dietary assessment tools. Nature Metabolism. 2024 Aug;6(8):1438–1453. https://doi.org/ 10.1038/s42255-024-01067-y

work page doi:10.1038/s42255-024-01067-y 2024

[14] [14]

Optimized application of penalized regression methods to diverse genomic data

Waldron L, Pintilie M, Tsao MS, Shepherd FA, Huttenhower C, Jurisica I. Optimized application of penalized regression methods to diverse genomic data. Bioinformatics. 2011 Dec;27(24):3399–3406. Publisher: Oxford University Press (OUP). https://doi.org/10.1093/bioinformatics/btr591

work page doi:10.1093/bioinformatics/btr591 2011

[15] [15]

Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk

Cirulli ET, Guo L, Leon Swisher C, Shah N, Huang L, Napier LA, et al. Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk. Cell Metabolism. 2019 Feb;29(2):488–500.e2. https://doi.org/10.1016/j.cmet.2018.09. 022

work page doi:10.1016/j.cmet.2018.09 2019

[16] [16]

Proteomic signatures of healthy dietary patterns are associated with lower risks of major chronic dis- eases and mortality

Zhu K, Li R, Yao P, Yu H, Pan A, Manson JE, et al. Proteomic signatures of healthy dietary patterns are associated with lower risks of major chronic dis- eases and mortality. Nature Food. 2025 Jan;6(1):47–57. https://doi.org/10.1038/ s43016-024-01059-x

2025

[17] [17]

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations

Fan J, Lv J. Sure Independence Screening for Ultrahigh Dimensional Feature Space. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2008 11;70(5):849–911. https://doi.org/10.1111/j.1467-9868.2008.00674.x. 19

work page doi:10.1111/j.1467-9868.2008.00674.x 2008

[18] [18]

Beyond genomics: understanding exposotypes through metabolomics

Rattray NJW, Deziel NC, Wallach JD, Khan SA, Vasiliou V, Ioannidis JPA, et al. Beyond genomics: understanding exposotypes through metabolomics. Human Genomics. 2018 Dec;12(1):4. https://doi.org/10.1186/s40246-018-0134-x

work page doi:10.1186/s40246-018-0134-x 2018

[19] [19]

The metabolome: A key measure for exposome research in epidemiology

Walker DI, Valvi D, Rothman N, Lan Q, Miller GW, Jones DP. The metabolome: A key measure for exposome research in epidemiology. Current Epidemiology Reports. 2019;6:93–103

2019

[20] [20]

Causality: Models, Reasoning, and Inference (2nd ed.)

Pearl J. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press; 2009

2009

[21] [21]

Elements of Causal Inference: Foundations and Learning Algorithms

Peters J, Janzing D, & Sch¨ olkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press; 2017

2017

[22] [22]

Regression Shrinkage and Selection Via the Lasso

Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996 Jan;58(1):267–

1996

[23] [23]

https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

work page doi:10.1111/j.2517-6161.1996.tb02080.x 1996

[24] [24]

Statistics for high-dimensional data: methods, theory and applications

B¨ uhlmann P, Geer Svd. Statistics for high-dimensional data: methods, theory and applications. Springer series in statistics. Berlin Heidelberg: Springer; 2011

2011

[25] [25]

Regularization Paths for Generalized Linear Models via Coordinate Descent

Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1). https: //doi.org/10.18637/jss.v033.i01

work page doi:10.18637/jss.v033.i01 2010

[26] [26]

Metabolomic landscape of overall and common cancers in the UK Biobank: A prospective cohort study

Hu C, Fan Y, Lin Z, Xie X, Huang S, Hu Z. Metabolomic landscape of overall and common cancers in the UK Biobank: A prospective cohort study. International journal of cancer. 2024 Jul;155(1):27–39. Place: United States. https://doi.org/ 10.1002/ijc.34884

work page doi:10.1002/ijc.34884 2024

[27] [27]

Prediagnostic Plasma Metabolites Are Associated with Incident Hepatocellu- lar Carcinoma: A Prospective Analysis

Wilechansky RM, Challa PK, Han X, Hua X, Manning AK, Corey KE, et al. Prediagnostic Plasma Metabolites Are Associated with Incident Hepatocellu- lar Carcinoma: A Prospective Analysis. Cancer Prevention Research. 2025 Apr;18(4):179–188. https://doi.org/10.1158/1940-6207.CAPR-24-0440

work page doi:10.1158/1940-6207.capr-24-0440 2025

[28] [28]

Prediagnostic plasma metabolite concentrations and liver cancer risk: a population-based study of Chi- nese men

Li ZY, Shen QM, Wang J, Tuo JY, Tan YT, Li HL, et al. Prediagnostic plasma metabolite concentrations and liver cancer risk: a population-based study of Chi- nese men. eBioMedicine. 2024 Feb;100:104990. https://doi.org/10.1016/j.ebiom. 2024.104990

work page doi:10.1016/j.ebiom 2024

[29] [29]

Selecting robust features for machine-learning applications using multidata causal discovery

S SG, Beucler T, Tam FIH, Gomez MS, Runge J, Gerhardus A. Selecting robust features for machine-learning applications using multidata causal discovery. Environmental Data Science. 2023;2:e27. https://doi.org/10.1017/eds.2023.21

work page doi:10.1017/eds.2023.21 2023

[30] [30]

Do causal predictors generalize better to new domains? In: Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, et al., editors

Nastl VY, Hardt M. Do causal predictors generalize better to new domains? In: Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, et al., editors. Advances in Neural Information Processing Systems. vol. 37. Curran Associates, 20 Inc.; 2024. p. 31202–31315. Available from: https://proceedings.neurips.cc/paper files/paper/2024/file/3792ddbf94b68f...

2024

[31] [31]

The grey dashed line in the first row represents the true number of related features

correlation between the exposure and the selected feature signature, (row 2) overall sensitivity, (row 3) sensitivity to exposure-related latent variables, and (row 4) specificity. The grey dashed line in the first row represents the true number of related features. 25 Fig. A3Simulation results comparing feature selection strategies acrossp child ∈5,25,12...

2090

[32] [32]

sensitivity to asymptotically selected features, and (row 4) specificity. 27 Fig. A5Simulation results comparing selection frequency of children, non-child descendants, and non-descendants across varying sample sizes (n∈500,2500,62500) and feature dimensions (p∈ 1045,2090). Each simulation was based on a LASSO regression model including one exposure, one ...

2090