Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference
Pith reviewed 2026-05-25 00:23 UTC · model grok-4.3
The pith
Finding the simplest causal graphical model via minimum description length separates confounding biases from true causal effects in pooled neuroimaging datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By approximating Kolmogorov complexity with the minimum description length principle, the simplest graphical model can be identified in a dataset of 12,207 MRI scans from 15 studies, enabling the quantification of confounding bias and the estimation of plausible causal relationships between variables in neuroimaging data.
What carries the argument
The minimum description length principle as an approximation to Kolmogorov complexity for selecting the causal graphical model with the lowest complexity from pooled scans.
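The sketch below shows one way "selecting the graph with the lowest description length" can be made concrete. It rests on two assumptions that are not the paper's encoding:

- every node is a linear-Gaussian regression on its parents;
- the code is the BIC-style asymptotic two-part code, which charges k/2 · log2 n bits for the model plus the Gaussian code length of the residuals.

All function names are hypothetical.

```python
import numpy as np


def residual_bits(residuals: np.ndarray) -> float:
    """Bits to encode residuals with a Gaussian at the MLE variance.

    Continuous data only have a code length up to a discretisation constant;
    that constant is identical for every candidate graph, so it is dropped.
    """
    n = residuals.shape[0]
    var = max(float(np.mean(residuals ** 2)), 1e-12)
    return 0.5 * n * np.log2(2 * np.pi * np.e * var)


def node_bits(data: np.ndarray, child: int, parents: list[int]) -> float:
    """Two-part code length L(model) + L(data | model) for one node."""
    n = data.shape[0]
    design = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(design, data[:, child], rcond=None)
    resid = data[:, child] - design @ coef
    k = design.shape[1] + 1  # regression weights plus the noise variance
    return 0.5 * k * np.log2(n) + residual_bits(resid)


def graph_bits(data: np.ndarray, graph: dict[int, list[int]]) -> float:
    """Total description length of a DAG given as {child: [parents]}."""
    return sum(node_bits(data, child, parents) for child, parents in graph.items())


rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)
data = np.column_stack([x, y])

candidates = [("empty", {0: [], 1: []}),
              ("x_to_y", {0: [], 1: [0]}),
              ("y_to_x", {0: [1], 1: []})]
scores = {name: graph_bits(data, g) for name, g in candidates}
for name, bits in scores.items():
    print(f"{name:7s} {bits:10.1f} bits")
```

Adding the edge compresses the data far better than the empty graph. However, X→Y and Y→X receive identical scores: under a linear-Gaussian code, Markov-equivalent graphs cannot be told apart. Orienting edges, or separating causation from confounding, therefore needs a richer encoding than this toy one.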
If this is right
- Pooling without correction allows models to learn dataset-specific artifacts instead of biological signals.
- The recovered graphs can quantify the extent of confounding present in any single combined dataset.
- Empirical tests on real data produce plausible causal estimates that distinguish study-specific biases from true effects.
- This supplies a fully data-driven procedure for identifying which variables act as confounders versus causes.
Where Pith is reading between the lines
- The same selection procedure could be tested on simulated data with fully known ground-truth graphs to measure recovery accuracy directly.
- If the approach works, it suggests a general preprocessing step for any multi-site medical dataset where site effects might masquerade as signals.
- The method might be extended to longitudinal or multi-modal imaging collections where confounding structures are even more complex.
- One could check whether the identified causal subgraphs improve downstream prediction performance on held-out tasks compared with uncorrected pooling.
Load-bearing premise
The minimum description length provides a sufficiently accurate approximation to Kolmogorov complexity to recover the true causal graphical model separating confounding from causal factors in pooled neuroimaging data.
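Kolmogorov complexity K(x) is uncomputable. The length of any lossless compressed encoding of x, plus a constant for the decompressor, is a computable upper bound on it. The illustration below uses Python's standard `zlib` module as a stand-in for a general-purpose code. The paper instead uses MDL over graphical models, so this shows only the idea of approximation, not the method.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib encoding: an upper bound on K(data) up to a constant."""
    return 8 * len(zlib.compress(data, 9))


structured = b"0123456789" * 1000            # 10,000 bytes with an obvious short description
noise = random.Random(0).randbytes(10_000)   # 10,000 pseudo-random bytes (Python 3.9+)

print(compressed_bits(structured), compressed_bits(noise))
```

The "noise" bytes come from a seeded generator, so their true Kolmogorov complexity is small: a short program plus the seed reproduces them. Yet zlib cannot exploit that. How loose the bound is depends entirely on the chosen code, which is why the premise above needs empirical testing.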
What would settle it
Apply the method to a pooled dataset where known confounding factors have been deliberately injected and check whether the recovered graph correctly isolates those confounding edges rather than attributing them to causal links.
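The proposed check can be sketched on synthetic data. The setup is hypothetical:

- a binary site label S shifts both an imaging feature X and a clinical variable Y;
- there is no X→Y effect;
- each node is scored with a linear-Gaussian two-part code (BIC-style k/2 · log2 n parameter cost).

Such a score decomposes over nodes, and the candidate graphs differ only in the parents of Y, so comparing Y's local code length is enough. The site is observed here, which is easier than the latent-confounder case.

```python
import numpy as np


def local_bits(target: np.ndarray, parents: list[np.ndarray]) -> float:
    """Two-part code length of one variable given its parents (linear-Gaussian)."""
    n = target.shape[0]
    design = np.column_stack([np.ones(n)] + parents)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    var = max(float(np.mean((target - design @ coef) ** 2)), 1e-12)
    k = design.shape[1] + 1  # weights plus noise variance
    return 0.5 * k * np.log2(n) + 0.5 * n * np.log2(2 * np.pi * np.e * var)


rng = np.random.default_rng(1)
n = 2000
site = rng.integers(0, 2, size=n).astype(float)  # injected confounder
x = 2.0 * site + rng.normal(size=n)               # site shifts the imaging feature
y = 2.0 * site + rng.normal(size=n)               # site shifts the outcome; no X -> Y edge

# Candidate parent sets for Y (S -> X is shared by all graphs, so it cancels).
candidates = {
    "confounded: S -> Y": [site],
    "causal: X -> Y": [x],
    "both: S -> Y, X -> Y": [site, x],
}
bits = {name: local_bits(y, parents) for name, parents in candidates.items()}
for name, b in sorted(bits.items(), key=lambda kv: kv[1]):
    print(f"{b:10.1f} bits  {name}")
```

Here the confounded graph should beat the spurious X→Y graph by a wide margin, because X is only a noisy proxy for S. A convincing benchmark for the real method would also vary what this toy leaves out: unobserved confounders, non-Gaussian noise and weak site effects.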
Original abstract
Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets available today are, on their own, too small for training complex machine learning models. A potential solution is to increase the sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised because it introduces various types of bias into the training data. First, we systematically define these biases. Second, we detect bias by showing experimentally that scans can be assigned to their respective dataset with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by quantifying the extent of confounding and causality in a single dataset using causal inference. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. As Kolmogorov complexity is not directly computable, we employ the minimum description length principle to approximate it. We empirically show that our approach is able to estimate plausible causal relationships from real neuroimaging data.
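The bias-detection step in the abstract is a "name that dataset" test: scans are assigned to their source dataset with 73.3% accuracy. If a classifier can predict the site from image-derived features well above chance, the pooled features carry site information.

The sketch below uses a nearest-centroid classifier and a majority-class chance baseline. The paper's actual features and classifier are not described here, and all data are simulated.

```python
import numpy as np


def dataset_identification_accuracy(features, site, test_frac=0.3, seed=0):
    """Hold out a random test split, classify each test scan to the nearest site
    centroid, and return (accuracy, majority-class chance rate)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(site))
    n_test = int(round(test_frac * len(site)))
    test, train = idx[:n_test], idx[n_test:]

    # Standardise with training statistics only, so the test split stays unseen.
    mu = features[train].mean(axis=0)
    sd = features[train].std(axis=0) + 1e-12
    z = (features - mu) / sd

    sites, counts = np.unique(site[train], return_counts=True)
    centroids = np.stack([z[train][site[train] == s].mean(axis=0) for s in sites])
    dists = ((z[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = sites[np.argmin(dists, axis=1)]

    accuracy = float(np.mean(pred == site[test]))
    chance = float(np.mean(site[test] == sites[np.argmax(counts)]))
    return accuracy, chance


rng = np.random.default_rng(0)
n_sites, per_site, n_features = 3, 300, 10
site = np.repeat(np.arange(n_sites), per_site)
site_offsets = rng.normal(size=(n_sites, n_features))  # hypothetical site effects
noise = rng.normal(size=(site.size, n_features))

acc, chance = dataset_identification_accuracy(site_offsets[site] + noise, site)
acc_null, _ = dataset_identification_accuracy(noise, site)  # no site effect
print(f"with site effects: {acc:.3f}, without: {acc_null:.3f}, chance: {chance:.3f}")
```

Accuracy far above the chance rate signals that site information is present, but it does not say which variables are affected. That question is what the causal step is meant to answer.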
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that pooling neuroimaging datasets from multiple studies introduces confounding biases, which can be detected by a classifier assigning scans to their source datasets at 73.3% accuracy. It proposes quantifying causal versus confounding factors via the minimum description length (MDL) principle as a proxy for Kolmogorov complexity to recover the simplest graphical model, and reports that this yields plausible causal relationships on a pooled set of 12,207 MRI scans from 15 studies.
Significance. If the MDL-based procedure can be shown to recover ground-truth causal structure, the work would offer a practical information-theoretic tool for diagnosing dataset biases in pooled neuroimaging data and improving downstream machine-learning reliability. The emphasis on Kolmogorov complexity and MDL for causal discovery is a conceptually interesting direction, though its empirical grounding remains limited to real-data plausibility checks.
major comments (2)
- [Abstract] The central claim that MDL minimization recovers the true causal graphical model separating confounding from causal factors rests on an untested assumption that the chosen encoding yields a description length whose minimum coincides with the generating DAG; no synthetic benchmarks with injected confounders and known ground-truth structure are described, so it is impossible to distinguish recovery of causality from selection of a parsimonious but non-causal factorization.
- [Abstract] Model selection is performed by minimizing description length on the identical dataset whose causal structure is being recovered, without mention of held-out validation, external benchmarks, or statistical controls; this circularity risks selecting models that merely fit observed correlations rather than independent causal mechanisms.
Simulated Author's Rebuttal
We are grateful to the referee for the detailed review and constructive criticism. The points raised regarding the validation of our MDL-based causal inference method are well-taken. We provide point-by-point responses below.
Point-by-point responses
- Referee: [Abstract] The central claim that MDL minimization recovers the true causal graphical model separating confounding from causal factors rests on an untested assumption that the chosen encoding yields a description length whose minimum coincides with the generating DAG; no synthetic benchmarks with injected confounders and known ground-truth structure are described, so it is impossible to distinguish recovery of causality from selection of a parsimonious but non-causal factorization.
  Authors: The referee correctly identifies that our manuscript lacks synthetic benchmarks with known ground-truth causal structures. While the MDL principle provides a theoretical basis for preferring causal models through simplicity, we agree that empirical validation on synthetic data would be necessary to confirm that the minimum description length corresponds to the true generating DAG rather than an alternative parsimonious model. This represents a gap in the current presentation. We will incorporate synthetic experiments with injected confounders in the revised version of the manuscript.
  Revision: yes
- Referee: [Abstract] Model selection is performed by minimizing description length on the identical dataset whose causal structure is being recovered, without mention of held-out validation, external benchmarks, or statistical controls; this circularity risks selecting models that merely fit observed correlations rather than independent causal mechanisms.
  Authors: We acknowledge that model selection via MDL is conducted on the same dataset used for inference, which is standard practice in many causal discovery algorithms but does introduce the risk highlighted by the referee. Our encodings are designed to capture domain knowledge from neuroimaging, but to address the concern of circularity, we will add held-out validation procedures and additional controls in the revised manuscript to demonstrate that the selected models capture causal mechanisms beyond mere correlations.
  Revision: yes
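One way to act on the circularity concern is a held-out code length: score each candidate model by the bits it needs for data it was not fitted on. Parameters are estimated on a training split and the test split is encoded with them. No explicit parameter cost is needed, and overfitting shows up as extra bits.

The sketch below assumes linear-Gaussian regressions. The split sizes and the 30 spurious predictors are hypothetical.

```python
import numpy as np


def heldout_bits(x_train, y_train, x_test, y_test) -> float:
    """Bits to encode y_test with a Gaussian regression fitted on the training split."""
    def design(x):
        return np.column_stack([np.ones(x.shape[0]), x])

    coef, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)
    var = max(float(np.mean((y_train - design(x_train) @ coef) ** 2)), 1e-12)
    resid = y_test - design(x_test) @ coef
    # Negative log2-likelihood of the test residuals under N(0, var).
    return float(np.sum(0.5 * np.log2(2 * np.pi * var) + resid ** 2 / (2 * var * np.log(2))))


rng = np.random.default_rng(2)
n_train, n_test, n_noise = 60, 2000, 30
n = n_train + n_test
cause = rng.normal(size=(n, 1))
unrelated = rng.normal(size=(n, n_noise))  # variables with no effect on y
y = 1.5 * cause[:, 0] + rng.normal(size=n)

tr, te = slice(0, n_train), slice(n_train, n)
sparse = heldout_bits(cause[tr], y[tr], cause[te], y[te])
dense_x = np.hstack([cause, unrelated])
dense = heldout_bits(dense_x[tr], y[tr], dense_x[te], y[te])
print(f"held-out bits: sparse={sparse:.1f}  overfit={dense:.1f}")
```

The overfitted model underestimates its noise variance on the training split and mispredicts the test split, so it pays many more bits. On the training data alone it would have looked better.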
Circularity Check
No circularity; method is a direct MDL application without reduction to inputs
Full rationale
The paper proposes selecting the simplest graphical model via minimum description length as a proxy for Kolmogorov complexity to separate causal from confounding factors in pooled neuroimaging data. This is a methodological choice grounded in the MDL principle, with the claim that the resulting model yields plausible causal relationships demonstrated empirically on real data. No derivation chain reduces a claimed prediction or result to its own fitted inputs by construction, no self-citation is load-bearing for the central premise, and no ansatz or uniqueness theorem is imported from prior author work. The approach is self-contained as an application of existing information-theoretic model selection; concerns about validation on synthetic ground-truth data pertain to empirical correctness rather than circularity in the stated procedure.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: minimum description length approximates Kolmogorov complexity well enough to identify the true causal graphical model.