Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability
Pith reviewed 2026-05-21 21:48 UTC · model grok-4.3
The pith
Higher-order interactions leave detectable marginal traces on their variables, allowing a three-stage sparse deep model to recover them reliably even when main effects are absent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SDAMI shows that higher-order interactions produce detectable marginal traces on their constituent variables, enabling a three-stage procedure of footprint screening, group-lasso disentanglement of main effects from interactions, and modeling of each component with its own deep subnetwork to achieve consistent recovery of complex effect structures in high-dimensional sparse regression.
What carries the argument
The Effect Footprint principle, the claim that higher-order interactions leave detectable marginal traces on their constituent variables outside of measure-zero symmetry cases.
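A worked special case may make the principle concrete. This is an illustration under assumptions that are not taken from the paper: a pure product interaction between two independent variables, with mean-zero noise independent of the inputs.

```latex
% Illustrative assumptions (not from the paper): X_j and X_k are independent,
% E[X_k] = \mu_k, and \varepsilon is mean-zero noise independent of the inputs.
\begin{align*}
Y &= \beta X_j X_k + \varepsilon, \qquad \beta \neq 0, \\
\mathbb{E}[Y \mid X_j = x] &= \beta \mu_k x, \\
\operatorname{Cov}(Y, X_j) &= \beta \mu_k \operatorname{Var}(X_j).
\end{align*}
```

In this toy case the marginal trace of the interaction on X_j disappears exactly when \mu_k = 0, a single point in the space of possible means. That is the flavor of symmetry condition the principle treats as exceptional.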
If this is right
- Pure interactions without main effects become recoverable, unlike methods that require heredity.
- Interaction recovery remains consistent because footprints vanish only on a set of measure zero.
- False positive rates for spurious interactions stay near zero across varied simulation designs.
- The model supports flexible nonlinear approximation while preserving sparsity and interpretability.
- The approach applies to small-sample high-dimensional regression problems where standard additive models fall short.
Where Pith is reading between the lines
- The footprint idea might be tested on genomic or neuroimaging data where known interaction networks exist.
- Screening could be adapted to other penalties or to survival and classification outcomes.
- If footprints prove robust, the same logic might help detect interactions in time-series or spatial settings.
- Combining this screening with modern variable-importance tools could further reduce the search space for deep models.
Load-bearing premise
Higher-order interactions always produce detectable marginal traces on their variables except in rare symmetric cases.
What would settle it
A controlled simulation or dataset containing a pure interaction term whose marginal effects on the constituent variables are statistically indistinguishable from noise would show that the three-stage recovery procedure fails to identify it.
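The scenario described above can be mocked up in a few lines. The sketch below is not the paper's screening statistic: `marginal_score` is a hypothetical stand-in that uses the R² of a cubic fit on one variable at a time, and the data-generating process (uniform inputs, a single product interaction, Gaussian noise) is assumed purely for illustration.

```python
import numpy as np

def marginal_score(x, y, degree=3):
    """R^2 of a polynomial regression of y on the single variable x."""
    # Hypothetical footprint statistic; the paper's own statistic may differ.
    design = np.vander(x, degree + 1)            # columns x^3, x^2, x, 1
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

def footprint_demo(shift, n=4000, p=20, seed=0):
    """Marginal scores for y = x0 * x1 + noise with inputs uniform on [-1, 1] + shift."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, p)) + shift
    y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)
    return np.array([marginal_score(X[:, j], y) for j in range(p)])

shifted = footprint_demo(shift=1.0)    # inputs have nonzero mean: footprint expected
centered = footprint_demo(shift=0.0)   # inputs symmetric about zero: footprint cancels

print("shifted  x0, x1:", shifted[:2].round(3), " max noise var:", shifted[2:].max().round(4))
print("centered x0, x1:", centered[:2].round(3), " max noise var:", centered[2:].max().round(4))
```

With shifted inputs the two interacting variables stand out from the noise variables. With centered inputs their scores fall to the level of the noise variables, which is the kind of case that would leave a screening-first procedure with nothing to pick up.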
Original abstract
Recent advances in deep learning highlight the need for personalized models that can learn from small samples, handle high-dimensional features, and remain interpretable. To address this, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Central to SDAMI is the Effect Footprint principle, which posits that higher-order interactions leave detectable marginal traces on constituent variables, enabling their discovery without exhaustive search. SDAMI executes this principle through a three-stage strategy: (1) screening for footprint variables, (2) disentangling main effects from interactions via group lasso, and (3) modeling components with dedicated deep subnetworks. Theoretical analysis confirms that footprints vanish only under measure-zero symmetry conditions that are rare in practice, ensuring consistent interaction recovery. Extensive simulations demonstrate that SDAMI successfully identifies pure interactions that heredity-based baselines fundamentally miss, recovering complex effect structures with near-zero false positive rates. Together, these results position SDAMI as a principled framework for interpretable high-dimensional regression.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Sparse Deep Additive Model with Interactions (SDAMI) for interpretable high-dimensional regression. It introduces the Effect Footprint principle, asserting that higher-order interactions produce detectable marginal traces on constituent variables except under measure-zero symmetry conditions. SDAMI uses a three-stage process: screening for footprint variables, disentangling main effects and interactions via group lasso, and modeling with deep subnetworks. Theoretical analysis is claimed to support consistent interaction recovery, and simulations show it identifies pure interactions missed by heredity-based methods with near-zero false positive rates.
Significance. If the central claims hold, SDAMI offers a novel approach to balancing flexibility of deep learning with interpretability and sparsity in small-sample high-dimensional settings. The ability to recover interactions without exhaustive search or strict heredity assumptions could advance personalized modeling. The simulations provide evidence of practical utility over baselines, though verification of the theoretical guarantees is needed.
major comments (1)
- [Theoretical Analysis] The assertion that footprints vanish only under measure-zero symmetry conditions does not address the finite-sample power of the screening stage (stage 1). In high-dimensional regimes, even with nonzero population marginal effects, variance in marginal estimates or multiple-testing issues could cause relevant variables to be screened out, preventing the subsequent disentangling and subnetwork fitting from recovering the interactions. This is a load-bearing concern for the reliability of the three-stage strategy.
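The finite-sample concern can be probed with a small Monte Carlo. The sketch below is an assumption-laden illustration, not the paper's Stage 1: it screens by absolute Pearson correlation and keeps the top `keep` variables, and the weak-footprint design (inputs uniform on [-1, 1] shifted by 0.2, so the population marginal correlation is about 0.3) is invented for the purpose. `retention_rate` is a hypothetical helper name.

```python
import numpy as np

def retention_rate(n, p=500, keep=10, shift=0.2, reps=100, seed=0):
    """Fraction of replications in which both interacting variables survive screening."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.uniform(-1.0, 1.0, size=(n, p)) + shift
        y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)
        # Absolute marginal correlation of every column with y.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        kept = np.argsort(corr)[-keep:]          # indices of the top `keep` variables
        hits += int(0 in kept and 1 in kept)
    return hits / reps

rates = {n: retention_rate(n) for n in (50, 200, 800)}
print(rates)
```

In this toy design the population footprint is nonzero at every sample size, yet the retention rate at n = 50 is well below the rate at n = 800. That gap is what a finite-sample power analysis of the screening stage would need to quantify.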
minor comments (2)
- [Simulations] Simulations section: The abstract mentions extensive simulations with near-zero false positive rates, but specific details on data generation, sample sizes, dimensions, error bars, and exact comparison metrics to baselines would strengthen the presentation.
- [Method] Method description: The group lasso regularization parameters are free parameters; clarification on how they are chosen or tuned in practice would be helpful for reproducibility.
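One common way to choose a group-lasso penalty, offered here as a sketch and not as the authors' procedure, is holdout validation over a grid that descends from the smallest penalty that zeroes every group. The solver below is a plain proximal-gradient implementation with the usual sqrt(group size) weights; `build_design` and `group_lasso` are hypothetical names, and the quadratic main-effect basis and the simulated data are assumptions.

```python
import numpy as np

def build_design(X):
    """One group per main effect (x, x^2) and one group per pairwise product."""
    p = X.shape[1]
    cols, groups = [], []
    for j in range(p):
        groups.append([len(cols), len(cols) + 1])
        cols += [X[:, j], X[:, j] ** 2]
    for j in range(p):
        for k in range(j + 1, p):
            groups.append([len(cols)])
            cols.append(X[:, j] * X[:, k])
    return np.column_stack(cols), groups

def group_lasso(Z, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Z b||^2 + lam * sum_g sqrt(|g|) * ||b_g||."""
    n = Z.shape[0]
    step = n / np.linalg.norm(Z, 2) ** 2         # 1 / Lipschitz constant of the gradient
    b = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        b = b + step * Z.T @ (y - Z @ b) / n     # gradient step on the squared loss
        for g in groups:                         # block soft-thresholding per group
            norm = np.linalg.norm(b[g])
            thresh = step * lam * np.sqrt(len(g))
            b[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * b[g]
    return b

# Assumed toy data: four screened variables, one interaction, one main effect.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(600, 4))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(600)

Z, groups = build_design(X)
Z_tr, Z_va, y_tr, y_va = Z[:400], Z[400:], y[:400], y[400:]
mu, sd, y_bar = Z_tr.mean(axis=0), Z_tr.std(axis=0), y_tr.mean()
Z_tr, Z_va = (Z_tr - mu) / sd, (Z_va - mu) / sd  # standardize with training statistics
y_tr, y_va = y_tr - y_bar, y_va - y_bar

# Smallest penalty at which every group is zero, then a log-spaced grid below it.
lam_max = max(np.linalg.norm(Z_tr[:, g].T @ y_tr) / (len(y_tr) * np.sqrt(len(g)))
              for g in groups)
grid = lam_max * np.logspace(0, -3, 20)
val_mse = [np.mean((y_va - Z_va @ group_lasso(Z_tr, y_tr, groups, lam)) ** 2)
           for lam in grid]
best_lam = grid[int(np.argmin(val_mse))]
b_hat = group_lasso(Z_tr, y_tr, groups, best_lam)
group_norms = [float(np.linalg.norm(b_hat[g])) for g in groups]
print("best lambda:", best_lam, "group norms:", np.round(group_norms, 3))
```

Minimizing validation error tends to pick a small penalty, which favors prediction over exact support recovery. A reproducible report would state which criterion was used and whether a more conservative rule was applied for selection.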
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. We value the opportunity to clarify the scope of our theoretical results and to strengthen the presentation of the three-stage procedure. Below we respond point-by-point to the single major comment.
- Referee: The assertion that footprints vanish only under measure-zero symmetry conditions does not address the finite-sample power of the screening stage (stage 1). In high-dimensional regimes, even with nonzero population marginal effects, variance in marginal estimates or multiple-testing issues could cause relevant variables to be screened out, preventing the subsequent disentangling and subnetwork fitting from recovering the interactions. This is a load-bearing concern for the reliability of the three-stage strategy.
  Authors: We agree that the current theoretical analysis focuses on population-level properties and asymptotic consistency: we prove that the marginal footprints are nonzero except on a measure-zero set of symmetric distributions, which guarantees that the screening stage recovers the relevant variables with probability approaching one as n grows. Explicit finite-sample power bounds that account for estimation variance and multiple-testing corrections in the high-dimensional regime are not derived in the present version. To address this concern we will revise the theoretical section to include a brief discussion of finite-sample behavior, supported by concentration inequalities for the marginal estimators used in Stage 1 and by additional simulation diagnostics that quantify screening error rates under the exact high-dimensional small-sample regimes examined in the paper. These revisions will make the load-bearing role of the screening stage more transparent while leaving the core asymptotic guarantees and empirical findings unchanged.
  Revision: yes
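For orientation, one standard form such a concentration argument could take is sketched below. It is not the authors' result: it assumes a hypothetical marginal statistic that is a sample mean of i.i.d. terms bounded in [a, b], and it combines Hoeffding's inequality with a union bound over the p candidate variables.

```latex
% Assumed setup (not from the paper): \hat{\mu}_j = \frac{1}{n}\sum_{i=1}^{n} W_{ij},
% with W_{ij} \in [a, b] i.i.d. over i and \mathbb{E}[W_{ij}] = \mu_j, for j = 1, \dots, p.
\begin{align*}
\Pr\!\left( \max_{1 \le j \le p} \lvert \hat{\mu}_j - \mu_j \rvert \ge \tau \right)
  &\le 2p \exp\!\left( -\frac{2 n \tau^2}{(b - a)^2} \right), \\
n \ge \frac{(b - a)^2}{2 \tau^2} \log\frac{2p}{\delta}
  &\;\Longrightarrow\;
  \Pr\!\left( \max_{1 \le j \le p} \lvert \hat{\mu}_j - \mu_j \rvert < \tau \right) \ge 1 - \delta .
\end{align*}
```

On that event, thresholding at \tau keeps every variable with \lvert \mu_j \rvert \ge 2\tau and discards every variable with \mu_j = 0. The required sample size grows only logarithmically in p but quadratically in 1/\tau, so weak footprints are the expensive case.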
Circularity Check
No significant circularity; Effect Footprint is posited as an independent principle with separate theoretical support.
The paper introduces the Effect Footprint principle as a posited assumption that higher-order interactions produce detectable marginal traces except on a measure-zero set of symmetries. It then states that theoretical analysis confirms the vanishing property and that simulations show recovery performance. No equation or step reduces the principle to a fitted parameter, a self-citation chain, or a renaming of the method's own output. The three-stage procedure (screening, group-lasso disentangling, deep subnetworks) is presented as an implementation of the principle rather than a redefinition of it. The derivation chain therefore remains self-contained against external benchmarks and does not collapse by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- Group lasso regularization parameters
axioms (1)
- Domain assumption: Higher-order interactions leave detectable marginal traces on constituent variables except under measure-zero symmetry conditions
invented entities (1)
- Effect Footprint: no independent evidence