Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

Li-Hsiang Lin; Vince D. Calhoun; Yi-Ting Hung

arXiv: 2509.23068 · v2 · pith:75FPXQH5new · submitted 2025-09-27 · stat.ML · cs.LG


Pith reviewed 2026-05-21 21:48 UTC · model grok-4.3


keywords: sparse deep models · interaction detection · high-dimensional regression · additive models · effect footprint · group lasso · interpretability · small-sample learning


The pith

Higher-order interactions leave detectable marginal traces on their variables, allowing a three-stage sparse deep model to recover them reliably even when main effects are absent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SDAMI to build interpretable models for high-dimensional data with small samples by combining sparsity selection with deep subnetworks for flexible fits. It rests on the Effect Footprint principle, which says interactions imprint measurable marginal signals on the variables involved. A screening step finds candidate variables, group lasso separates main effects from interactions, and dedicated subnetworks model each piece. Theory shows these footprints disappear only in rare symmetric cases that have measure zero, so recovery stays consistent in practice. Simulations confirm the method catches pure interactions that heredity rules miss while keeping false positives near zero.

Core claim

SDAMI shows that higher-order interactions produce detectable marginal traces on their constituent variables, enabling a three-stage procedure of footprint screening, group-lasso disentanglement of main effects from interactions, and modeling of each component with its own deep subnetwork to achieve consistent recovery of complex effect structures in high-dimensional sparse regression.
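The final stage of the procedure described above, one subnetwork per selected component, can be illustrated with a minimal sketch. It assumes the screening and group-lasso stages have already produced a set of main-effect indices and a set of interacting pairs. The class names `TinyMLP` and `AdditiveInteractionModel`, the hidden width, and the untrained random weights are hypothetical and are not taken from the paper.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer ReLU network with random (untrained) weights."""
    def __init__(self, in_dim, hidden, rng):
        self.W1 = rng.normal(scale=0.5, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, x):
        h = np.maximum(x @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return (h @ self.W2 + self.b2).ravel()

class AdditiveInteractionModel:
    """Sum of per-variable and per-pair subnetworks (illustrative only)."""
    def __init__(self, main_idx, pair_idx, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.main = {j: TinyMLP(1, hidden, rng) for j in main_idx}
        self.pairs = {jk: TinyMLP(2, hidden, rng) for jk in pair_idx}

    def components(self, X):
        # Each component sees only its own variable(s), so it can be
        # plotted and inspected separately.
        out = {}
        for j, net in self.main.items():
            out[("main", j)] = net(X[:, [j]])
        for (j, k), net in self.pairs.items():
            out[("pair", j, k)] = net(X[:, [j, k]])
        return out

    def predict(self, X):
        return sum(self.components(X).values())

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 6))
model = AdditiveInteractionModel(main_idx=[0], pair_idx=[(1, 2)])
pred = model.predict(X)

# Perturbing an unselected variable leaves the prediction unchanged.
X_shift = X.copy()
X_shift[:, 5] += 10.0
pred_shift = model.predict(X_shift)
```

Because each subnetwork receives only its own inputs, variables outside the selected sets cannot influence the prediction, which is the structural source of the sparsity and interpretability discussed here.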

What carries the argument

The Effect Footprint principle, the claim that higher-order interactions leave detectable marginal traces on their constituent variables outside of measure-zero symmetry cases.
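A small worked example may help make the principle concrete. It is an illustration under simplifying assumptions (a product interaction, independent inputs, mean-zero noise) and is not a derivation from the paper.

```latex
% Illustrative example, not taken from the paper.
% Assume Y = X_1 X_2 + \varepsilon, with X_1, X_2, \varepsilon mutually
% independent, E[\varepsilon] = 0, and \mu_2 = E[X_2].
\begin{align*}
  \mathbb{E}[Y \mid X_1 = x_1] &= x_1\,\mathbb{E}[X_2] = \mu_2\, x_1, \\
  \operatorname{Cov}(Y, X_1)   &= \mathbb{E}[X_1^2]\,\mu_2 - \mathbb{E}[X_1]^2\,\mu_2
                                = \mu_2 \operatorname{Var}(X_1).
\end{align*}
```

In this toy setting the model has no main effect, yet the marginal regression of Y on X_1 has slope \mu_2. The trace disappears only when \mu_2 = 0, for instance when X_2 is symmetric about zero, which is a single point in the space of possible means.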

If this is right

Pure interactions without main effects become recoverable, unlike methods that require heredity.
Interaction recovery remains consistent because footprints vanish only on a set of measure zero.
False positive rates for spurious interactions stay near zero across varied simulation designs.
The model supports flexible nonlinear approximation while preserving sparsity and interpretability.
The approach applies to small-sample high-dimensional regression problems where standard additive models fall short.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The footprint idea might be tested on genomic or neuroimaging data where known interaction networks exist.
Screening could be adapted to other penalties or to survival and classification outcomes.
If footprints prove robust, the same logic might help detect interactions in time-series or spatial settings.
Combining this screening with modern variable-importance tools could further reduce the search space for deep models.

Load-bearing premise

Higher-order interactions always produce detectable marginal traces on their variables except in rare symmetric cases.

What would settle it

A controlled simulation or dataset containing a pure interaction term whose marginal effects on the constituent variables are statistically indistinguishable from noise. If the three-stage procedure fails to recover the term in that setting, the premise does not hold there.
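A simulation of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's experimental design: it uses a product interaction, uniform inputs, and absolute Pearson correlation as a crude stand-in for whatever marginal statistic the screening stage actually uses. The function names are hypothetical.

```python
import numpy as np

def abs_marginal_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / denom

def simulate_pure_interaction(low, high, n=5000, p=20, noise_sd=0.1, seed=0):
    """y depends only on the product x0 * x1; the other columns are noise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n, p))
    y = X[:, 0] * X[:, 1] + noise_sd * rng.normal(size=n)
    return abs_marginal_corr(X, y)

# Inputs with nonzero mean: the interaction leaves a marginal trace.
scores_shifted = simulate_pure_interaction(0.0, 2.0)
# Inputs symmetric about zero: the linear marginal trace cancels.
scores_symmetric = simulate_pure_interaction(-1.0, 1.0)

print("shifted,   interacting vars:", scores_shifted[:2])
print("shifted,   max noise var:   ", scores_shifted[2:].max())
print("symmetric, interacting vars:", scores_symmetric[:2])
```

In the symmetric design the two interacting variables score about the same as pure noise variables under this linear statistic, which is the kind of case the paragraph above describes. A nonlinear marginal statistic could behave differently, so this sketch only shows the mechanism, not a verdict on the method.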

Figures

Figures reproduced from arXiv: 2509.23068 by Li-Hsiang Lin, Vince D. Calhoun, Yi-Ting Hung.

**Figure 2.** The SDAMI architecture. Screening identifies both main and footprint variables, which …

**Figure 3.** (Case 3) The three figures on the left: Estimated (red dashed lines) versus true additive …

**Figure 5.** (V1 Cell Dataset) Upper panel: the predicted marginal main effects (solid black dots); lower panel: the estimated response surface for interactions.

**Figure 6.** The estimated (red dashed lines) versus true additive component functions (solid black …

**Figure 7.** (Upper panel: Case (4); middle panel: Case (5)) The three figures on the left: Estimated …

**Figure 8.** (Upper) Main effects and (Lower) Interaction for Chip Data (Chip Dataset) The six figures …

**Figure 9.** (Diabetes Dataset) The two figures on the left: Predicted marginal response of target with …

Original abstract

Recent advances in deep learning highlight the need for personalized models that can learn from small samples, handle high-dimensional features, and remain interpretable. To address this, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Central to SDAMI is the Effect Footprint principle, which posits that higher-order interactions leave detectable marginal traces on constituent variables, enabling their discovery without exhaustive search. SDAMI executes this principle through a three-stage strategy: (1) screening for footprint variables, (2) disentangling main effects from interactions via group lasso, and (3) modeling components with dedicated deep subnetworks. Theoretical analysis confirms that footprints vanish only under measure-zero symmetry conditions that are rare in practice, ensuring consistent interaction recovery. Extensive simulations demonstrate that SDAMI successfully identifies pure interactions that heredity-based baselines fundamentally miss, recovering complex effect structures with near-zero false positive rates. Together, these results position SDAMI as a principled framework for interpretable high-dimensional regression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SDAMI's Effect Footprint principle and three-stage pipeline are a distinct attempt at interaction recovery without heredity, but the finite-sample power of the screening stage is the part that needs checking.


The paper puts forward the Effect Footprint principle as a way to find higher-order interactions through their marginal traces on the involved variables. It then runs a three-stage procedure: screen for those variables, use group lasso to separate main effects from interactions, and fit a dedicated deep subnetwork to each piece. The authors position this as an improvement over heredity-based baselines, which miss pure interactions.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes the Sparse Deep Additive Model with Interactions (SDAMI) for interpretable high-dimensional regression. It introduces the Effect Footprint principle, asserting that higher-order interactions produce detectable marginal traces on constituent variables except under measure-zero symmetry conditions. SDAMI uses a three-stage process: screening for footprint variables, disentangling main effects and interactions via group lasso, and modeling with deep subnetworks. Theoretical analysis is claimed to support consistent interaction recovery, and simulations show it identifies pure interactions missed by heredity-based methods with near-zero false positive rates.

Significance. If the central claims hold, SDAMI offers a novel way to balance the flexibility of deep learning with interpretability and sparsity in small-sample, high-dimensional settings. The ability to recover interactions without exhaustive search or strict heredity assumptions could advance personalized modeling. The simulations provide evidence of practical utility over baselines, though the theoretical guarantees still need to be verified.

major comments (1)

[Theoretical Analysis] The assertion that footprints vanish only under measure-zero symmetry conditions does not address the finite-sample power of the screening stage (stage 1). In high-dimensional regimes, even with nonzero population marginal effects, variance in marginal estimates or multiple-testing issues could cause relevant variables to be screened out, preventing the subsequent disentangling and subnetwork fitting from recovering the interactions. This is a load-bearing concern for the reliability of the three-stage strategy.
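The concern can be made tangible with a small Monte Carlo sketch. It is an illustration under assumptions that are not drawn from the paper: a product interaction on uniform inputs with nonzero mean, absolute Pearson correlation as the screening statistic, and a fixed top-`keep` cutoff. The function names and all parameter values are hypothetical.

```python
import numpy as np

def abs_marginal_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / denom

def screening_recall(n, p=200, keep=10, noise_sd=1.0, n_rep=50, seed=0):
    """Fraction of replications in which both interacting variables
    (columns 0 and 1) survive a top-`keep` marginal screen."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        X = rng.uniform(0.0, 2.0, size=(n, p))
        y = X[:, 0] * X[:, 1] + noise_sd * rng.normal(size=n)
        top = np.argsort(abs_marginal_corr(X, y))[-keep:]
        hits += int(0 in top and 1 in top)
    return hits / n_rep

recall_small = screening_recall(n=20)   # small-sample regime
recall_large = screening_recall(n=500)  # larger-sample regime
print("recall at n=20: ", recall_small)
print("recall at n=500:", recall_large)
```

The population footprint is identical in both calls; only the sample size changes. Any gap between the two recall values is therefore a finite-sample effect of the kind the comment asks the authors to quantify.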

minor comments (2)

[Simulations] The abstract mentions extensive simulations with near-zero false positive rates, but specific details on data generation, sample sizes, dimensions, error bars, and exact comparison metrics against baselines would strengthen the presentation.
[Method] The group lasso regularization parameters are free parameters; clarifying how they are chosen or tuned in practice would help reproducibility.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We value the opportunity to clarify the scope of our theoretical results and to strengthen the presentation of the three-stage procedure. Below we respond point-by-point to the single major comment.


Referee: The assertion that footprints vanish only under measure-zero symmetry conditions does not address the finite-sample power of the screening stage (stage 1). In high-dimensional regimes, even with nonzero population marginal effects, variance in marginal estimates or multiple-testing issues could cause relevant variables to be screened out, preventing the subsequent disentangling and subnetwork fitting from recovering the interactions. This is a load-bearing concern for the reliability of the three-stage strategy.

Authors: We agree that the current theoretical analysis focuses on population-level properties and asymptotic consistency: we prove that the marginal footprints are nonzero except on a measure-zero set of symmetric distributions, which guarantees that the screening stage recovers the relevant variables with probability approaching one as n grows. Explicit finite-sample power bounds that account for estimation variance and multiple-testing corrections in the high-dimensional regime are not derived in the present version. To address this concern we will revise the theoretical section to include a brief discussion of finite-sample behavior, supported by concentration inequalities for the marginal estimators used in Stage 1 and by additional simulation diagnostics that quantify screening error rates under the exact high-dimensional small-sample regimes examined in the paper. These revisions will make the load-bearing role of the screening stage more transparent while leaving the core asymptotic guarantees and empirical findings unchanged.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; Effect Footprint is posited as an independent principle with separate theoretical support.

full rationale

The paper introduces the Effect Footprint principle as a posited assumption that higher-order interactions produce detectable marginal traces except on a measure-zero set of symmetries. It then states that theoretical analysis confirms the vanishing property and that simulations show recovery performance. No equation or step reduces the principle to a fitted parameter, a self-citation chain, or a renaming of the method's own output. The three-stage procedure (screening, group-lasso disentangling, deep subnetworks) is presented as an implementation of the principle rather than a redefinition of it. The derivation chain therefore remains self-contained against external benchmarks and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

This assessment is based solely on the abstract; the ledger reflects only elements explicitly described or implied there. The full paper may introduce additional parameters or assumptions.

free parameters (1)

Group lasso regularization parameters
Used in stage two to disentangle main effects from interactions; specific values or selection procedure not detailed in abstract.
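To show what such a regularization parameter controls, here is a minimal sketch of the group-lasso proximal step (block soft-thresholding). It is a generic textbook operation, not the paper's implementation. The function names, the example coefficients, and the square-root-of-group-size weighting are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(v, thresh):
    """Shrink the whole vector v toward zero; zero it out if its norm
    does not exceed the threshold."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v

def prox_group_lasso(beta, groups, lam):
    """Apply block soft-thresholding to each group of coefficients.
    Groups are weighted by sqrt(group size), a common convention that
    is assumed here rather than taken from the paper."""
    out = np.array(beta, dtype=float)
    for idx in groups:
        out[idx] = group_soft_threshold(out[idx], lam * np.sqrt(len(idx)))
    return out

# Hypothetical coefficients: one strong group and one weak group.
beta = np.array([3.0, 4.0, 0.1, 0.1])
groups = [[0, 1], [2, 3]]
shrunk = prox_group_lasso(beta, groups, lam=1.0)
print(shrunk)
```

A larger `lam` removes more groups entirely, so in any pipeline of this kind the value decides which components survive to be modeled. That is why the ledger treats it as a free parameter whose selection procedure matters.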

axioms (1)

Domain assumption: Higher-order interactions leave detectable marginal traces on constituent variables except under measure-zero symmetry conditions.
This is the central Effect Footprint principle that justifies the screening stage and consistent recovery claim.

invented entities (1)

Effect Footprint (no independent evidence)
purpose: To enable discovery of higher-order interactions without exhaustive combinatorial search
New posited principle introduced to support the three-stage strategy; no independent evidence outside the paper is mentioned.




[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Neural additive models: Interpretable machine learning with neural nets

Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, and Geoffrey E Hinton. Neural additive models: Interpretable machine learning with neural nets. Advances in neural information processing systems, 34: 0 4699--4711, 2021

work page 2021

[3] [3]

Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39 0 (3): 0 930--945, 1993. doi:10.1109/18.256500

work page doi:10.1109/18.256500 1993

[4] [4]

Bartlett and Shahar Mendelson

Peter L. Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3: 0 463--482, 2002

work page 2002

[5] [5]

A lasso for hierarchical interactions

Jacob Bien, Jonathan Taylor, and Robert Tibshirani. A lasso for hierarchical interactions. Annals of statistics, 41 0 (3): 0 1111, 2013

work page 2013

[6] [6]

A survey of the recent trends in deep learning for literature based discovery in the biomedical domain

Eugenio Cesario, Carmela Comito, and Ester Zumpano. A survey of the recent trends in deep learning for literature based discovery in the biomedical domain. Neurocomputing, 568: 0 127079, 2024

work page 2024

[7] [7]

Tripod+ ai statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods

Gary S Collins, Karel GM Moons, Paula Dhiman, Richard D Riley, Andrew L Beam, Ben Van Calster, Marzyeh Ghassemi, Xiaoxuan Liu, Johannes B Reitsma, Maarten Van Smeden, et al. Tripod+ ai statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. bmj, 385, 2024

work page 2024

[8] [8]

Sure independence screening for ultrahigh dimensional feature space

Jianqing Fan and Jinchi Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70 0 (5): 0 849--911, 2008

work page 2008

[9] [9]

Nonparametric independence screening in sparse ultra-high-dimensional additive models

Jianqing Fan, Yang Feng, and Rui Song. Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106 0 (494): 0 544--557, 2011 a

work page 2011

[10] [10]

Sparse high-dimensional models in economics

Jianqing Fan, Jinchi Lv, and Lei Qi. Sparse high-dimensional models in economics. Annu. Rev. Econ., 3 0 (1): 0 291--317, 2011 b

work page 2011

[11] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2nd edition, 2009. doi:10.1007/978-0-387-84858-7

[12] Tong He, Ru Kong, Avram J Holmes, Minh Nguyen, Mert R Sabuncu, Simon B Eickhoff, Danilo Bzdok, Jiashi Feng, and BT Thomas Yeo. Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics. NeuroImage, 206:116276, 2020.

[13] Shu-Han Hsu, Ying-Yuan Huang, Yi-Da Wu, Kexin Yang, Li-Hsiang Lin, and Linda Milor. Extraction of wearout model parameters using on-line test of an SRAM. Microelectronics Reliability, 114:113756, 2020.

[14] Noah Yi-Ting Hung, Li-Hsiang Lin, and Vince D Calhoun. Deep P-spline: Theory, fast tuning, and application. arXiv preprint arXiv:2501.01376, 2025.

[15] Kewal K Jain. Personalized medicine. Current Opinion in Molecular Therapeutics, 4(6):548–558, 2002.

[16] V Roshan Joseph, Evren Gul, and Shan Ba. Maximum projection designs for computer experiments. Biometrika, 102:371–380, 2015.

[17] Kendrick N Kay, Thomas Naselaris, Ryan J Prenger, and Jack L Gallant. Identifying natural images from human brain activity. Nature, 452(7185):352–355, 2008.

[18] Ismael Lemhadri, Feng Ruan, Louis Abraham, and Robert Tibshirani. LassoNet: A neural network with feature sparsity. Journal of Machine Learning Research, 22(127):1–29, 2021.

[19] Yang Li. sodavis: SODA: Main and Interaction Effects Selection for Logistic Regression, Quadratic Discriminant and General Index Models. R Foundation for Statistical Computing, 2015. URL https://cran.r-project.org/web/packages/sodavis/

[20] Michael Lim and Trevor Hastie. Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics, 24(3):627–654, 2015.

[21] Karim Lounici, Massimiliano Pontil, Sara van de Geer, and Alexandre B. Tsybakov. Oracle inequalities and optimal inference under group sparsity. Annals of Statistics, 39(4):2164–2204, 2011. URL https://doi.org/10.1214/11-AOS896

[22] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2nd edition, 2018.

[23] Christoph Molnar. Interpretable machine learning. Lulu.com, 2020.

[24] Sahand Negahban, Bin Yu, Martin J Wainwright, and Pradeep Ravikumar. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Advances in Neural Information Processing Systems, 22, 2009.

[25] Lauv Patel, Tripti Shukla, Xiuzhen Huang, David W Ussery, and Shanzhi Wang. Machine learning methods in drug discovery. Molecules, 25(22):5277, 2020.

[26] Pradeep Ravikumar, John Lafferty, Han Liu, and Larry Wasserman. Sparse additive models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 71(5):1009–1030, 2009.

[27] Thomas J Santner, Brian J Williams, and William I Notz. The Design and Analysis of Computer Experiments (2nd Edition). New York, NY: Springer, 2019.

[28] Simone Scardapane, Danilo Comminiello, Amir Hussain, and Aurelio Uncini. Group sparse regularization for deep neural networks. Neurocomputing, 241:81–89, 2017.

[29] Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. Annals of Statistics, 48(4):1875–1897, 2020. doi:10.1214/19-AOS1875

[30] Rajen D Shah. Modelling interactions in high-dimensional data with backtracking. Journal of Machine Learning Research, 17(207):1–31, 2016.

[31] Noah Simon, Jerome Friedman, Trevor Hastie, and Robert Tibshirani. A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245, 2013.

[32] Ilya M Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3):271–280, 2001.

[33] Il'ya Meerovich Sobol'. On sensitivity estimation for nonlinear mathematical models. Matematicheskoe Modelirovanie, 2(1):112–118, 1990.

[34] Dorota Stefanicka-Wojtas and Donata Kurpas. Personalised medicine—implementation to the healthcare system in Europe (focus group discussions). Journal of Personalized Medicine, 13(3):380, 2023.

[35] Taiji Suzuki. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pp. 11692–11702. PMLR, 2019.

[36] Sara A. van de Geer. Empirical Processes in M-Estimation. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2000.

[37] Joel Vaughan, Agus Sudjianto, Erind Brahimi, Jie Chen, and Vijayan N Nair. Explainable neural networks based on additive index models. arXiv preprint arXiv:1806.01933, 2018.

[38] Vincent Q Vu, Bin Yu, Thomas Naselaris, Kendrick Kay, Jack Gallant, and Pradeep Ravikumar. Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images. Advances in Neural Information Processing Systems, 21, 2008.

[39] Tong Wang and Qihang Lin. Hybrid predictive models: When an interpretable model collaborates with a black-box model. Journal of Machine Learning Research, 22(137):1–38, 2021.

[40] Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning structured sparsity in deep neural networks. Advances in Neural Information Processing Systems, 29, 2016.

[41] CF Jeff Wu and Michael S Hamada. Experiments: planning, analysis, and optimization. John Wiley and Sons, 2011.

[42] Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, and Ian J Barnett. Sparse neural additive model: Interpretable deep learning with feature selection via group sparsity. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 343–359. Springer, 2023.

[43] Kexin Yang, Taizhi Liu, Rui Zhang, Dae-Hyun Kim, and Linda Milor. Front-end of line and middle-of-line time-dependent dielectric breakdown reliability simulator for logic circuits. Microelectronics Reliability, 76:81–86, 2017.

[44] Zebin Yang, Aijun Zhang, and Agus Sudjianto. GAMI-Net: An explainable neural network based on generalized additive models with structured interactions. Pattern Recognition, 120:108192, 2021.

[45] Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017. doi:10.1016/j.neunet.2017.07.005

[46] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006.

[47] Ming Yuan, V Roshan Joseph, and Hui Zou. Structured variable selection and estimation. The Annals of Applied Statistics, pp. 1738–1757, 2009.

[48] Peng Zhao, Guilherme Rocha, and Bin Yu. The composite absolute penalties family for grouped and hierarchical variable selection. 2009.

[49] Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen. Learning discriminative Bayesian networks from high-dimensional continuous neuroimaging data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2269–2283, 2015.
