Improving Performance in Classification Tasks with LCEN and the Weighted Focal Differentiable MCC Loss
Pith reviewed 2026-05-09 23:10 UTC · model grok-4.3
The pith
LCEN adapted for classification keeps models sparse and interpretable while diffMCC loss raises macro F1 by 4.9 percent and MCC by 8.5 percent over weighted cross-entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A classification-ready version of LCEN performs nonlinear interpretable feature selection, discards 56 percent of inputs on average, and yields competitive or superior macro F1 and MCC values; the same experiments show that training with the weighted focal differentiable MCC loss consistently beats weighted cross-entropy by average margins of 4.9 percent in F1 and 8.5 percent in MCC.
What carries the argument
Modified LCEN algorithm that extends LASSO-Clip-EN feature selection to classification while enforcing sparsity and interpretability, together with the weighted focal differentiable MCC loss used as a training objective.
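To make the loss side of this concrete, the following is a minimal sketch of a differentiable ("soft") MCC for the binary case. It replaces hard confusion-matrix counts with sums of predicted probabilities, so the score varies smoothly with the model outputs. This is an illustration only: the class weighting and focal terms of the paper's weighted focal diffMCC loss are not described in this summary and are deliberately omitted. The function name `soft_mcc_loss` and the `eps` parameter are assumptions made for this example.

```python
import math

def soft_mcc_loss(probs, labels, eps=1e-8):
    """Return 1 - soft MCC for binary labels (0/1) and predicted probabilities.

    Hypothetical helper for illustration; not the paper's weighted focal variant.
    """
    # Soft confusion-matrix entries: probabilities stand in for hard 0/1 predictions.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    tn = sum((1 - p) * (1 - y) for p, y in zip(probs, labels))

    numerator = tp * tn - fp * fn
    # eps keeps the ratio finite when a row or column of the soft matrix is empty.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return 1.0 - numerator / denominator

labels = [1, 1, 0, 0, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2, 0.1, 0.1]   # mostly correct, confident outputs
uninformed = [0.5] * 6                        # carries no class information

print(soft_mcc_loss(confident, labels))
print(soft_mcc_loss(uninformed, labels))
```

In this sketch the loss approaches 0 for perfect predictions and sits near 1 for uninformative ones, which is the property that makes a correlation-based objective usable in place of cross-entropy on imbalanced data.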
If this is right
- LCEN models remain sparse enough to eliminate roughly half the input features while matching or exceeding the accuracy of non-sparse alternatives.
- Retraining any model on only the LCEN-chosen features produces statistically significant gains in three of the four experiments.
- The diffMCC loss is the top performer in every experiment and delivers measurable lifts in both macro F1 and MCC.
- Feature selection by LCEN can be combined with standard retraining to improve results without increasing model complexity.
Where Pith is reading between the lines
- If the sparsity pattern generalizes, LCEN could be applied upstream of any classifier to reduce data-collection costs in high-dimensional settings such as sensor arrays or genomics.
- The consistent advantage of diffMCC suggests it may serve as a drop-in replacement for cross-entropy in other imbalanced or multi-class problems where correlation-based metrics matter.
- Because LCEN-selected features improve downstream models, the method could be used as a diagnostic tool to identify which variables truly drive class separation.
Load-bearing premise
The four standard datasets used are representative enough that the observed sparsity and accuracy gains will hold for other real-world classification problems without extra tuning.
What would settle it
A new dataset from a different domain would contradict the reported pattern if either of two things happened: LCEN models retained fewer than 30 percent of features while losing accuracy, or diffMCC-trained models failed to exceed weighted cross-entropy performance.
Original abstract
The LASSO-Clip-EN (LCEN) algorithm was previously introduced for nonlinear, interpretable feature selection and machine learning. However, its design and use were limited to regression tasks. In this work, we create a modified version of the LCEN algorithm that is suitable for classification tasks and maintains its desirable properties, such as interpretability. This modified LCEN algorithm is evaluated on four widely used binary and multiclass classification datasets. In these experiments, LCEN is compared against 10 other model types and consistently reaches high test-set macro F$_1$ score and Matthews correlation coefficient (MCC) metrics, higher than those of the majority of investigated models. LCEN models for classification remain sparse, eliminating an average of 56% of all input features in the experiments performed. Furthermore, LCEN-selected features are used to retrain all models using the same data, leading to statistically significant performance improvements in three of the experiments and insignificant differences in the fourth when compared to using all features or other feature selection methods. Simultaneously, the weighted focal differentiable MCC (diffMCC) loss function is evaluated on the same datasets. Models trained with the diffMCC loss function are always the best-performing methods in these experiments, and reach test-set macro F$_1$ scores that are, on average, 4.9% higher and MCCs that are 8.5% higher than those obtained by models trained with the weighted cross-entropy loss. These results highlight the performance of LCEN as a feature selection and machine learning algorithm also for classification tasks, and how the diffMCC loss function can train very accurate models, surpassing the weighted cross-entropy loss in the tasks investigated.
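Both headline metrics in the abstract can be computed from a confusion matrix. The sketch below uses the standard definitions of macro F1 (unweighted mean of per-class F1) and the multiclass generalization of MCC. How the paper itself computes these metrics is not stated here, so this is an assumption about the conventional formulas rather than a reproduction of the authors' code; the function names and the toy labels are hypothetical.

```python
import math

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t and predicted class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp
        fn = sum(cm[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

def mcc(cm):
    """Multiclass Matthews correlation coefficient."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                               # total samples
    c = sum(cm[k][k] for k in range(n))                           # correct predictions
    t = [sum(cm[k]) for k in range(n)]                            # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]       # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0

# Toy labels for illustration only; not data from the paper.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(macro_f1(cm), mcc(cm))
```

Macro F1 weights every class equally regardless of its frequency, and MCC accounts for all cells of the confusion matrix, which is why both are commonly preferred over plain accuracy on imbalanced datasets.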
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends the LASSO-Clip-EN (LCEN) algorithm to classification tasks while preserving interpretability and sparsity, evaluating the modified LCEN against 10 other models on four binary and multiclass datasets. It reports that LCEN eliminates an average of 56% of input features, that retraining with LCEN-selected features yields statistically significant performance gains in three of four experiments, and that the weighted focal differentiable MCC (diffMCC) loss consistently produces the best results, with average gains of 4.9% in macro F1 and 8.5% in MCC over weighted cross-entropy.
Significance. If the empirical claims hold under rigorous statistical controls, the work would be significant for supplying an interpretable sparse feature selector for classification and a differentiable loss that demonstrably improves upon cross-entropy. The reported sparsity and consistent outperformance could be useful in domains requiring both accuracy and feature transparency. The absence of variance estimates and reproducibility details, however, currently limits the strength of this contribution.
major comments (2)
- [Abstract] The central claim that diffMCC-trained models are 'always the best-performing' and deliver fixed average improvements (4.9% F1, 8.5% MCC) over weighted cross-entropy is presented without standard deviations, number of independent runs, or any statistical test. This omission is load-bearing because the abstract contrasts it with the feature-selection results, which are explicitly labeled statistically significant; without these controls the headline margins cannot be distinguished from run-to-run variability or unequal hyperparameter effort.
- [Experiments] No information is supplied on the precise architectures of the 10 comparator models, the hyperparameter search protocol, the cross-validation scheme, or the exact statistical tests used for the performance comparisons. These details are required to evaluate whether the reported superiority of LCEN and diffMCC is reproducible and not an artifact of implementation choices.
minor comments (2)
- [Abstract] The abstract states that LCEN 'remains sparse' and eliminates 56% of features on average; a table or figure quantifying per-dataset sparsity and the precise definition of 'eliminated' features would improve clarity.
- Ensure that all reported averages in tables are accompanied by standard deviations and the number of runs; this is a presentation issue that does not affect the core claims but is needed for completeness.
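The reporting the referee asks for can be sketched in a few lines: summarize each method by mean, sample standard deviation, and number of runs, and compare two methods on matched runs with a paired t statistic. The scores below are placeholders invented for this example and are not results from the paper; the function names are hypothetical, and the paired t-test is only one reasonable choice, since the manuscript's actual test is not specified in this review.

```python
import math
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation, number of runs)."""
    return statistics.mean(scores), statistics.stdev(scores), len(scores)

def paired_t_statistic(scores_a, scores_b):
    """t statistic for matched runs (same seeds/splits); needs non-constant differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Placeholder MCC values over five matched runs; NOT results from the paper.
diffmcc_runs = [0.80, 0.82, 0.79, 0.83, 0.81]
wce_runs = [0.76, 0.79, 0.77, 0.78, 0.75]

for name, runs in [("diffMCC", diffmcc_runs), ("weighted CE", wce_runs)]:
    mean, std, n = summarize(runs)
    print(f"{name}: {mean:.3f} +/- {std:.3f} (n={n})")

print("paired t statistic:", paired_t_statistic(diffmcc_runs, wce_runs))
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom. Pairing by run matters because it removes variance shared between the two methods, such as a particularly easy or hard train/test split.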
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. We address each major comment below and have revised the manuscript to incorporate additional details and statistical reporting where feasible.
Point-by-point responses
- Referee: [Abstract] The central claim that diffMCC-trained models are 'always the best-performing' and deliver fixed average improvements (4.9% F1, 8.5% MCC) over weighted cross-entropy is presented without standard deviations, number of independent runs, or any statistical test. This omission is load-bearing because the abstract contrasts it with the feature-selection results, which are explicitly labeled statistically significant; without these controls the headline margins cannot be distinguished from run-to-run variability or unequal hyperparameter effort.
  Authors: We agree that the abstract would be strengthened by including measures of variability and experimental repetition details for the diffMCC results. We have revised the abstract to report the average improvements together with their standard deviations across repeated runs and to note the statistical tests performed. This change ensures the reporting is consistent with the statistically significant feature-selection claims and allows readers to assess the robustness of the observed margins.
  Revision: yes
- Referee: [Experiments] No information is supplied on the precise architectures of the 10 comparator models, the hyperparameter search protocol, the cross-validation scheme, or the exact statistical tests used for the performance comparisons. These details are required to evaluate whether the reported superiority of LCEN and diffMCC is reproducible and not an artifact of implementation choices.
  Authors: We appreciate the referee's emphasis on reproducibility. We have expanded the Experiments section to provide the requested information, including descriptions of the comparator model architectures, the hyperparameter search protocol and ranges, the cross-validation procedure, and the exact statistical tests used for all performance comparisons. These additions directly address concerns about potential implementation artifacts.
  Revision: yes
Circularity Check
No circularity detected in empirical model comparisons or loss evaluations.
Full rationale
The paper extends the prior LCEN algorithm (introduced for regression) to classification tasks via a described modification, then reports experimental results on four fixed datasets against 10 other models, plus comparisons of diffMCC loss versus weighted cross-entropy. No mathematical derivation chain exists; claims rest on direct performance metrics (F1, MCC), sparsity counts, and limited statistical tests for feature selection only. Self-citation of the original LCEN paper is present but non-load-bearing, as the new classification results and loss-function gains are independently measured on held-out test sets rather than derived from or fitted to the cited work. No self-definitional reductions, fitted inputs renamed as predictions, or ansatz smuggling occur.