
arxiv: 2604.21252 · v1 · submitted 2026-04-23 · 💻 cs.LG

Improving Performance in Classification Tasks with LCEN and the Weighted Focal Differentiable MCC Loss

Pith reviewed 2026-05-09 23:10 UTC · model grok-4.3

classification 💻 cs.LG
keywords: LCEN · feature selection · classification · sparsity · interpretable models · differentiable MCC loss · macro F1 score · Matthews correlation coefficient

The pith

LCEN adapted for classification keeps models sparse and interpretable, while the diffMCC loss raises macro F1 by 4.9 percent and MCC by 8.5 percent on average over weighted cross-entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the LASSO-Clip-EN (LCEN) algorithm from regression to classification, preserving its ability to produce nonlinear yet interpretable models that discard unnecessary input features. On four standard binary and multiclass datasets, the modified LCEN eliminates an average of 56 percent of features and still reaches macro F1 and MCC scores higher than those of most of the ten competing model types. When the other models are retrained solely on the LCEN-selected features, performance improves significantly in three of the four experiments, with no significant difference in the fourth. Separately, models trained with the weighted focal differentiable MCC (diffMCC) loss instead of the usual weighted cross-entropy objective are the best performers in every experiment. These results indicate that, on the tested tasks, sparsity and a better-suited loss can be pursued together without sacrificing accuracy.

Core claim

A classification-ready version of LCEN performs nonlinear interpretable feature selection, discards 56 percent of inputs on average, and yields macro F1 and MCC values that are competitive with or superior to those of the comparator models; in the same experiments, models trained with the weighted focal differentiable MCC loss consistently beat those trained with weighted cross-entropy, by average margins of 4.9 percent in macro F1 and 8.5 percent in MCC.
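Both headline metrics are standard. As a reference point, the following minimal NumPy sketch computes macro F1 (the unweighted mean of per-class F1 scores) and the multiclass MCC from a confusion matrix, using the same closed form as scikit-learn's `matthews_corrcoef`, which reduces to the familiar binary MCC for two classes. The helper names are illustrative and the code is not taken from the paper.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def macro_f1(cm):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class k, but truly another class
    fn = cm.sum(axis=1) - tp   # truly class k, but predicted as another class
    denom = 2 * tp + fp + fn
    # A class that never occurs and is never predicted gets F1 = 0 here.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()


def mcc(cm):
    """Multiclass MCC; equals (TP*TN - FP*FN) / sqrt(...) for two classes."""
    cm = cm.astype(float)
    c = np.trace(cm)       # correctly classified samples
    s = cm.sum()           # total samples
    p = cm.sum(axis=0)     # predicted count per class
    t = cm.sum(axis=1)     # true count per class
    num = c * s - p @ t
    den = np.sqrt((s ** 2 - p @ p) * (s ** 2 - t @ t))
    return num / den if den > 0 else 0.0
```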

What carries the argument

Modified LCEN algorithm that extends LASSO-Clip-EN feature selection to classification while enforcing sparsity and interpretability, together with the weighted focal differentiable MCC loss used as a training objective.
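To make the LASSO, clip, elastic-net sequence concrete, here is a minimal sketch for binary classification. It is an illustration under stated assumptions, not the authors' implementation: the nonlinear expansion (each input plus its square), the values of `alpha`, `l1_ratio`, and `cutoff`, the proximal-gradient solver, and all function names are hypothetical choices. It follows only the structure implied by the name LASSO-Clip-EN: an L1-penalized fit selects terms, small coefficients are clipped to zero, and an elastic-net model is refit on the survivors and clipped again.

```python
import numpy as np


def expand_features(X):
    """Hypothetical nonlinear expansion: each input column plus its square.

    Column order: [x_0, ..., x_{m-1}, x_0**2, ..., x_{m-1}**2].
    """
    return np.hstack([X, X ** 2])


def fit_logistic_en(X, y, alpha, l1_ratio, lr=0.1, n_iter=2000):
    """Binary logistic regression with an elastic-net penalty, fit by proximal
    gradient descent (ISTA). l1_ratio=1.0 gives a pure LASSO (L1) penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y = 1)
        grad_w = X.T @ (p - y) / n + l2 * w       # mean log-loss gradient + L2 term
        b -= lr * np.mean(p - y)                  # intercept is not penalized
        w = w - lr * grad_w
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)  # L1 prox: soft-threshold
    return w, b


def lcen_sketch(X, y, alpha=0.05, l1_ratio=0.5, cutoff=0.05):
    """LASSO -> clip -> elastic net -> clip, on standardized expanded features.

    Returns coefficients in the standardized expanded space, the intercept,
    and the standardization statistics needed to transform new data.
    """
    Z = expand_features(X)
    mu, sd = Z.mean(axis=0), Z.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (Z - mu) / sd

    w_lasso, b = fit_logistic_en(Z, y, alpha, l1_ratio=1.0)  # step 1: LASSO
    keep = np.abs(w_lasso) > cutoff                            # step 2: clip small terms

    w = np.zeros(Z.shape[1])
    if keep.any():
        w_sel, b = fit_logistic_en(Z[:, keep], y, alpha, l1_ratio)  # step 3: EN refit
        w[keep] = w_sel
        w[np.abs(w) <= cutoff] = 0.0                                # step 4: clip again
    return w, b, mu, sd
```

Proximal gradient descent is used here only to keep the sketch free of dependencies beyond NumPy; a library solver for L1 and elastic-net logistic regression would normally take its place, and a multiclass version would need a multinomial or one-vs-rest extension that the summary does not describe.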

If this is right

  • LCEN models remain sparse, eliminating more than half of the input features on average (56 percent), while matching or exceeding the macro F1 and MCC of most non-sparse alternatives.
  • Retraining the other models on only the LCEN-chosen features produces statistically significant gains in three of the four experiments, with no significant difference in the fourth.
  • The diffMCC loss is the top performer in every experiment and delivers measurable lifts in both macro F1 and MCC.
  • Feature selection by LCEN can be combined with standard retraining to improve results without increasing model complexity.
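The 56 percent figure and the retraining experiments both hinge on what counts as an eliminated input, a point the referee also raises. One plausible definition, assumed here rather than taken from the paper, is that an original input is eliminated when none of the nonlinear terms built from it keeps a nonzero coefficient; retrained models then see only the surviving columns. A minimal NumPy sketch with hypothetical helper names:

```python
import numpy as np


def eliminated_inputs(term_inputs, coef, n_inputs):
    """Boolean array that is True where an original input feeds no surviving term.

    term_inputs[k] lists the original input indices that expanded term k uses
    (e.g. a term x0 * x2 would be [0, 2]); coef[k] is that term's coefficient.
    """
    used = np.zeros(n_inputs, dtype=bool)
    for inputs, c in zip(term_inputs, coef):
        if c != 0.0:                    # term survived selection
            used[list(inputs)] = True
    return ~used


def retrain_columns(X, eliminated):
    """Column subset on which any downstream model would be retrained."""
    return X[:, ~eliminated]


def mean_fraction_eliminated(masks):
    """Unweighted mean, across datasets, of the fraction of inputs eliminated."""
    return float(np.mean([np.mean(m) for m in masks]))
```

Whether the reported average is an unweighted mean over datasets, as in `mean_fraction_eliminated`, or is pooled over all features is not stated in the summary.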

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the sparsity pattern generalizes, LCEN could be applied upstream of any classifier to reduce data-collection costs in high-dimensional settings such as sensor arrays or genomics.
  • The consistent advantage of diffMCC suggests it may serve as a drop-in replacement for cross-entropy in other imbalanced or multi-class problems where correlation-based metrics matter.
  • Because LCEN-selected features improve downstream models, the method could be used as a diagnostic tool to identify which variables truly drive class separation.

Load-bearing premise

The four standard datasets used are representative enough that the observed sparsity and accuracy gains will hold for other real-world classification problems without extra tuning.

What would settle it

A new dataset from a different domain would contradict the reported pattern if LCEN models retained fewer than 30 percent of features while losing accuracy, or if diffMCC-trained models failed to exceed weighted cross-entropy performance.

Original abstract

The LASSO-Clip-EN (LCEN) algorithm was previously introduced for nonlinear, interpretable feature selection and machine learning. However, its design and use were limited to regression tasks. In this work, we create a modified version of the LCEN algorithm that is suitable for classification tasks and maintains its desirable properties, such as interpretability. This modified LCEN algorithm is evaluated on four widely used binary and multiclass classification datasets. In these experiments, LCEN is compared against 10 other model types and consistently reaches high test-set macro F1 score and Matthews correlation coefficient (MCC) metrics, higher than those of the majority of investigated models. LCEN models for classification remain sparse, eliminating an average of 56% of all input features in the experiments performed. Furthermore, LCEN-selected features are used to retrain all models using the same data, leading to statistically significant performance improvements in three of the experiments and insignificant differences in the fourth when compared to using all features or other feature selection methods. Simultaneously, the weighted focal differentiable MCC (diffMCC) loss function is evaluated on the same datasets. Models trained with the diffMCC loss function are always the best-performing methods in these experiments, and reach test-set macro F1 scores that are, on average, 4.9% higher and MCCs that are 8.5% higher than those obtained by models trained with the weighted cross-entropy loss. These results highlight the performance of LCEN as a feature selection and machine learning algorithm for classification tasks as well, and how the diffMCC loss function can train very accurate models, surpassing the weighted cross-entropy loss in the tasks investigated.
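The key idea behind a differentiable MCC is to replace the hard counts of true and false positives and negatives with sums of predicted probabilities, so that MCC becomes a smooth function of the model outputs and 1 - MCC can be minimized by gradient descent. The NumPy sketch below computes that soft MCC per class (one-vs-rest) and averages it. It covers only this core: the specific weighting and focal terms of the authors' weighted focal diffMCC loss are not described here and are not reproduced, and the optional `class_weights` argument is a hypothetical stand-in. In training, the same expression would be written in an automatic-differentiation framework so gradients flow back to the probabilities.

```python
import numpy as np


def soft_mcc(p, y, eps=1e-8):
    """Soft (differentiable) MCC for one class.

    p: predicted probabilities for the class, y: 0/1 indicators of the class.
    Hard counts are replaced by probability-weighted sums.
    """
    tp = np.sum(p * y)
    fp = np.sum(p * (1 - y))
    fn = np.sum((1 - p) * y)
    tn = np.sum((1 - p) * (1 - y))
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + eps)  # eps avoids 0/0
    return num / den


def soft_mcc_loss(probs, labels, class_weights=None):
    """1 - (weighted) mean of one-vs-rest soft MCCs; ranges from 0 (perfect) to 2.

    probs: (n_samples, n_classes) predicted class probabilities, e.g. softmax outputs.
    labels: integer class labels. class_weights: hypothetical per-class weights.
    """
    n_classes = probs.shape[1]
    onehot = np.eye(n_classes)[labels]
    mccs = np.array([soft_mcc(probs[:, c], onehot[:, c]) for c in range(n_classes)])
    w = np.ones(n_classes) if class_weights is None else np.asarray(class_weights, dtype=float)
    return 1.0 - np.sum(w * mccs) / np.sum(w)
```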

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript extends the LASSO-Clip-EN (LCEN) algorithm to classification tasks while preserving interpretability and sparsity, evaluating the modified LCEN against 10 other models on four binary and multiclass datasets. It reports that LCEN eliminates an average of 56% of input features, that retraining with LCEN-selected features yields statistically significant performance gains in three of four experiments, and that the weighted focal differentiable MCC (diffMCC) loss consistently produces the best results, with average gains of 4.9% in macro F1 and 8.5% in MCC over weighted cross-entropy.

Significance. If the empirical claims hold under rigorous statistical controls, the work would be significant for supplying an interpretable sparse feature selector for classification and a differentiable loss that demonstrably improves upon cross-entropy. The reported sparsity and consistent outperformance could be useful in domains requiring both accuracy and feature transparency. The absence of variance estimates and reproducibility details, however, currently limits the strength of this contribution.

major comments (2)
  1. [Abstract] The central claim that diffMCC-trained models are 'always the best-performing' and deliver fixed average improvements (4.9% F1, 8.5% MCC) over weighted cross-entropy is presented without standard deviations, the number of independent runs, or any statistical test. This omission is load-bearing because the abstract contrasts it with the feature-selection results, which are explicitly labeled statistically significant; without these controls, the headline margins cannot be distinguished from run-to-run variability or unequal hyperparameter effort.
  2. [Experiments] No information is supplied on the precise architectures of the 10 comparator models, the hyperparameter search protocol, the cross-validation scheme, or the exact statistical tests used for the performance comparisons. These details are required to evaluate whether the reported superiority of LCEN and diffMCC is reproducible and not an artifact of implementation choices.
minor comments (2)
  1. [Abstract] The abstract states that LCEN 'remains sparse' and eliminates 56% of features on average; a table or figure quantifying per-dataset sparsity and the precise definition of 'eliminated' features would improve clarity.
  2. Ensure that all reported averages in tables are accompanied by standard deviations and the number of runs; this is a presentation issue that does not affect the core claims but is needed for completeness.
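The statistics the referee asks for take little code. The sketch below reports the mean and sample standard deviation of a score over repeated runs and an exact two-sided paired sign-flip permutation test on per-run differences, for example diffMCC versus weighted cross-entropy on the same splits. The choice of test is an assumption, not necessarily the one used in the paper, the function names are hypothetical, and the inputs must be the actual per-run scores.

```python
import itertools

import numpy as np


def summarize(scores):
    """Mean and sample standard deviation (ddof=1) across repeated runs."""
    s = np.asarray(scores, dtype=float)
    return s.mean(), s.std(ddof=1)


def paired_sign_flip_test(a, b):
    """Exact two-sided paired permutation test on per-run differences a - b.

    Under the null hypothesis each difference is equally likely to have either
    sign, so all 2**n sign assignments are enumerated (feasible for small n).
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    count = 0
    for signs in itertools.product([1.0, -1.0], repeat=len(d)):
        if abs((d * np.array(signs)).mean()) >= observed - 1e-12:  # tolerance for float ties
            count += 1
    return count / 2 ** len(d)
```

Because the test enumerates all 2^n sign patterns, it is exact but only practical for small n. With five paired runs, the smallest attainable p-value is 2/2^5 = 0.0625, so the number of runs directly limits what such a test can establish.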

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below and have revised the manuscript to incorporate additional details and statistical reporting where feasible.

Point-by-point responses
  1. Referee: [Abstract] Major comment 1 above: the diffMCC gains over weighted cross-entropy are reported without standard deviations, the number of independent runs, or a statistical test.

    Authors: We agree that the abstract would be strengthened by including measures of variability and experimental repetition details for the diffMCC results. We have revised the abstract to report the average improvements together with their standard deviations across repeated runs and to note the statistical tests performed. This change ensures the reporting is consistent with the statistically significant feature-selection claims and allows readers to assess the robustness of the observed margins. revision: yes

  2. Referee: [Experiments] Major comment 2 above: the comparator architectures, hyperparameter search protocol, cross-validation scheme, and statistical tests are not described.

    Authors: We appreciate the referee's emphasis on reproducibility. We have expanded the Experiments section to provide the requested information, including descriptions of the comparator model architectures, the hyperparameter search protocol and ranges, the cross-validation procedure, and the exact statistical tests used for all performance comparisons. These additions directly address concerns about potential implementation artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity detected in empirical model comparisons or loss evaluations.

Full rationale

The paper extends the prior LCEN algorithm (introduced for regression) to classification tasks via a described modification, then reports experimental results on four fixed datasets against 10 other models, plus comparisons of diffMCC loss versus weighted cross-entropy. No mathematical derivation chain exists; claims rest on direct performance metrics (F1, MCC), sparsity counts, and limited statistical tests for feature selection only. Self-citation of the original LCEN paper is present but non-load-bearing, as the new classification results and loss-function gains are independently measured on held-out test sets rather than derived from or fitted to the cited work. No self-definitional reductions, fitted inputs renamed as predictions, or ansatz smuggling occur.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities. The work rests on standard supervised learning assumptions that the chosen datasets are representative and that F1 and MCC are appropriate summary metrics.

pith-pipeline@v0.9.0 · 5612 in / 1167 out tokens · 49564 ms · 2026-05-09T23:10:30.067356+00:00

