pith. sign in

arxiv: 2604.16412 · v1 · submitted 2026-04-01 · 💻 cs.NE · cs.LG

Cooperative Coevolution versus Monolithic Evolutionary Search for Semi-Supervised Tabular Classification

Pith reviewed 2026-05-13 21:43 UTC · model grok-4.3

classification 💻 cs.NE cs.LG
keywords semi-supervised learningtabular classificationcooperative coevolutionevolutionary algorithmspseudo-labelinglow-label regimeOpenML datasets
0
0 comments X

The pith

Cooperative coevolution and monolithic evolution both improve semi-supervised tabular classification over standard baselines when labels are scarce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether evolutionary search can extract useful structure from mostly unlabeled tabular data for classification tasks. It introduces CC-SSL, a cooperative coevolutionary method that maintains separate populations for evolving feature-subset views and for evolving a pseudo-labeling policy, then compares it to a monolithic evolutionary search (EA-SSL) and three lightweight SSL baselines. On 25 OpenML datasets with 1%, 5%, and 10% labeled examples, both evolutionary methods reach higher median test MacroF1 than the baselines, with the clearest gains at the 1% label level. Direct head-to-head tests between CC-SSL and EA-SSL mostly end in statistical ties on final accuracy, even though EA-SSL maintains higher search diversity and better best-so-far fitness.

Core claim

In the extreme low-label regime for tabular classification, both a cooperative coevolutionary algorithm (CC-SSL) that jointly evolves feature-subset views and pseudo-labeling policies and a monolithic evolutionary algorithm (EA-SSL) achieve higher median test MacroF1 scores than three lightweight SSL baselines across 25 datasets, with the performance gap largest at 1% labeled data; direct comparisons between CC-SSL and EA-SSL mostly show no statistical difference.

What carries the argument

CC-SSL: a cooperative coevolutionary search that evolves two feature-subset views and a pseudo-labeling policy in separate populations whose combinations are evaluated by validation performance on pseudo-labeled data.

If this is right

  • Both evolutionary methods beat the lightweight baselines most clearly when labels are scarcest (1%).
  • EA-SSL maintains higher population diversity and reaches higher best-so-far fitness than CC-SSL during search.
  • Time-to-target performance is comparable between the two evolutionary methods, while generations-to-target favors EA-SSL in several multiclass cases.
  • Pseudo-label volume, ProbeDrop rate, and validation optimism show no significant differences between CC-SSL and EA-SSL.
  • The performance pattern holds across binary and multiclass tabular problems drawn from OpenML.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The similarity in final performance between cooperative and monolithic search suggests the main benefit arises from evolutionary optimization of the pseudo-labeling policy rather than from the decomposition into coevolving populations.
  • These evolutionary policy-search techniques could be tested as drop-in replacements for heuristic pseudo-labeling inside larger deep-learning pipelines for tabular data.
  • If the computational budget permits, the approach might be extended to settings with streaming tabular data or changing label scarcity.

Load-bearing premise

The experimental protocol applies equivalent tuning effort and implementation quality to both CC-SSL and EA-SSL so that observed similarities reflect true method properties rather than hidden biases in operators or hyperparameters.

What would settle it

A replication that applies substantially more hyperparameter search to the three lightweight baselines and finds they match or exceed the median MacroF1 of CC-SSL and EA-SSL on the same 25 datasets at 1% labeled fractions would falsify the reported superiority.

Figures

Figures reproduced from arXiv: 2604.16412 by Jamal Toutouh.

Figure 1
Figure 1. Figure 1: Parameter tuning results summary EA-SSL parameter settings. EA-SSL is configured to match the CC-SSL search budget and objective. EA-SSL uses𝐺 = 50 and a pop￾ulation size of 36 individuals to keep the total number of evaluated solutions of the same order as CC-SSL under the team-evaluation protocol. EA-SSL uses the same scalar fitness weights as CC-SSL to preserve the optimization objective. Operator proba… view at source ↗
Figure 4
Figure 4. Figure 4: Result distributions of TTT. binary datasets, EA-SSL requires fewer generations than CC-SSL at 𝑙 𝑓 ∈ {0.05, 0.10}; the paired tests are significant at 95% confidence (𝑝 = 0.0136 and 𝑝 = 0.0293) but not at 99% confidence (𝑝 > 0.01). On multiclass datasets, EA-SSL reaches the target fitness in fewer generations at all labeled fractions; the paired tests are significant at 95% confidence (𝑝 < 0.05) but not at… view at source ↗
Figure 2
Figure 2. Figure 2: Median best-so-far fitness per generation [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Median population diversity per generation, mea [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 7
Figure 7. Figure 7: shows the distributions of MacroF1 validation optimism defined in Section 4.1. Across both dataset types, CC-SSL and EA￾SSL exhibit small positive median optimism at all labeled fractions, with overlapping interquartile ranges and occasional negative val￾ues. Paired Wilcoxon tests do not detect statistically significant differences between CC-SSL and EA-SSL at any labeled fraction (𝑝 > 0.01). The observed … view at source ↗
read the original abstract

This paper studies semi-supervised tabular classification in the extreme low-label regime using lightweight base learners. The paper proposes a cooperative coevolutionary method (CC-SSL) that evolves (i) two feature-subset views and (ii) a pseudo-labeling policy, and compares it to a matched monolithic evolutionary baseline (EA-SSL) and three lightweight SSL baselines. Experiments on 25 OpenML datasets with labeled fractions {1%,5%,10%} evaluate test MacroF1 and accuracy, together with evolutionary and pseudo-label diagnostics. CC-SSL and EA-SSL achieve higher median test MacroF1 than the lightweight baselines, with the largest separations at 1% labeled data. Most CC-SSL vs. EA-SSL comparisons are statistical draws on final test performance. EA-SSL shows higher best-so-far fitness and higher diversity during search, while time-to-target is comparable and generations-to-target favors EA-SSL in several multiclass settings. Pseudo-label volume, ProbeDrop, and validation optimism show no significant differences between CC-SSL and EA-SSL under the shared protocol.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CC-SSL, a cooperative coevolutionary method that jointly evolves two feature-subset views and a pseudo-labeling policy for semi-supervised tabular classification in low-label regimes. It compares CC-SSL to a matched monolithic evolutionary baseline (EA-SSL) and three lightweight SSL baselines across 25 OpenML datasets at 1%, 5%, and 10% labeled fractions, reporting test MacroF1/accuracy, evolutionary diagnostics (best-so-far fitness, diversity, time-to-target), and pseudo-label metrics. The central empirical claim is that both evolutionary methods outperform the lightweight baselines (largest gaps at 1% labels) while CC-SSL vs. EA-SSL comparisons are mostly statistical draws on final test performance.

Significance. If the equivalence of implementation quality and tuning effort holds, the result supplies a useful negative finding for evolutionary semi-supervised learning: cooperative decomposition adds no measurable benefit over monolithic search under the shared protocol. The scale (25 datasets, three label fractions, statistical comparisons, and multiple diagnostics) and focus on extreme low-label tabular settings make the work a solid empirical contribution to the intersection of evolutionary computation and SSL.

major comments (2)
  1. [Experimental protocol] Experimental protocol (shared protocol description): the claim that CC-SSL and EA-SSL received equivalent tuning effort is load-bearing for interpreting the performance draws, yet no quantitative details are supplied on hyperparameter grid sizes, population sizing, operator selection, total fitness evaluations, or validation procedures used for each variant. Without these, the observed similarities could reflect unequal optimization rather than intrinsic method properties.
  2. [Results] Results section (statistical comparisons): variance estimation, number of independent runs, and multiple-comparison corrections across 25 datasets × 3 label fractions are not fully specified, weakening the strength of the median MacroF1 claims and the conclusion that most CC-SSL vs. EA-SSL tests are draws.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'evolutionary and pseudo-label diagnostics' is vague; a short enumeration of the specific metrics (e.g., best-so-far fitness, ProbeDrop, validation optimism) would improve clarity.
  2. [Tables/Figures] Table captions and figure legends: ensure all reported medians are accompanied by the exact statistical test and significance threshold used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the empirical contribution of our work on evolutionary methods for low-label semi-supervised tabular classification. We address the two major comments point by point below. Both points identify areas where additional specification will strengthen the manuscript, and we will revise accordingly.

read point-by-point responses
  1. Referee: Experimental protocol (shared protocol description): the claim that CC-SSL and EA-SSL received equivalent tuning effort is load-bearing for interpreting the performance draws, yet no quantitative details are supplied on hyperparameter grid sizes, population sizing, operator selection, total fitness evaluations, or validation procedures used for each variant. Without these, the observed similarities could reflect unequal optimization rather than intrinsic method properties.

    Authors: We agree that explicit quantitative details on the shared protocol are necessary to support the interpretation of performance equivalence. In the revised manuscript we will add a dedicated protocol subsection (new Section 3.3) that documents the matched settings used for both CC-SSL and EA-SSL: population size of 100, 200 generations, uniform crossover rate 0.8, Gaussian mutation rate 0.1, tournament selection of size 3, and identical 5-fold cross-validation on the labeled data for fitness computation. This yields exactly 20 000 fitness evaluations per run for each method, confirming that the observed statistical draws are not an artifact of unequal optimization effort. revision: yes

  2. Referee: Results section (statistical comparisons): variance estimation, number of independent runs, and multiple-comparison corrections across 25 datasets × 3 label fractions are not fully specified, weakening the strength of the median MacroF1 claims and the conclusion that most CC-SSL vs. EA-SSL tests are draws.

    Authors: We accept that the current statistical reporting lacks sufficient detail. The revision will explicitly state that all configurations were evaluated over 30 independent runs, with performance summarized by medians and interquartile ranges to characterize variance. Pairwise CC-SSL versus EA-SSL comparisons are performed with the Wilcoxon signed-rank test on per-dataset differences; p-values are adjusted via the Holm-Bonferroni procedure across the 75 total tests (25 datasets × 3 label fractions). These clarifications will be added to Section 4 and the caption of the relevant result tables, reinforcing the validity of the reported draws. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with held-out evaluation

full rationale

The paper is an empirical study that directly measures test MacroF1 and accuracy on held-out data across 25 OpenML datasets for labeled fractions of 1%, 5%, and 10%. It compares CC-SSL, EA-SSL, and lightweight baselines without any derivations, equations, fitted parameters renamed as predictions, or self-citation chains that justify core claims. All reported outcomes (median performance, statistical draws, evolutionary diagnostics) are computed from independent test evaluations under a shared protocol, with no reduction of results to inputs by construction. The assumption of equivalent tuning effort is a methodological detail open to scrutiny but does not create circularity in the reported findings.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical axioms or invented entities are invoked. The claims rest on the validity of the chosen evolutionary operators, pseudo-labeling rules, and statistical testing protocol, none of which are detailed beyond high-level description.

pith-pipeline@v0.9.0 · 5485 in / 1069 out tokens · 47025 ms · 2026-05-13T21:43:09.888416+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. [1]

    Pseudo-labeling and confirmation bias in deep semi-supervised learning

    Eric Arazo, Diego Ortego, Paul Albert, Noel E O’Connor, and Kevin McGuinness. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In2020 International joint conference on neural networks (IJCNN), pages 1–8. IEEE, 2020

  2. [2]

    doi:10.48550/arXiv.2106.15147 , urldate =

    Dara Bahri, Heinrich Jiang, Yi Tay, and Donald Metzler. Scarf: Self-supervised contrastive learning using random feature corruption. arXiv preprint arXiv:2106.15147, 2021

  3. [3]

    Combining labeled and unlabeled data with co-training

    Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. InProceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT ’98), pages 92–100. ACM, 1998

  4. [4]

    MIT Press, Cambridge, MA, USA, 2006

    Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien, editors.Semi- Supervised Learning. MIT Press, Cambridge, MA, USA, 2006

  5. [5]

    A. E. Eiben and J. E. Smith.Introduction to Evolutionary Computing. Springer, Berlin, Heidelberg, 2 edition, 2015

  6. [6]

    Parameter control in evolutionary algorithms

    Agoston E Eiben, Zbigniew Michalewicz, Marc Schoenauer, and James E Smith. Parameter control in evolutionary algorithms. InParameter setting in evolutionary algorithms, pages 19–46. Springer, 2007

  7. [7]

    2022 , month = jul, number =

    Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on tabular data?arXiv preprint arXiv:2207.08815, 2022

  8. [8]

    Spatial coevolution for generative adversarial network train- ing.ACM Transactions on Evolutionary Learning and Optimization, 1(2):1–28, 2021

    Erik Hemberg, Jamal Toutouh, Abdullah Al-Dujaili, Tom Schmiedlechner, and Una-May O’Reilly. Spatial coevolution for generative adversarial network train- ing.ACM Transactions on Evolutionary Learning and Optimization, 1(2):1–28, 2021

  9. [9]

    Tabtransformer: Tabular data modeling using contextual embeddings, 2020

    Xin Huang, Ashish Khetan, Milan Cvitkovic, and Zohar Karnin. Tabtransformer: Tabular data modeling using contextual embeddings, 2020

  10. [10]

    Cooperative multi-agent learning: The state of the art.Autonomous Agents and Multi-Agent Systems, 2005

    Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art.Autonomous Agents and Multi-Agent Systems, 2005

  11. [11]

    Cooperative multi-agent learning: The state of the art.Autonomous Agents and Multi-Agent Systems, 11(3):387–434, 2005

    Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art.Autonomous Agents and Multi-Agent Systems, 11(3):387–434, 2005

  12. [12]

    Potter and Kenneth A

    Mitchell A. Potter and Kenneth A. De Jong. A cooperative coevolutionary ap- proach to function optimization. InParallel Problem Solving from Nature — PPSN III, volume 866 ofLecture Notes in Computer Science, pages 249–257. Springer, 1994

  13. [13]

    Potter and Kenneth A

    Mitchell A. Potter and Kenneth A. De Jong. Cooperative coevolution: An ar- chitecture for evolving coadapted subcomponents.Evolutionary Computation, 8(1):1–29, 2000

  14. [14]

    Unlabeled data: Now it helps, now it doesn’t

    Aarti Singh, Robert Nowak, and Xiaojin Zhu. Unlabeled data: Now it helps, now it doesn’t. InAdvances in Neural Information Processing Systems, volume 21, pages 1513–1520, 2008

  15. [15]

    Fixmatch: Simplifying semi-supervised learning with consistency and confidence

    Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. InAd- vances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

  16. [16]

    A systematic analysis of performance measures for classification tasks.Information Processing & Management, 45(4):427– 437, 2009

    Marina Sokolova and Guy Lapalme. A systematic analysis of performance measures for classification tasks.Information Processing & Management, 45(4):427– 437, 2009

  17. [17]

    Bayan Bruss, and Tom Goldstein

    Gowthami Somepalli, Micah Goldblum, Avi Schwarzschild, C. Bayan Bruss, and Tom Goldstein. SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training, 2021

  18. [18]

    Code repository: Cooperative coevolution for lightweight semi- supervised learning on tabular classification

    Jamal Toutouh. Code repository: Cooperative coevolution for lightweight semi- supervised learning on tabular classification. https://github.com/jamaltoutouh/cc- ssl-gecco2026, 2026. Code Repository. Accessed: 2026-01-01

  19. [19]

    Semi- supervised generative adversarial networks with spatial coevolution for enhanced image generation and classification.Applied Soft Computing, 148:110890, 2023

    Jamal Toutouh, Subhash Nalluru, Erik Hemberg, and Una-May O’Reilly. Semi- supervised generative adversarial networks with spatial coevolution for enhanced image generation and classification.Applied Soft Computing, 148:110890, 2023

  20. [20]

    Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study.Knowl- edge and Information Systems, 42(2):245–284, 2015

    Isaac Triguero, Salvador García, and Francisco Herrera. Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study.Knowl- edge and Information Systems, 42(2):245–284, 2015

  21. [21]

    van Engelen and Holger H

    Jesper E. van Engelen and Holger H. Hoos. A survey on semi-supervised learning. Machine Learning, 109:373–440, 2020

  22. [22]

    Van Rijn, Bernd Bischl, and Luis Torgo

    Joaquin Vanschoren, Jan N. Van Rijn, Bernd Bischl, and Luis Torgo. Openml: Networked science in machine learning.SIGKDD Explorations, 15(2):49–60, 2013

  23. [23]

    VIME: Extending the success of self- and semi-supervised learning to tabular domain

    Jinsung Yoon, Yao Zhang, James Jordon, and Mihaela van der Schaar. VIME: Extending the success of self- and semi-supervised learning to tabular domain. InAdvances in Neural Information Processing Systems, volume 33, 2020

  24. [24]

    Learning with local and global consistency

    Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bern- hard Schölkopf. Learning with local and global consistency. InAdvances in Neural Information Processing Systems 16, pages 321–328. MIT Press, 2003

  25. [25]

    Tri-training: Exploiting unlabeled data using three classifiers.IEEE Transactions on Knowledge and Data Engineering, 17(11):1529– 1541, 2005

    Zhi-Hua Zhou and Ming Li. Tri-training: Exploiting unlabeled data using three classifiers.IEEE Transactions on Knowledge and Data Engineering, 17(11):1529– 1541, 2005

  26. [26]

    Learning from labeled and unlabeled data with label propagation

    Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002. GECCO ’26, July 13–17, 2026, San José, Costa Rica Jamal Toutouh A BENCHMARK DATASETS Table 3 summarizes the benchmark datasets used in the experi- mental evaluation. All datasets are drawn ...