There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

2018 · cs.LG · arXiv:1806.05594

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100, over many different quantities of labeled training data. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%.
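
The difference between the two averaging schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy trajectory, the function names (averaging_epochs, average_weights), and the parameters START, CYCLE_LEN, and K are hypothetical, and the sketch assumes that SWA takes one snapshot at the end of each learning-rate cycle while fast-SWA takes a snapshot every K epochs within each cycle.

```python
from typing import Dict, List


def averaging_epochs(total_epochs: int, start: int, stride: int) -> List[int]:
    """Completed-epoch counts whose weights enter the average.

    Averaging begins after `start` epochs; one snapshot is taken every
    `stride` epochs from then on.
    """
    return list(range(start + stride, total_epochs + 1, stride))


def average_weights(snapshots: List[List[float]]) -> List[float]:
    """Equal-weight average of weight vectors, coordinate by coordinate."""
    n = len(snapshots)
    return [sum(values) / n for values in zip(*snapshots)]


# Hypothetical toy trajectory: the "weights" after epoch e are [e, 2e].
# A real run would store the network parameters produced by SGD instead.
TOTAL_EPOCHS, START, CYCLE_LEN, K = 12, 6, 3, 1
trajectory: Dict[int, List[float]] = {
    e: [float(e), 2.0 * e] for e in range(1, TOTAL_EPOCHS + 1)
}

# Assumed SWA behaviour: one snapshot at the end of each cycle.
swa_epochs = averaging_epochs(TOTAL_EPOCHS, START, CYCLE_LEN)
# Assumed fast-SWA behaviour: a snapshot every K epochs within each cycle.
fast_swa_epochs = averaging_epochs(TOTAL_EPOCHS, START, K)

swa_weights = average_weights([trajectory[e] for e in swa_epochs])
fast_swa_weights = average_weights([trajectory[e] for e in fast_swa_epochs])
```

With these toy settings the SWA-style average uses two snapshots and the fast-SWA-style average uses six over the same training budget, which is the sense in which averaging multiple points per cycle collects models faster.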

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

Neural Cluster First, Route Second: One-Shot Capacitated Vehicle Routing via Differentiable Optimal Transport

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

Neural CFRS is a non-autoregressive one-shot framework for CVRP that uses entropic optimal transport for capacitated clustering and achieves competitive gaps on large instances.

Causal Fine-Tuning under Latent Confounded Shift

cs.LG · 2024-10-18 · unverdicted · novelty 5.0

Causal Fine-Tuning decomposes BERT representations into causal and spurious parts via SCM inductive bias to improve robustness under latent confounded shifts in text classification.

citing papers explorer

Showing 3 of 3 citing papers.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning · math.ST · 2026-05-10 · unverdicted · none · ref 229
Neural Cluster First, Route Second: One-Shot Capacitated Vehicle Routing via Differentiable Optimal Transport · cs.LG · 2026-05-10 · unverdicted · none · ref 1
Causal Fine-Tuning under Latent Confounded Shift · cs.LG · 2024-10-18 · unverdicted · none · ref 4 · internal anchor
