pith. machine review for the scientific record.

arxiv: 1701.06548 · v1 · submitted 2017-01-23 · 💻 cs.NE · cs.LG

Recognition: unknown

Regularizing Neural Networks by Penalizing Confident Output Distributions

classification 💻 cs.NE cs.LG
keywords confidence, distributions, entropy, label, output, penalizing, penalty, smoothing

We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as a strong regularizer in supervised learning. Furthermore, we connect a maximum entropy based confidence penalty to label smoothing through the direction of the KL divergence. We exhaustively evaluate the proposed confidence penalty and label smoothing on 6 common benchmarks: image classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine translation (WMT'14 English-to-German), and speech recognition (TIMIT and WSJ). We find that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.
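The two regularizers named in the abstract can be illustrated with a minimal sketch for a single example. It assumes a softmax classifier, a uniform prior over the K classes, and illustrative weights `beta` and `eps` (the function names and default values here are hypothetical, not taken from the paper's code). The confidence penalty subtracts a multiple of the output entropy H(p) from the negative log-likelihood; since H(p) = log K - KL(p || u), this is a KL(p || u) penalty up to a constant. Label smoothing with a uniform target mixes in the cross-entropy against u, which equals log K + KL(u || p), so the two differ in the direction of the KL divergence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl(p, q):
    """KL(p || q) in nats; assumes q is strictly positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cross_entropy(target, p):
    """Cross-entropy of prediction p against a (possibly soft) target."""
    return -sum(t * math.log(pi) for t, pi in zip(target, p) if t > 0.0)


def confidence_penalty_loss(logits, label, beta=0.1):
    """Negative log-likelihood minus beta * H(p).

    beta is a hypothetical illustrative weight, not a recommended value.
    """
    p = softmax(logits)
    nll = -math.log(p[label])
    return nll - beta * entropy(p)


def label_smoothing_loss(logits, label, eps=0.1):
    """Cross-entropy against a one-hot target mixed with the uniform distribution.

    eps is a hypothetical illustrative smoothing weight.
    """
    p = softmax(logits)
    k = len(p)
    target = [eps / k + (1.0 - eps) * (1.0 if i == label else 0.0)
              for i in range(k)]
    return cross_entropy(target, p)
```

With `beta=0` or `eps=0`, both functions reduce to the plain negative log-likelihood, which makes it easy to compare the regularized and unregularized losses on the same logits.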

This paper has not been read by Pith yet.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

    cs.LG 2026-05 unverdicted novelty 7.0

    DeepLévy learns context-adaptive mixtures of Lévy stable distributions for multi-horizon probabilistic forecasting by minimizing empirical-parametric characteristic function discrepancies.

  2. DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

    cs.LG 2026-05 unverdicted novelty 7.0

    DeepLévy learns mixtures of Lévy stable distributions for heavy-tailed time series forecasting by minimizing discrepancies between empirical and parametric characteristic functions, outperforming prior methods on tail...

  3. DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

    cs.LG 2026-05 unverdicted novelty 7.0

    DeepLévy learns context-dependent mixtures of Lévy stable distributions for multi-horizon time series forecasting by matching empirical and parametric characteristic functions, yielding improved tail risk metrics over...

  4. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    cs.CL 2019-10 accept novelty 7.0

    BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

  5. Annotations Mitigate Post-Training Mode Collapse

    cs.CL 2026-05 unverdicted novelty 6.0

    Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.

  6. Can LLMs Learn to Reason Robustly under Noisy Supervision?

    cs.LG 2026-04 conditional novelty 6.0

    Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning be...

  7. Condensation Transition in Entropy-Constrained Probability Spaces

    cond-mat.stat-mech 2026-05 unverdicted novelty 5.0

    Below a critical entropy H_c ≈ log K - 1 + γ in the large-K limit, the typical fixed-entropy distribution on the probability simplex condenses so that one component holds a macroscopic probability fraction while the r...

  8. A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

    cs.CV 2026-04 unverdicted novelty 5.0

    A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...