arxiv: 1712.09482 · v1 · pith:VO6U4TY7new · submitted 2017-12-27 · 📊 stat.ML · cs.LG

Robust Loss Functions under Label Noise for Deep Neural Networks

classification: stat.ML, cs.LG
keywords: loss, label, noise, functions, networks, deep, under, classification

In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.
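
The abstract does not spell out the sufficient conditions it refers to. As a hedged illustration, the sketch below checks one property that distinguishes the mean-absolute-error loss from categorical cross-entropy when the network output is a softmax probability vector: for k classes, the MAE loss summed over all k possible labels is always the constant 2k - 2, whereas the cross-entropy sum depends on the prediction. Treating this "symmetry" as the relevant condition is an assumption made here for illustration, and the helper names (`mae_loss`, `cce_loss`, `loss_sum_over_labels`) are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def mae_loss(probs, label):
    # L1 distance between the one-hot label vector and the predicted probabilities.
    one_hot = np.zeros_like(probs)
    one_hot[label] = 1.0
    return np.abs(one_hot - probs).sum()

def cce_loss(probs, label):
    # Categorical cross-entropy for a single example.
    return -np.log(probs[label])

def loss_sum_over_labels(loss_fn, probs):
    # Sum of the loss over every possible label for one fixed prediction.
    return sum(loss_fn(probs, j) for j in range(len(probs)))

rng = np.random.default_rng(0)
k = 5  # number of classes (arbitrary choice for the illustration)

mae_sums, cce_sums = [], []
for _ in range(4):
    probs = softmax(rng.normal(size=k))  # a random prediction
    mae_sums.append(loss_sum_over_labels(mae_loss, probs))
    cce_sums.append(loss_sum_over_labels(cce_loss, probs))

# MAE sums all equal 2k - 2 (up to floating-point rounding); CCE sums vary.
print("MAE sums:", mae_sums)
print("CCE sums:", cce_sums)
```

The constant follows from the fact that, for a probability vector p and label j, the L1 distance to the one-hot vector is 2(1 - p_j); summing over j gives 2k - 2 regardless of p.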


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Universal hidden monotonic trend estimation with contrastive learning

    cs.LG 2022-10 unverdicted novelty 5.0

    The paper proposes contrastive trend estimation (CTE) as a universal method for identifying hidden monotonic trends in any type of temporal data without standard assumptions.

  2. Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees

    cs.CL 2024-10 unverdicted novelty 4.0

    A learning-to-defer framework allocates extractive QA queries to LLM experts with theoretical optimality guarantees, shown to improve reliability and cut overhead on SQuAD and TriviaQA.