Embracing Biased Transition Matrices for Complementary-Label Learning with Many Classes

Chao-Kai Chiang; Gang Niu; Han-Hwa Shih; Hsuan-Tien Lin; Masashi Sugiyama; Tan-Ha Mai

arXiv: 2605.15586 · v2 · pith:N5CGKZWZnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI · cs.CV

Tan-Ha Mai, Chao-Kai Chiang, Han-Hwa Shih, Gang Niu, Masashi Sugiyama, Hsuan-Tien Lin

Pith reviewed 2026-05-20 19:55 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV

keywords complementary-label learning · biased transition matrix · weakly supervised learning · many-class classification · CIFAR-100 · TinyImageNet

The pith

A deliberately biased, non-uniform generation process that restricts complementary labels to class subsets lets CLL scale to 100+ classes, with more than sevenfold accuracy gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the uniform generation assumption in complementary-label learning dilutes the signal too severely for large label spaces, confining success to 10-class tasks. It demonstrates that this barrier is overcome by deliberately using a biased generation process that restricts complementary labels to a subset of classes, with the resulting transition matrix incorporated into training. This motivates the Bias-Induced Constrained Labeling (BICL) framework that spans data collection and model fitting. Experiments show BICL delivers effective learning on CIFAR-100 and TinyImageNet-200. A sympathetic reader cares because the approach turns a long-standing limitation into a controllable design choice for real-world many-class applications.

Core claim

The central claim is that a deliberately biased transition matrix, induced by restricting complementary labels to a known subset of classes, preserves a usable learning signal and thereby enables complementary-label learning to succeed on problems with 100 or more classes, as shown by the BICL framework's performance improvements.

What carries the argument

The biased (non-uniform) transition matrix that encodes the restricted complementary-label generation process and is used directly in the training objective.
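To make the object concrete, the following is a minimal sketch of what a restricted transition matrix could look like next to the classical uniform one. The function names, the random choice of the allowed subset, and the uniform weights inside the subset are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def restricted_transition_matrix(num_classes, subset_size, seed=0):
    """Row i gives P(complementary label = j | true label = i).

    Hypothetical construction: for each true class, pick `subset_size`
    other classes at random and spread the probability uniformly over them.
    """
    if not 1 <= subset_size <= num_classes - 1:
        raise ValueError("subset_size must be in [1, num_classes - 1]")
    rng = np.random.default_rng(seed)
    T = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        others = np.delete(np.arange(num_classes), i)  # never the true class
        allowed = rng.choice(others, size=subset_size, replace=False)
        T[i, allowed] = 1.0 / subset_size
    return T

def uniform_transition_matrix(num_classes):
    """Classical uniform assumption: every wrong class is equally likely."""
    T = np.full((num_classes, num_classes), 1.0 / (num_classes - 1))
    np.fill_diagonal(T, 0.0)
    return T

T_biased = restricted_transition_matrix(num_classes=100, subset_size=4)
T_uniform = uniform_transition_matrix(100)
print("non-zero entries per row (biased): ", int((T_biased[0] > 0).sum()))
print("non-zero entries per row (uniform):", int((T_uniform[0] > 0).sum()))
```

In the uniform matrix each complementary label rules out one of 99 equally plausible alternatives, while in the restricted matrix a complementary label can only have come from the few true classes whose subset contains it.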

If this is right

CLL becomes practical for 100-class and 200-class image datasets rather than remaining limited to 10 classes.
More than sevenfold accuracy improvements over traditional methods are achievable when the bias is known.
Real-world CLL applications become feasible when annotation processes can enforce and record the restricted label generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Data annotation pipelines could be redesigned to intentionally introduce and document known biases instead of striving for uniformity.
The same bias-leveraging principle may extend to other weak-supervision settings where label noise or incompleteness can be controlled.
Optimal subset size for the restriction could be studied as a tunable parameter for different numbers of classes.

Load-bearing premise

The data collection process can be controlled so the biased complementary-label generation is known and matches the transition matrix used at training time.
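One way to check this premise on a labeled audit set is to compare the empirical transition frequencies with the matrix assumed at training time. The sketch below simulates that check. The cyclic choice of four allowed classes per true class, the helper names, and the audit-set size are all hypothetical.

```python
import numpy as np

def sample_complementary_labels(true_labels, T, rng):
    """Draw one complementary label per instance from row T[y]."""
    C = T.shape[0]
    return np.array([rng.choice(C, p=T[y]) for y in true_labels])

def empirical_transition(true_labels, comp_labels, C):
    """Row-normalized counts of (true label, complementary label) pairs."""
    counts = np.zeros((C, C))
    np.add.at(counts, (true_labels, comp_labels), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Assumed design: each class c may only receive the next k classes (cyclically).
C, k = 20, 4
T = np.zeros((C, C))
for c in range(C):
    T[c, [(c + s) % C for s in range(1, k + 1)]] = 1.0 / k

rng = np.random.default_rng(0)
true_labels = np.repeat(np.arange(C), 500)  # hypothetical audit set
comp_labels = sample_complementary_labels(true_labels, T, rng)
T_hat = empirical_transition(true_labels, comp_labels, C)
max_gap = np.abs(T_hat - T).max()
print(f"largest |T_hat - T| entry: {max_gap:.3f}")
```

A large gap on real annotations would indicate that the collection process did not follow the designed bias, which is exactly the failure mode this premise rules out.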

What would settle it

If applying BICL with the correctly specified biased transition matrix on CIFAR-100 produces no accuracy improvement over uniform-assumption baselines, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.15586 by Chao-Kai Chiang, Gang Niu, Han-Hwa Shih, Hsuan-Tien Lin, Masashi Sugiyama, Tan-Ha Mai.

**Figure 1.** BICL Practical Case: Overview of the proposed practical design for bias-induced constrained labeling (BICL) that operates without true label access. Existing CLL models generally assume that complementary labels (CLs) are generated from the true class according to a transition matrix [9, 17, 18, 19, 20], which plays a central role in CLL studies. [9] pioneered the theoretical study of CLL by assuming a zer…

**Figure 2.** BICL Analysis Case: Overview of the proof-of-concept design for complementary candidate-label selection that operates with true label access. It consists of (1) reducing candidate label selection and (2) VLM-based complementary-label annotation with a negative prompt. Extending CLL to many-class settings raises a key question: “what is the main difficulty?” The challenge is twofold. First, from the labeling p…

**Figure 3.** BICL transition matrix compared to other label collection approaches.

**Figure 4.** Test accuracy during training on four datasets. Our method reaches its peak performance…

**Figure 5.** Per-class accuracy comparison between the uniform complementary-label distribution and…

**Figure 6.** Performance comparison of different numbers of sampled labels with BICL.

**Figure 7.** Label distributions across CIFAR-10 variants. CIFAR-10 represents an idealized setting with noiseless, uniformly distributed labels. CLCIFAR-10 is human-annotated, and ACLCIFAR-10 is VLM-annotated under a uniform distribution design. [Plot residue from the following figure: panel (a) CIFAR-20; x-axis: Class Index (0 to 19); y-axis: Count.]

**Figure 8.** Label distributions across CIFAR-20 variants.

**Figure 9.** Label distributions across CIFAR-100 variants.

**Figure 10.** Bias-Induced Constrained Labeling transition matrix on CIFAR-20 variants.

**Figure 11.** Performance comparison of different data augmentation strategies integrated with BICL

**Figure 12.** Transition matrix induced by the BICL protocol when using different encoder backbones

**Figure 13.** BICL performance with different encoder networks used in the label-selection stage, …

**Figure 14.** Performance comparison of different prompts on model performance across datasets.

**Figure 15.** Effect of label-space size on performance within CIFAR-100 and TinyImageNet-200.

**Figure 16.** Bias-Induced Constrained Labeling transition matrix on CIFAR-100 variants.

Original abstract

Complementary-label learning (CLL) is a weakly supervised paradigm where instances are labeled with classes they do not belong to. Despite a decade of research, CLL methods remain competitive mainly on 10-class classification, with scaling to large label spaces continuing to be an enduring bottleneck. This limitation stems from the common assumption of uniform label generation in traditional methods, which fatally dilutes the learning signal in many-class settings. In this paper, we demonstrate that this long-standing barrier can be overcome by deliberately designing a biased (non-uniform) generation process that restricts complementary labels to a subset of classes. This finding motivates us to propose Bias-Induced Constrained Labeling (BICL), a principled framework spanning data collection to training that leverages this bias. BICL enables effective learning on CIFAR-100 and TinyImageNet-200, achieving more than sevenfold accuracy improvements over traditional methods. Our findings establish a new trajectory for making CLL feasible for many classes in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

BICL shows biased non-uniform complementary labels can scale CLL to 100-class problems with large gains, but the whole thing depends on knowing the exact transition matrix.

The core advance is showing that the uniform complementary-label assumption was the real limiter for scaling past 10 classes. By deliberately restricting the complementary labels to a subset via a controllable biased generation process, they get more than sevenfold accuracy lifts on CIFAR-100 and TinyImageNet-200 compared with prior CLL methods. The BICL framework ties the biased transition matrix into both data collection and the training objective, which is a clean way to keep the learning signal from washing out in large label spaces. That part is new relative to the uniform-generation literature they cite and looks practically motivated for image tasks where you can influence how labels are collected. The empirical claims are the strongest part of what is visible.

The main soft spot is the assumption that the biased transition matrix T is known exactly and matches the true data-generating process. If T is even modestly misspecified, the unbiased risk estimator derived from it will be off, and the reported gains may not hold. The abstract gives no sensitivity checks or estimation procedure for T, so it is unclear how robust the method is when the bias cannot be perfectly controlled or measured in a real pipeline. This is a real limitation rather than a minor detail.

The paper is aimed at researchers working on weakly supervised classification who need to handle many classes. Anyone trying to move CLL out of toy 10-class settings would find the empirical results and the biased-matrix idea worth examining. It is coherent on its own terms and deserves a serious referee to check the derivations, ablations, and how the bias is actually enforced in the experiments.

Referee Report

2 major / 1 minor

Summary. The paper claims that traditional complementary-label learning (CLL) fails to scale beyond 10 classes due to the uniform transition matrix assumption, which dilutes the signal in large label spaces. It proposes Bias-Induced Constrained Labeling (BICL), a framework that deliberately imposes a biased (non-uniform) complementary-label generation process restricting labels to class subsets, derives an unbiased risk estimator from the known biased transition matrix T, and reports more than sevenfold accuracy gains on CIFAR-100 and TinyImageNet-200 over prior CLL methods.
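The paper's estimator is not reproduced on this page. As a rough illustration of how a known T can enter a training objective, the sketch below uses forward loss correction, a different and simpler technique from the label-noise and biased-CLL literature (in the spirit of [14] and [17]). It is not an unbiased risk estimator and should not be read as the BICL objective. The function names and the toy 3-class matrix are assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponent
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_corrected_loss(logits, comp_labels, T, eps=1e-12):
    """Mean negative log-likelihood of the observed complementary labels.

    logits:      (n, C) classifier scores
    comp_labels: (n,) observed complementary labels
    T:           (C, C) with T[i, j] = P(comp = j | true = i)
    """
    p_true = softmax(logits)  # model estimate of P(true = i | x)
    p_comp = p_true @ T       # implied P(comp = j | x)
    picked = p_comp[np.arange(len(comp_labels)), comp_labels]
    return float(-np.log(picked + eps).mean())

# Toy 3-class example: class 0 always yields complementary label 1, and so on.
T3 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
logits = np.array([[5.0, 0.0, 0.0]])  # model is confident in class 0
loss_consistent = forward_corrected_loss(logits, np.array([1]), T3)
loss_inconsistent = forward_corrected_loss(logits, np.array([2]), T3)
print(loss_consistent, loss_inconsistent)
```

The loss is small when the observed complementary label is one that T makes likely under the predicted class and large otherwise, which is why a misspecified T would push the classifier toward the wrong classes.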

Significance. If the central claim holds under the stated assumptions, BICL would represent a meaningful shift in CLL by moving from passive uniform labeling to controlled biased data collection, potentially making the paradigm viable for real-world many-class problems. The empirical scale of the reported gains, if reproducible with proper controls, would be notable; however, the significance hinges on whether the bias can be practically enforced and whether the estimator remains robust outside idealized settings.

major comments (2)

[BICL framework] The unbiased risk estimator derivation (BICL framework section) treats the biased transition matrix T as known exactly and matching the data-generating process. No estimation procedure, sensitivity analysis, or robustness experiments are provided for cases where the assumed T deviates from the true generation process; this is load-bearing because even modest misspecification would bias the estimator and undermine the reported gains.
[Experiments] Experiments on CIFAR-100 and TinyImageNet-200 claim >7x accuracy improvements, but lack details on how the biased complementary labels were actually generated and imposed during data collection, exact parameterization of the bias distribution, error bars across runs, and ablations isolating the effect of the bias parameters versus other modeling choices.

minor comments (1)

[Introduction / Framework] Notation for the biased transition matrix T and the subset restriction should be introduced with an explicit equation early in the framework section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: [BICL framework] The unbiased risk estimator derivation (BICL framework section) treats the biased transition matrix T as known exactly and matching the data-generating process. No estimation procedure, sensitivity analysis, or robustness experiments are provided for cases where the assumed T deviates from the true generation process; this is load-bearing because even modest misspecification would bias the estimator and undermine the reported gains.

Authors: We appreciate this observation. In the BICL framework, the transition matrix T is deliberately designed and imposed as part of the controlled data collection process, so it is known exactly by construction rather than estimated. This enables the exact unbiased risk estimator under the stated assumptions. We agree that robustness to misspecification is important to demonstrate. In the revised manuscript we will add a sensitivity analysis together with experiments that quantify estimator degradation under controlled deviations from the assumed T. revision: yes
Referee: [Experiments] Experiments on CIFAR-100 and TinyImageNet-200 claim >7x accuracy improvements, but lack details on how the biased complementary labels were actually generated and imposed during data collection, exact parameterization of the bias distribution, error bars across runs, and ablations isolating the effect of the bias parameters versus other modeling choices.

Authors: We agree that these details are necessary for reproducibility and for isolating the contribution of the bias. In the revision we will expand the experimental section to describe the exact procedure used to generate and impose the biased complementary labels, provide the precise parameterization of the bias distribution, report error bars from multiple independent runs, and include ablation studies that vary the bias parameters while holding other modeling choices fixed. revision: yes

Circularity Check

0 steps flagged

BICL framework derivation is self-contained with no reduction to inputs by construction

The paper introduces BICL by proposing a deliberately biased complementary-label generation process whose transition matrix T is treated as known and controlled at data collection time. The derivation of the unbiased risk estimator follows directly from this known T via standard risk correction techniques for complementary labels; this is a forward mathematical construction rather than a tautology or fitted quantity renamed as prediction. No equations in the abstract or described framework reduce the final performance claim to a parameter fit on the target data, nor does the argument rest on self-citation chains or imported uniqueness theorems. Empirical gains on CIFAR-100 and TinyImageNet-200 are presented as validation under the stated assumption, not as evidence that forces the result. The central premise therefore remains an external modeling choice (controllable biased labeling) rather than a self-referential loop.
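The "standard risk correction" step is not written out on this page. Under assumed notation, the usual backward-correction identity from the label-noise literature (for example [14]) reads as follows. It requires T to be invertible and independent of x, which a sparse restricted matrix need not satisfy, and it may differ from the estimator the paper actually derives.

```latex
% Assumed notation (not taken from the paper):
%   T_{ij} = P(\bar{Y} = j \mid Y = i), with T invertible and independent of x
%   p(x)        : vector of true-class posteriors P(Y = i \mid x)
%   \bar{p}(x)  : vector of complementary-label posteriors P(\bar{Y} = j \mid x)
%   \ell(f(x))  : vector of per-class losses, e_j : j-th standard basis vector
\begin{align}
\bar{p}(x) &= T^{\top} p(x), \\
\mathbb{E}_{\bar{Y} \mid x}\!\left[ e_{\bar{Y}}^{\top} T^{-1} \ell(f(x)) \right]
  &= \bar{p}(x)^{\top} T^{-1} \ell(f(x))
   = p(x)^{\top} T \, T^{-1} \ell(f(x))
   = \mathbb{E}_{Y \mid x}\!\left[ \ell_{Y}(f(x)) \right].
\end{align}
```

Taking the expectation over x on both sides shows that the corrected loss, computed from complementary labels only, has the same expected value as the ordinary supervised risk.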

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

BICL rests on the ability to control and model a non-uniform transition matrix during data collection; this introduces free parameters for the bias distribution and a domain assumption that the matrix remains known and stable at training time.

free parameters (1)

bias distribution parameters
Parameters that define the non-uniform probabilities over the restricted subset of complementary classes; these must be set or estimated to realize the claimed gains.

axioms (1)

domain assumption: The biased generation process can be enforced at data collection time and the resulting transition matrix is known exactly for training.
Invoked when the paper states that BICL spans data collection to training by leveraging the bias.

pith-pipeline@v0.9.0 · 5716 in / 1317 out tokens · 56833 ms · 2026-05-20T19:55:09.505054+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We provide an information-theoretic perspective... Theorem 1 (Lower Bound on Supervision Error)... H_Q(Y|Ȳ) quantifies the uncertainty...

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

BICL... deliberately designing a biased (non-uniform) generation process that restricts complementary labels to a subset of classes.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor

[1] Tan-Ha Mai, Nai-Xuan Ye, Yu-Wei Kuan, Po-Yi Lu, and Hsuan-Tien Lin. The unexplored potential of vision-language models for generating large-scale complementary-label learning data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 90–102, 2025.
[2] Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu. Machine learning from weak supervision: An empirical risk minimization approach. MIT Press, 2022.
[3] Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 213–220, 2008.
[4] Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems, volume 30, pages 1675–1685, 2017.
[5] Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, and Masashi Sugiyama. Learning from similarity-confidence data. In Proceedings of the 38th International Conference on Machine Learning, pages 1272–1282, 2021.
[6] Wei Wang, Lei Feng, Yuchen Jiang, Gang Niu, Min-Ling Zhang, and Masashi Sugiyama. Binary classification with confidence difference. In Advances in Neural Information Processing Systems 36, pages 5936–5960, 2023.
[7] Han Bao, Gang Niu, and Masashi Sugiyama. Classification from pairwise similarity and unlabeled data. In Proceedings of the 35th International Conference on Machine Learning, pages 461–470, 2018.
[8] Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, and Masashi Sugiyama. Pairwise supervision can provably elicit a decision boundary. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, pages 2618–2640, 2022.
[9] Takashi Ishida, Gang Niu, Weihua Hu, and Masashi Sugiyama. Learning from complementary labels. In Advances in Neural Information Processing Systems, pages 5639–5649, 2017.
[10] Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, and Masashi Sugiyama. Unbiased risk estimators can mislead: A case study of learning with complementary labels. In International Conference on Machine Learning, pages 1929–1938, 2020.
[11] Rong Jin and Zoubin Ghahramani. Learning with multiple labels. Advances in Neural Information Processing Systems, 15, 2002.
[12] Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, and Masashi Sugiyama. Progressive identification of true labels for partial-label learning. In International Conference on Machine Learning, pages 6500–6510, 2020.
[13] Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. In Advances in Neural Information Processing Systems, volume 26, 2013.
[14] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1944–1952, 2017.
[15] Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, and Masashi Sugiyama. Learning with complementary labels revisited: The selected-completely-at-random setting is more practical. In Proceedings of the 41st International Conference on Machine Learning, 2024.
[16] Hsiu-Hsuan Wang, Mai Tan Ha, Nai-Xuan Ye, Wei-I Lin, and Hsuan-Tien Lin. CLImage: Human-annotated datasets for complementary-label learning. Transactions on Machine Learning Research, 2025.
[17] Xiyu Yu, Tongliang Liu, Mingming Gong, and Dacheng Tao. Learning with biased complementary labels. In European Conference on Computer Vision, pages 68–83, 2018.
[18] Youngdong Kim, Junho Yim, Juseung Yun, and Junmo Kim. NLNL: Negative learning for noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 101–110, 2019.
[19] Yi Gao and Min-Ling Zhang. Discriminative complementary-label learning with weighted loss. In International Conference on Machine Learning, pages 3587–3597, 2021.
[20] Wei-I Lin and Hsuan-Tien Lin. Reduction from complementary-label learning to probability estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 469–481, 2023.
[21] Nai-Xuan Ye, Tan-Ha Mai, Hsiu-Hsuan Wang, Wei-I Lin, and Hsuan-Tien Lin. libcll: an extendable python toolkit for complementary-label learning, 2024.
[22] Alex Krizhevsky. Learning multiple layers of features from tiny images. Computer Science University of Toronto, Canada, 2009.
[23] Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. Report of CS231N: Deep Learning for Computer Vision Course, 2015. Stanford University.
[24] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems, volume 36, pages 34892–34916. Curran Associates, Inc., 2023.
[25] Tan-Ha Mai and Hsuan-Tien Lin. Intra-cluster mixup: An effective data augmentation technique for complementary-label learning. Transactions on Machine Learning Research, 2026.
[26] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15750–15758, 2021.
[27] Thomas M Cover and Joy A Thomas. Elements of Information Theory. Wiley-Interscience, Hoboken, NJ, 2nd edition, 2006.
[28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[29] Takashi Ishida, Gang Niu, Aditya Menon, and Masashi Sugiyama. Complementary-label learning for arbitrary losses and models. In International Conference on Machine Learning, pages 2971–2980, 2019.
[30] Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, and Masashi Sugiyama. Learning with multiple complementary labels. In International Conference on Machine Learning, pages 3072–3081, 2020.
[31] Haoran Jiang, Zhihao Sun, and Yingjie Tian. Comco: Complementary supervised contrastive learning for complementary label learning. Neural Networks, 169:44–56, 2024.
[32] Yiwei You, Jinglong Huang, Qiang Tong, and Bo Wang. Tackling biased complementary label learning with large margin. Information Sciences, 687:121400, 2025.
[33] Hiroki Ishiguro, Takashi Ishida, and Masashi Sugiyama. Learning from noisy complementary labels with robust loss functions. IEICE Transactions on Information and Systems, 105:364–376, 2022.
[34] Meng Wei, Yong Zhou, Zhongnian Li, and Xinzheng Xu. Class-imbalanced complementary-label learning via weighted loss. Neural Networks, 166:555–565, 2023.
[35] Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu. Learning with noisy labels revisited: A study using real-world human annotations. In International Conference on Learning Representations, 2022.
[36] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. In Advances in Neural Information Processing Systems, volume 32, pages 1565–1576, 2019.
[37] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9268–9277, 2019.
[38] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[39] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.
[40] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.
[41] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.
[42] Pierre H Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, et al. BYOL works even without batch statistics. arXiv preprint arXiv:2010.10241, 2020.
[43] Xinlei Chen, Saining Xie, and Kaiming He. An empirical study of training self-supervised vision transformers. In IEEE/CVF International Conference on Computer Vision, pages 9640–9649, 2021.
[44] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ... Qwen3 technical report, 2025.
[45] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Anto...
[46] Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, and Masashi Sugiyama. Learning with complementary labels revisited: The selected-completely-at-random setting is more practical. In International Conference on Machine Learning, volume 235, pages 50683–50710, 2024.
[47] Shuqi Liu, Yuzhou Cao, Qiaozhen Zhang, Lei Feng, and Bo An. Consistent complementary-label learning via order-preserving losses. In International Conference on Artificial Intelligence and Statistics, pages 8734–8748, 2023.
[48] Haobo Wang, Ruixuan Xiao, Yixuan Li, Lei Feng, Gang Niu, Gang Chen, and Junbo Zhao. PiCO: Contrastive label disambiguation for partial label learning. In International Conference on Learning Representations, 2022.
[49] Haobo Wang, Mingxuan Xia, Yixuan Li, Yuren Mao, Lei Feng, Gang Chen, and Junbo Zhao. Solar: Sinkhorn label refinery for imbalanced partial-label learning. Advances in Neural Information Processing Systems, 35:8104–8117, 2022.

A Proofs of Theoretical Results

A.1 Proof of Theorem 1

Proof. We start with the standard Fano's Inequality [27], which bounds th...
Dense Bias: We generated a random transition matrix $Q^{\mathrm{Bias}} \in \mathbb{R}^{C \times C}$ where $Q^{\mathrm{Bias}}_{ij} \sim U[0,1]$ for $i \neq j$ and $Q^{\mathrm{Bias}}_{ii} = 0$. Rows were normalized to sum to 1.

Sparse Bias (Ours): From $Q^{\mathrm{Bias}}$, we derived a sparse matrix $Q^{\mathrm{Ours}}$ by retaining $k$ ($k = 4$) randomly selected elements per row and re-normalizing.

Results. We computed the conditional entropy $H(Y \mid \bar{Y})$ for both matrices. The simulation revealed that $H_{\mathrm{Ours}}(Y \mid \bar{Y}) \le H_{\mathrm{Bias}}(Y \mid \bar{Y})$ holds in 100% of the trials across all tested dimensions ($10 \times 10$, $100 \times 1$...
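The dense-versus-sparse comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' simulation code: the function names are hypothetical, the prior over the true label $Y$ is assumed to be uniform, rows of the matrix are assumed to index the true label and columns the complementary label, and entropy is measured in nats.

```python
import math
import random


def dense_bias(num_classes, rng):
    """Random row-stochastic matrix with a zero diagonal (the Dense Bias case)."""
    rows = []
    for i in range(num_classes):
        row = [0.0 if j == i else rng.random() for j in range(num_classes)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows


def sparse_bias(q, k, rng):
    """Keep k randomly chosen off-diagonal entries per row, then re-normalize."""
    n = len(q)
    rows = []
    for i in range(n):
        keep = set(rng.sample([j for j in range(n) if j != i], k))
        row = [q[i][j] if j in keep else 0.0 for j in range(n)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows


def conditional_entropy(q):
    """H(Y | Ybar) in nats, ASSUMING a uniform prior over the true label Y."""
    n = len(q)
    prior = 1.0 / n
    # Marginal probability of each complementary label.
    p_bar = [sum(prior * q[i][j] for i in range(n)) for j in range(n)]
    h = 0.0
    for i in range(n):
        for j in range(n):
            joint = prior * q[i][j]
            if joint > 0.0:  # zero-probability pairs contribute nothing
                h -= joint * math.log(joint / p_bar[j])
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    q_bias = dense_bias(10, rng)
    q_ours = sparse_bias(q_bias, 4, rng)
    print("dense :", conditional_entropy(q_bias))
    print("sparse:", conditional_entropy(q_ours))
```

Two sanity checks follow directly from the definition: a uniform off-diagonal matrix gives $H(Y \mid \bar{Y}) = \log(C-1)$, and a matrix that maps each class to a single distinct complementary label gives zero.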
VLM annotator: the corresponding results are provided in Appendix C.6, “Effect of the Number of Sampled Labels”. We selected 4 candidate labels as CLs for each class. Note that Appendix C.6 also discards the true label from the candidate labels.

A rule-based annotator: discard the true label (reducing the candidate set to 4 classes), and then uniformly select one of the remaining 4 classes (all 4 are CLs). We can do so because the true class is available in Figure 2.

Table 6: Comparison between the VLM annotator and the rule-based annotator on CIFAR-20 (accuracy (%), mean ± std).

Annotator Method Dataset...
A candidate set of 4 labels is uniformly sampled from the label space.

The VLM (LLaVA) is prompted to select the label from this set that does not describe the image.

Characteristics. While using the same protocol, the VLM annotator significantly reduces label noise, achieving a noise rate of approximately 0.24% on CIFAR-10, which is much lower than that of CLImage. However, contrary to the expectation that uniform ca...
proposed a framework to estimate the classification risk unbiasedly using CLs. The core idea relies on an inverse transition matrix to recover the unbiased risk of the true classifier. The general loss formulation is $R_{\mathrm{URE}}(g) = \frac{1}{N} \sum_{i=1}^{N} e_{\bar{y}_i}^{\top} Q^{-1} L(g(x_i))$, where $e_{\bar{y}_i}$ is the one-hot vector of the complementary label, and $L(g(x_i))$ denotes the vector of lo...
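The inverse-transition-matrix correction can be illustrated with a short plain-Python sketch. It is an illustration under stated assumptions and not the cited method's implementation: the helper names `solve` and `ure_risk` are hypothetical, each loss vector is assumed to hold one loss value per class, and $Q_{ij}$ is assumed to be the probability of complementary label $j$ given true label $i$. Instead of forming $Q^{-1}$ explicitly, the sketch solves $Qz = L(g(x_i))$ and reads off the coordinate selected by the complementary label.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ure_risk(q, losses, comp_labels):
    """Average of e_ybar^T Q^{-1} L(g(x_i)) over the sample."""
    total = 0.0
    for loss_vec, ybar in zip(losses, comp_labels):
        corrected = solve(q, loss_vec)  # equals Q^{-1} L(g(x_i))
        total += corrected[ybar]        # one-hot selection of coordinate ybar
    return total / len(losses)


if __name__ == "__main__":
    # Hypothetical 3-class biased transition matrix with a zero diagonal.
    q = [[0.0, 0.7, 0.3],
         [0.5, 0.0, 0.5],
         [0.2, 0.8, 0.0]]
    losses = [[0.3, 1.1, 2.4], [1.9, 0.2, 0.8]]
    print(ure_risk(q, losses, comp_labels=[1, 2]))
```

The correction requires an invertible $Q$, which is why the sketch raises an error for a singular matrix. For the uniform matrix with off-diagonal entries $1/(C-1)$, the corrected term reduces to $\sum_k L_k - (C-1) L_{\bar{y}}$, which gives a quick consistency check.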
introduced the Complementary Probability Estimation (CPE) framework. Unlike risk-correction methods, CPE focuses on directly estimating the probability of a label being complementary, denoted as $p(\bar{y} \mid x)$. The objective is to minimize the divergence between the model’s output and the complementary target. CPE employs a surrogate complementary estimation l...

[1] Tan-Ha Mai, Nai-Xuan Ye, Yu-Wei Kuan, Po-Yi Lu, and Hsuan-Tien Lin. The unexplored potential of vision-language models for generating large-scale complementary-label learning data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 90–102, 2025.

[2] Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu. Machine learning from weak supervision: An empirical risk minimization approach. MIT Press, 2022.

[3] Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 213–220, 2008.

[4] Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems, volume 30, pages 1675–1685, 2017.

[5] Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, and Masashi Sugiyama. Learning from similarity-confidence data. In Proceedings of the 38th International Conference on Machine Learning, pages 1272–1282, 2021.

[6] Wei Wang, Lei Feng, Yuchen Jiang, Gang Niu, Min-Ling Zhang, and Masashi Sugiyama. Binary classification with confidence difference. In Advances in Neural Information Processing Systems 36, pages 5936–5960, 2023.

[7] Han Bao, Gang Niu, and Masashi Sugiyama. Classification from pairwise similarity and unlabeled data. In Proceedings of the 35th International Conference on Machine Learning, pages 461–470, 2018.

[8] Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, and Masashi Sugiyama. Pairwise supervision can provably elicit a decision boundary. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, pages 2618–2640, 2022.

[9] Takashi Ishida, Gang Niu, Weihua Hu, and Masashi Sugiyama. Learning from complementary labels. In Advances in Neural Information Processing Systems, pages 5639–5649, 2017.

[10] Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, and Masashi Sugiyama. Unbiased risk estimators can mislead: A case study of learning with complementary labels. In International Conference on Machine Learning, pages 1929–1938, 2020.

[11] Rong Jin and Zoubin Ghahramani. Learning with multiple labels. Advances in Neural Information Processing Systems, 15, 2002.

[12] Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, and Masashi Sugiyama. Progressive identification of true labels for partial-label learning. In International Conference on Machine Learning, pages 6500–6510, 2020.

[13] Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. In Advances in Neural Information Processing Systems, volume 26, 2013.

[14] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1944–1952, 2017.

[15] Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, and Masashi Sugiyama. Learning with complementary labels revisited: The selected-completely-at-random setting is more practical. In Proceedings of the 41st International Conference on Machine Learning, 2024.

[16] Hsiu-Hsuan Wang, Mai Tan Ha, Nai-Xuan Ye, Wei-I Lin, and Hsuan-Tien Lin. CLImage: Human-annotated datasets for complementary-label learning. Transactions on Machine Learning Research, 2025.

[17] Xiyu Yu, Tongliang Liu, Mingming Gong, and Dacheng Tao. Learning with biased complementary labels. In European Conference on Computer Vision, pages 68–83, 2018.

[18] Youngdong Kim, Junho Yim, Juseung Yun, and Junmo Kim. NLNL: Negative learning for noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 101–110, 2019.

[19] Yi Gao and Min-Ling Zhang. Discriminative complementary-label learning with weighted loss. In International Conference on Machine Learning, pages 3587–3597, 2021.

[20] Wei-I Lin and Hsuan-Tien Lin. Reduction from complementary-label learning to probability estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 469–481, 2023.

[21] Nai-Xuan Ye, Tan-Ha Mai, Hsiu-Hsuan Wang, Wei-I Lin, and Hsuan-Tien Lin. libcll: an extendable python toolkit for complementary-label learning, 2024.

[22] Alex Krizhevsky. Learning multiple layers of features from tiny images. Computer Science University of Toronto, Canada, 2009.

[23] Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. Report of CS231N: Deep Learning for Computer Vision Course, 2015. Stanford University.

[24] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems, volume 36, pages 34892–34916. Curran Associates, Inc., 2023.

[25] Tan-Ha Mai and Hsuan-Tien Lin. Intra-cluster mixup: An effective data augmentation technique for complementary-label learning. Transactions on Machine Learning Research, 2026.

[26] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15750–15758, 2021.

[27] Thomas M Cover and Joy A Thomas. Elements of Information Theory. Wiley-Interscience, Hoboken, NJ, 2nd edition, 2006.

[28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[29] Takashi Ishida, Gang Niu, Aditya Menon, and Masashi Sugiyama. Complementary-label learning for arbitrary losses and models. In International Conference on Machine Learning, pages 2971–2980, 2019.

[30] Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, and Masashi Sugiyama. Learning with multiple complementary labels. In International Conference on Machine Learning, pages 3072–3081, 2020.

[31] Haoran Jiang, Zhihao Sun, and Yingjie Tian. Comco: Complementary supervised contrastive learning for complementary label learning. Neural Networks, 169:44–56, 2024.

[32] Yiwei You, Jinglong Huang, Qiang Tong, and Bo Wang. Tackling biased complementary label learning with large margin. Information Sciences, 687:121400, 2025.

[33] Hiroki Ishiguro, Takashi Ishida, and Masashi Sugiyama. Learning from noisy complementary labels with robust loss functions. IEICE Transactions on Information and Systems, 105:364–376, 2022.

[34] Meng Wei, Yong Zhou, Zhongnian Li, and Xinzheng Xu. Class-imbalanced complementary-label learning via weighted loss. Neural Networks, 166:555–565, 2023.

[35] Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu. Learning with noisy labels revisited: A study using real-world human annotations. In International Conference on Learning Representations, 2022.

[36] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. In Advances in Neural Information Processing Systems, volume 32, pages 1565–1576, 2019.

[37] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9268–9277, 2019.

[38] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.

[39] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019.

[40] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020.

[41] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
