
arxiv: 2605.15586 · v2 · pith:N5CGKZWZnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI · cs.CV

Embracing Biased Transition Matrices for Complementary-Label Learning with Many Classes

Pith reviewed 2026-05-20 19:55 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords complementary-label learning · biased transition matrix · weakly supervised learning · many-class classification · CIFAR-100 · TinyImageNet

The pith

By designing a biased non-uniform process for complementary labels restricted to class subsets, CLL scales to 100+ classes with over sevenfold accuracy gains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the uniform generation assumption in complementary-label learning dilutes the signal too severely for large label spaces, confining success to 10-class tasks. It demonstrates that this barrier is overcome by deliberately using a biased generation process that restricts complementary labels to a subset of classes, with the resulting transition matrix incorporated into training. This motivates the Bias-Induced Constrained Labeling (BICL) framework that spans data collection and model fitting. Experiments show BICL delivers effective learning on CIFAR-100 and TinyImageNet-200. A sympathetic reader cares because the approach turns a long-standing limitation into a controllable design choice for real-world many-class applications.

Core claim

The central claim is that a deliberately biased transition matrix, induced by restricting complementary labels to a known subset of classes, preserves a usable learning signal and thereby enables complementary-label learning to succeed on problems with 100 or more classes, as shown by the BICL framework's performance improvements.

What carries the argument

The biased (non-uniform) transition matrix that encodes the restricted complementary-label generation process and is used directly in the training objective.
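
A minimal sketch of what such a restricted, non-uniform transition matrix could look like, in Python with NumPy. The function names, the random choice of each subset, and the uniform weighting inside a subset are illustrative assumptions and not the paper's procedure; the subset size of 4 mirrors the value quoted in the appendix excerpts further down this page.

```python
import numpy as np

def make_restricted_transition_matrix(num_classes, subset_size, seed=0):
    """Row y holds P(complementary label | true label y).

    Each row is supported on `subset_size` classes, none of them y.
    (Hypothetical helper: random subsets, uniform weights inside a subset.)
    """
    rng = np.random.default_rng(seed)
    T = np.zeros((num_classes, num_classes))
    for y in range(num_classes):
        candidates = np.delete(np.arange(num_classes), y)  # exclude the true class
        allowed = rng.choice(candidates, size=subset_size, replace=False)
        T[y, allowed] = 1.0 / subset_size
    return T

def sample_complementary_labels(true_labels, T, seed=0):
    """Draw one complementary label per instance from the row of its true class."""
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(T.shape[0], p=T[y]) for y in true_labels])

T = make_restricted_transition_matrix(num_classes=100, subset_size=4)
true_labels = np.arange(100)
comp_labels = sample_complementary_labels(true_labels, T)
```

Because the same `T` that drives sampling is the one handed to training, the generation process and the training-time matrix agree by construction in this toy setting.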

If this is right

  • CLL becomes practical for 100-class and 200-class image datasets rather than remaining limited to 10 classes.
  • Accuracy more than seven times that of traditional methods is achievable when the bias is known.
  • Real-world CLL applications become feasible when annotation processes can enforce and record the restricted label generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Data annotation pipelines could be redesigned to intentionally introduce and document known biases instead of striving for uniformity.
  • The same bias-leveraging principle may extend to other weak-supervision settings where label noise or incompleteness can be controlled.
  • Optimal subset size for the restriction could be studied as a tunable parameter for different numbers of classes.

Load-bearing premise

The data collection process can be controlled so the biased complementary-label generation is known and matches the transition matrix used at training time.

What would settle it

If BICL, trained on CIFAR-100 with the correctly specified biased transition matrix, shows no accuracy improvement over uniform-assumption baselines, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.15586 by Chao-Kai Chiang, Gang Niu, Han-Hwa Shih, Hsuan-Tien Lin, Masashi Sugiyama, Tan-Ha Mai.

Figure 1: BICL Practical Case: Overview of the proposed practical design for bias-induced constrained labeling (BICL) that operates without true label access. Existing CLL models generally assume that complementary labels (CLs) are generated from the true class according to a transition matrix [9, 17, 18, 19, 20], which plays a central role in CLL studies. [9] pioneered the theoretical study of CLL by assuming a zer…
Figure 2: BICL Analysis Case: Overview of the proof-of-concept design for complementary candidate-label selection that operates with true label access. It consists of (1) reducing candidate label selection and (2) VLM-based complementary-label annotation with a negative prompt. Extending CLL to many-class settings raises a key question: “what is the main difficulty?” The challenge is twofold. First, from the labeling p…
Figure 3: BICL transition matrix compared to other label collection approaches.
Figure 4: Test accuracy during training on four datasets. Our method reaches its peak performance…
Figure 5: Per-class accuracy comparison between the uniform complementary-label distribution and…
Figure 6: Performance comparison for different numbers of sampled labels with BICL.
Figure 7: Label distributions across CIFAR-10 variants. CIFAR-10 represents an idealized setting with noiseless, uniformly distributed labels. CLCIFAR-10 is human-annotated and ACLCIFAR-10 is VLM-annotated under a uniform distribution design. [Plot residue: panel (a) CIFAR-20; x-axis: Class Index; y-axis: Count.]
Figure 8: Label distributions across CIFAR-20 variants.
Figure 9: Label distributions across CIFAR-100 variants.
Figure 10: Bias-Induced Constrained Labeling transition matrix on CIFAR-20 variants.
Figure 11: Performance comparison of different data augmentation strategies integrated with BICL
Figure 12: Transition matrix induced by the BICL protocol when using different encoder backbones
Figure 13: BICL performance with different encoder networks used in the label-selection stage,…
Figure 14: Performance comparison of different prompts across datasets.
Figure 15: Effect of label-space size on performance within CIFAR-100 and TinyImageNet-200.
Figure 16: Bias-Induced Constrained Labeling transition matrix on CIFAR-100 variants.
Original abstract

Complementary-label learning (CLL) is a weakly supervised paradigm where instances are labeled with classes they do not belong to. Despite a decade of research, CLL methods remain competitive mainly on 10-class classification, with scaling to large label spaces continuing to be an enduring bottleneck. This limitation stems from the common assumption of uniform label generation in traditional methods, which fatally dilutes the learning signal in many-class settings. In this paper, we demonstrate that this long-standing barrier can be overcome by deliberately designing a biased (non-uniform) generation process that restricts complementary labels to a subset of classes. This finding motivates us to propose Bias-Induced Constrained Labeling (BICL), a principled framework spanning data collection to training that leverages this bias. BICL enables effective learning on CIFAR-100 and TinyImageNet-200, achieving more than sevenfold accuracy improvements over traditional methods. Our findings establish a new trajectory for making CLL feasible for many classes in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that traditional complementary-label learning (CLL) fails to scale beyond 10 classes due to the uniform transition matrix assumption, which dilutes the signal in large label spaces. It proposes Bias-Induced Constrained Labeling (BICL), a framework that deliberately imposes a biased (non-uniform) complementary-label generation process restricting labels to class subsets, derives an unbiased risk estimator from the known biased transition matrix T, and reports more than sevenfold accuracy gains on CIFAR-100 and TinyImageNet-200 over prior CLL methods.

Significance. If the central claim holds under the stated assumptions, BICL would represent a meaningful shift in CLL by moving from passive uniform labeling to controlled biased data collection, potentially making the paradigm viable for real-world many-class problems. The empirical scale of the reported gains, if reproducible with proper controls, would be notable; however, the significance hinges on whether the bias can be practically enforced and whether the estimator remains robust outside idealized settings.

major comments (2)
  1. [BICL framework] The unbiased risk estimator derivation (BICL framework section) treats the biased transition matrix T as known exactly and matching the data-generating process. No estimation procedure, sensitivity analysis, or robustness experiments are provided for cases where the assumed T deviates from the true generation process; this is load-bearing because even modest misspecification would bias the estimator and undermine the reported gains.
  2. [Experiments] Experiments on CIFAR-100 and TinyImageNet-200 claim >7x accuracy improvements, but lack details on how the biased complementary labels were actually generated and imposed during data collection, exact parameterization of the bias distribution, error bars across runs, and ablations isolating the effect of the bias parameters versus other modeling choices.
minor comments (1)
  1. [Introduction / Framework] Notation for the biased transition matrix T and the subset restriction should be introduced with an explicit equation early in the framework section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

  1. Referee: [BICL framework] The unbiased risk estimator derivation (BICL framework section) treats the biased transition matrix T as known exactly and matching the data-generating process. No estimation procedure, sensitivity analysis, or robustness experiments are provided for cases where the assumed T deviates from the true generation process; this is load-bearing because even modest misspecification would bias the estimator and undermine the reported gains.

    Authors: We appreciate this observation. In the BICL framework, the transition matrix T is deliberately designed and imposed as part of the controlled data collection process, so it is known exactly by construction rather than estimated. This enables the exact unbiased risk estimator under the stated assumptions. We agree that robustness to misspecification is important to demonstrate. In the revised manuscript we will add a sensitivity analysis together with experiments that quantify estimator degradation under controlled deviations from the assumed T. revision: yes

  2. Referee: [Experiments] Experiments on CIFAR-100 and TinyImageNet-200 claim >7x accuracy improvements, but lack details on how the biased complementary labels were actually generated and imposed during data collection, exact parameterization of the bias distribution, error bars across runs, and ablations isolating the effect of the bias parameters versus other modeling choices.

    Authors: We agree that these details are necessary for reproducibility and for isolating the contribution of the bias. In the revision we will expand the experimental section to describe the exact procedure used to generate and impose the biased complementary labels, provide the precise parameterization of the bias distribution, report error bars from multiple independent runs, and include ablation studies that vary the bias parameters while holding other modeling choices fixed. revision: yes
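
The kind of sensitivity check discussed in the first response can be sketched numerically. The snippet below is an illustration under stated assumptions, not the authors' planned experiment: it uses a toy 5-class circulant matrix as the "true" biased T, mixes it with the uniform complementary matrix to mimic misspecification, and evaluates in closed form how far the expected inverse-corrected loss drifts from the true risk at a single input. The posterior, the model output, the mixing weights, and all function names are hypothetical.

```python
import numpy as np

def circulant_restricted_T(num_classes, weights=(0.7, 0.3)):
    """Toy biased T: true class y can only yield classes y+1, y+2, ... (mod C)."""
    T = np.zeros((num_classes, num_classes))
    for y in range(num_classes):
        for shift, w in enumerate(weights, start=1):
            T[y, (y + shift) % num_classes] = w
    return T

def expected_corrected_risk(posterior, loss_vec, T_true, T_assumed):
    """Expectation over the complementary label of e_cl^T T_assumed^{-1} L."""
    cl_dist = T_true.T @ posterior                    # P(complementary label | x)
    corrected = np.linalg.solve(T_assumed, loss_vec)  # T_assumed^{-1} L
    return float(cl_dist @ corrected)

C = 5
T_true = circulant_restricted_T(C)
uniform = (np.ones((C, C)) - np.eye(C)) / (C - 1)     # uniform complementary matrix

posterior = np.array([0.6, 0.2, 0.1, 0.05, 0.05])     # hypothetical P(y | x)
model_probs = np.array([0.5, 0.2, 0.1, 0.1, 0.1])     # hypothetical model output
loss_vec = -np.log(model_probs)                       # cross-entropy loss per class
true_risk = float(posterior @ loss_vec)

for eps in (0.0, 0.05, 0.1, 0.2):
    T_assumed = (1 - eps) * T_true + eps * uniform    # eps = degree of misspecification
    bias = expected_corrected_risk(posterior, loss_vec, T_true, T_assumed) - true_risk
    print(f"eps={eps:.2f}  bias={bias:+.4f}")
```

With `eps = 0` the assumed and true matrices coincide and the corrected expectation equals the true risk; larger `eps` values show how the estimate moves once the assumed T departs from the generation process.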

Circularity Check

0 steps flagged

BICL framework derivation is self-contained with no reduction to inputs by construction


The paper introduces BICL by proposing a deliberately biased complementary-label generation process whose transition matrix T is treated as known and controlled at data collection time. The derivation of the unbiased risk estimator follows directly from this known T via standard risk correction techniques for complementary labels; this is a forward mathematical construction rather than a tautology or fitted quantity renamed as prediction. No equations in the abstract or described framework reduce the final performance claim to a parameter fit on the target data, nor does the argument rest on self-citation chains or imported uniqueness theorems. Empirical gains on CIFAR-100 and TinyImageNet-200 are presented as validation under the stated assumption, not as evidence that forces the result. The central premise therefore remains an external modeling choice (controllable biased labeling) rather than a self-referential loop.
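
For readers who want the "standard risk correction" step spelled out, here is a short worked derivation under assumed notation (the paper's own symbols may differ): T is taken to be invertible with entries T_{y ȳ} = P(Ȳ = ȳ | Y = y), the complementary label is assumed independent of the input given the true label, p(x) is the vector of class posteriors, and L(g(x)) is the vector of per-class losses.

```latex
% Assumed notation, not taken verbatim from the paper:
%   T_{y\bar{y}} = P(\bar{Y}=\bar{y} \mid Y=y), with T invertible
%   p(x)       = vector of class posteriors P(Y=y \mid x)
%   L(g(x))    = vector of per-class losses \ell(y, g(x))
%   e_{\bar{y}} = one-hot vector of the complementary label
\begin{align*}
\bar{p}(x) &= T^{\top} p(x)
  && \text{($\bar{Y}$ independent of $X$ given $Y$)} \\
\mathbb{E}_{\bar{Y} \mid x}\!\left[ e_{\bar{Y}}^{\top} T^{-1} L(g(x)) \right]
  &= \bar{p}(x)^{\top} T^{-1} L(g(x)) \\
  &= p(x)^{\top} T \, T^{-1} L(g(x)) \\
  &= p(x)^{\top} L(g(x))
   = \mathbb{E}_{Y \mid x}\!\left[ \ell(Y, g(x)) \right].
\end{align*}
```

The cancellation in the third line is exactly where the "known T" premise enters: if the matrix used for correction differs from the one that generated the labels, the product is no longer the identity and the equality fails.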

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

BICL rests on the ability to control and model a non-uniform transition matrix during data collection; this introduces free parameters for the bias distribution and a domain assumption that the matrix remains known and stable at training time.

free parameters (1)
  • bias distribution parameters
    Parameters that define the non-uniform probabilities over the restricted subset of complementary classes; these must be set or estimated to realize the claimed gains.
axioms (1)
  • domain assumption: The biased generation process can be enforced at data collection time and the resulting transition matrix is known exactly for training.
    Invoked when the paper states that BICL spans data collection to training by leveraging the bias.

pith-pipeline@v0.9.0 · 5716 in / 1317 out tokens · 56833 ms · 2026-05-20T19:55:09.505054+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor

  1. [1]

    The unexplored potential of vision-language models for generating large-scale complementary-label learning data

    Tan-Ha Mai, Nai-Xuan Ye, Yu-Wei Kuan, Po-Yi Lu, and Hsuan-Tien Lin. The unexplored potential of vision-language models for generating large-scale complementary-label learning data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 90–102, 2025

  2. [2]

    Machine learning from weak supervision: An empirical risk minimization approach

    Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu. Machine learning from weak supervision: An empirical risk minimization approach. MIT Press, 2022

  3. [3]

    Learning classifiers from only positive and unlabeled data

    Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 213–220, 2008

  4. [4]

    Positive-unlabeled learning with non-negative risk estimator

    Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems, volume 30, pages 1675–1685, 2017

  5. [5]

    Learning from similarity-confidence data

    Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, and Masashi Sugiyama. Learning from similarity-confidence data. In Proceedings of the 38th International Conference on Machine Learning, pages 1272–1282, 2021

  6. [6]

    Binary classification with confidence difference

    Wei Wang, Lei Feng, Yuchen Jiang, Gang Niu, Min-Ling Zhang, and Masashi Sugiyama. Binary classification with confidence difference. In Advances in Neural Information Processing Systems 36, pages 5936–5960, 2023

  7. [7]

    Classification from pairwise similarity and unlabeled data

    Han Bao, Gang Niu, and Masashi Sugiyama. Classification from pairwise similarity and unlabeled data. In Proceedings of the 35th International Conference on Machine Learning, pages 461–470, 2018

  8. [8]

    Pairwise supervision can provably elicit a decision boundary

    Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, and Masashi Sugiyama. Pairwise supervision can provably elicit a decision boundary. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, pages 2618–2640, 2022

  9. [9]

    Learning from complementary labels

    Takashi Ishida, Gang Niu, Weihua Hu, and Masashi Sugiyama. Learning from complementary labels. In Advances in Neural Information Processing Systems, pages 5639–5649, 2017

  10. [10]

    Unbiased risk estimators can mislead: A case study of learning with complementary labels

    Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, and Masashi Sugiyama. Unbiased risk estimators can mislead: A case study of learning with complementary labels. In International Conference on Machine Learning, pages 1929–1938, 2020

  11. [11]

    Learning with multiple labels

    Rong Jin and Zoubin Ghahramani. Learning with multiple labels. Advances in Neural Information Processing Systems, 15, 2002

  12. [12]

    Progressive identification of true labels for partial-label learning

    Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, and Masashi Sugiyama. Progressive identification of true labels for partial-label learning. In International Conference on Machine Learning, pages 6500–6510, 2020

  13. [13]

    Learning with noisy labels

    Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. In Advances in Neural Information Processing Systems, volume 26, 2013

  14. [14]

    Making deep neural networks robust to label noise: A loss correction approach

    Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1944–1952, 2017

  15. [15]

    Learning with complementary labels revisited: The selected-completely-at-random setting is more practical

    Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, and Masashi Sugiyama. Learning with complementary labels revisited: The selected-completely-at-random setting is more practical. In Proceedings of the 41st International Conference on Machine Learning, 2024

  16. [16]

    CLImage: Human-annotated datasets for complementary-label learning

    Hsiu-Hsuan Wang, Mai Tan Ha, Nai-Xuan Ye, Wei-I Lin, and Hsuan-Tien Lin. CLImage: Human-annotated datasets for complementary-label learning. Transactions on Machine Learning Research, 2025

  17. [17]

    Learning with biased complementary labels

    Xiyu Yu, Tongliang Liu, Mingming Gong, and Dacheng Tao. Learning with biased complementary labels. In European Conference on Computer Vision, pages 68–83, 2018

  18. [18]

    NLNL: Negative learning for noisy labels

    Youngdong Kim, Junho Yim, Juseung Yun, and Junmo Kim. NLNL: Negative learning for noisy labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 101–110, 2019

  19. [19]

    Discriminative complementary-label learning with weighted loss

    Yi Gao and Min-Ling Zhang. Discriminative complementary-label learning with weighted loss. In International Conference on Machine Learning, pages 3587–3597, 2021

  20. [20]

    Reduction from complementary-label learning to probability estimates

    Wei-I Lin and Hsuan-Tien Lin. Reduction from complementary-label learning to probability estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 469–481, 2023

  21. [21]

    libcll: an extendable python toolkit for complementary-label learning

    Nai-Xuan Ye, Tan-Ha Mai, Hsiu-Hsuan Wang, Wei-I Lin, and Hsuan-Tien Lin. libcll: an extendable python toolkit for complementary-label learning, 2024

  22. [22]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Computer Science University of Toronto, Canada, 2009

  23. [23]

    Tiny ImageNet visual recognition challenge

    Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. Report of CS231N: Deep Learning for Computer Vision Course, 2015. Stanford University

  24. [24]

    Visual instruction tuning

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems, volume 36, pages 34892–34916. Curran Associates, Inc., 2023

  25. [25]

    Intra-cluster mixup: An effective data augmentation technique for complementary-label learning

    Tan-Ha Mai and Hsuan-Tien Lin. Intra-cluster mixup: An effective data augmentation technique for complementary-label learning. Transactions on Machine Learning Research, 2026

  26. [26]

    Exploring simple siamese representation learning

    Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15750–15758, 2021

  27. [27]

    Elements of Information Theory

    Thomas M Cover and Joy A Thomas. Elements of Information Theory. Wiley-Interscience, Hoboken, NJ, 2nd edition, 2006

  28. [28]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  29. [29]

    Complementary-label learning for arbitrary losses and models

    Takashi Ishida, Gang Niu, Aditya Menon, and Masashi Sugiyama. Complementary-label learning for arbitrary losses and models. In International Conference on Machine Learning, pages 2971–2980, 2019

  30. [30]

    Learning with multiple complementary labels

    Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, and Masashi Sugiyama. Learning with multiple complementary labels. In International Conference on Machine Learning, pages 3072–3081, 2020

  31. [31]

    Comco: Complementary supervised contrastive learning for complementary label learning

    Haoran Jiang, Zhihao Sun, and Yingjie Tian. Comco: Complementary supervised contrastive learning for complementary label learning. Neural Networks, 169:44–56, 2024

  32. [32]

    Tackling biased complementary label learning with large margin

    Yiwei You, Jinglong Huang, Qiang Tong, and Bo Wang. Tackling biased complementary label learning with large margin. Information Sciences, 687:121400, 2025

  33. [33]

    Learning from noisy complementary labels with robust loss functions

    Hiroki Ishiguro, Takashi Ishida, and Masashi Sugiyama. Learning from noisy complementary labels with robust loss functions. IEICE Transactions on Information and Systems, 105:364–376, 2022

  34. [34]

    Class-imbalanced complementary-label learning via weighted loss

    Meng Wei, Yong Zhou, Zhongnian Li, and Xinzheng Xu. Class-imbalanced complementary-label learning via weighted loss. Neural Networks, 166:555–565, 2023

  35. [35]

    Learning with noisy labels revisited: A study using real-world human annotations

    Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu. Learning with noisy labels revisited: A study using real-world human annotations. In International Conference on Learning Representations, 2022

  36. [36]

    Learning imbalanced datasets with label-distribution-aware margin loss

    Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. In Advances in Neural Information Processing Systems, volume 32, pages 1565–1576, 2019

  37. [37]

    Class-balanced loss based on effective number of samples

    Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9268–9277, 2019

  38. [38]

    Improved Regularization of Convolutional Neural Networks with Cutout

    Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017

  39. [39]

    Autoaugment: Learning augmentation strategies from data

    Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation strategies from data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 113–123, 2019

  40. [40]

    Randaugment: Practical automated data augmentation with a reduced search space

    Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 702–703, 2020

  41. [41]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020

  42. [42]

    Byol works even without batch statistics

    Pierre H Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, et al. Byol works even without batch statistics. arXiv preprint arXiv:2010.10241, 2020

  43. [43]

    An empirical study of training self-supervised vision transformers

    Xinlei Chen, Saining Xie, and Kaiming He. An empirical study of training self-supervised vision transformers. In IEEE/CVF International Conference on Computer Vision, pages 9640–9649, 2021

  44. [44]

    Qwen3 technical report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  45. [45]

    Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Anto...

  46. [46]

    Learning with complementary labels revisited: The selected-completely-at-random setting is more practical

    Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, and Masashi Sugiyama. Learning with complementary labels revisited: The selected-completely-at-random setting is more practical. In International Conference on Machine Learning, volume 235, pages 50683–50710, 2024

  47. [47]

    Consistent complementary-label learning via order-preserving losses

    Shuqi Liu, Yuzhou Cao, Qiaozhen Zhang, Lei Feng, and Bo An. Consistent complementary-label learning via order-preserving losses. In International Conference on Artificial Intelligence and Statistics, pages 8734–8748, 2023

  48. [48]

    PiCO: Contrastive label disambiguation for partial label learning

    Haobo Wang, Ruixuan Xiao, Yixuan Li, Lei Feng, Gang Niu, Gang Chen, and Junbo Zhao. PiCO: Contrastive label disambiguation for partial label learning. In International Conference on Learning Representations, 2022

  49. [49]

    Solar: Sinkhorn label refinery for imbalanced partial-label learning

    Haobo Wang, Mingxuan Xia, Yixuan Li, Yuren Mao, Lei Feng, Gang Chen, and Junbo Zhao. Solar: Sinkhorn label refinery for imbalanced partial-label learning. Advances in Neural Information Processing Systems, 35:8104–8117, 2022.

    A Proofs of Theoretical Results. A.1 Proof of Theorem 1. Proof. We start with the standard Fano’s Inequality [27], which bounds th...

  50. [50]

    Rows were normalized to sum to 1

    Dense Bias: We generated a random transition matrix $Q^{\text{Bias}} \in \mathbb{R}^{C \times C}$ where $Q^{\text{Bias}}_{ij} \sim U[0,1]$ for $i \neq j$ and $Q^{\text{Bias}}_{ii} = 0$. Rows were normalized to sum to 1

  51. [51]

    Results. We computed the conditional entropy $H(Y \mid \bar{Y})$ for both matrices

    Sparse Bias (Ours): From $Q^{\text{Bias}}$, we derived a sparse matrix $Q^{\text{Ours}}$ by retaining $k$ ($k = 4$) randomly selected elements per row and re-normalizing. Results. We computed the conditional entropy $H(Y \mid \bar{Y})$ for both matrices. The simulation revealed that $H_{\text{Ours}}(Y \mid \bar{Y}) \le H_{\text{Bias}}(Y \mid \bar{Y})$ holds in 100 percent of the trials across all tested dimensions ($10 \times 10$, 100×1...

  52. [52]

    Effect of the Number of Sampled Labels

    VLM annotator: reported in Appendix C.6, “Effect of the Number of Sampled Labels”. We selected 4 candidate labels as CLs for each class. Note that the setting in Appendix C.6 also discards the true label from the candidate labels

  53. [53]

    preference

    A rule-based annotator: discard the true label (reducing the candidate set to 4 classes), and then uniformly select one of the remaining 4 classes (all 4 are CLs). We can do so because the true class is available in the setting of Figure 2. Table 6: Comparison between the VLM annotator and the rule-based annotator on CIFAR-20 (accuracy (%), mean±std). Annotator Method Dataset...

  54. [54]

    A candidate set of 4 labels is uniformly sampled from the label space

  55. [55]

    The VLM (LLaVA) is prompted to select the label from this set that does not describe the image. Characteristics. While using the same protocol, the VLM annotator significantly reduces label noise, achieving a noise rate of approximately 0.24 percentage points on CIFAR-10, which is much lower than CLImage. However, contrary to the expectation that uniform ca...

  56. [56]

    The core idea relies on an inverse transition matrix to recover the unbiased risk of the true classifier

    proposed a framework to estimate the classification risk unbiasedly using CLs. The core idea relies on an inverse transition matrix to recover the unbiased risk of the true classifier. The general loss formulation is: $R_{\mathrm{URE}}(g) = \frac{1}{N} \sum_{i=1}^{N} e_{\bar{y}_i}^{\top} Q^{-1} L(g(x_i))$, where $e_{\bar{y}_i}$ is the one-hot vector of the complementary label, and $L(g(x_i))$ denotes the vector of lo...

  57. [57]

    Unlike risk-correction methods, CPE focuses on directly estimating the probability of a label being complementary, denoted as $p(\bar{y} \mid x)$

    introduced the Complementary Probability Estimation (CPE) framework. Unlike risk-correction methods, CPE focuses on directly estimating the probability of a label being complementary, denoted as $p(\bar{y} \mid x)$. The objective is to minimize the divergence between the model’s output and the complementary target. CPE employs a surrogate complementary estimation l...
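
The dense-versus-sparse entropy comparison described in entries 50 and 51 above can be sketched as follows. This is a minimal reconstruction under assumptions that the excerpts do not state: a uniform prior over true classes, retained elements chosen among off-diagonal positions only, and natural-log entropy. All function names are hypothetical.

```python
import numpy as np

def dense_bias_matrix(num_classes, rng):
    """Off-diagonal entries drawn from U[0, 1], zero diagonal, rows normalized."""
    Q = rng.uniform(0.0, 1.0, size=(num_classes, num_classes))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum(axis=1, keepdims=True)

def sparsify(Q, k, rng):
    """Keep k randomly chosen off-diagonal entries per row, then re-normalize."""
    num_classes = Q.shape[0]
    S = np.zeros_like(Q)
    for y in range(num_classes):
        off_diag = np.delete(np.arange(num_classes), y)
        keep = rng.choice(off_diag, size=k, replace=False)
        S[y, keep] = Q[y, keep]
    return S / S.sum(axis=1, keepdims=True)

def conditional_entropy(Q, prior=None):
    """H(Y | Ybar) in nats for transition matrix Q and a class prior (uniform if None)."""
    num_classes = Q.shape[0]
    if prior is None:
        prior = np.full(num_classes, 1.0 / num_classes)  # assumption: uniform prior
    joint = prior[:, None] * Q                 # P(y, ybar)
    marginal = joint.sum(axis=0)               # P(ybar)
    mask = joint > 0                           # skip zero-probability pairs
    ratio = joint[mask] / np.broadcast_to(marginal, joint.shape)[mask]  # P(y | ybar)
    return float(-(joint[mask] * np.log(ratio)).sum())

rng = np.random.default_rng(0)
Q_dense = dense_bias_matrix(100, rng)
Q_sparse = sparsify(Q_dense, k=4, rng=rng)
h_dense = conditional_entropy(Q_dense)
h_sparse = conditional_entropy(Q_sparse)
print(f"H(Y|Ybar) dense:  {h_dense:.4f} nats")
print(f"H(Y|Ybar) sparse: {h_sparse:.4f} nats")
```

A lower conditional entropy means a complementary label leaves less uncertainty about the true class; repeating the run over several seeds and matrix sizes would mirror the trial-based comparison the excerpt describes.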