pith. machine review for the scientific record.

arxiv: 2604.12568 · v1 · submitted 2026-04-14 · 💻 cs.CV

Recognition: unknown

Evolution-Inspired Sample Competition for Deep Neural Network Optimization


Pith reviewed 2026-05-10 14:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords natural selection · sample reweighting · composite images · deep neural network training · image classification · evolution-inspired optimization · adaptive loss weighting · class imbalance handling

The pith

Natural Selection scores from composite images let samples compete to adaptively reweight their training losses in deep networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that conventional uniform treatment of training samples in deep networks causes familiar problems: bias on imbalanced classes, weak learning of hard examples, and reinforcement of noise. It introduces Natural Selection (NS) to address this by assembling groups of samples into single composite images, running inference once on the resized composite, and deriving a score for each original sample based on how its prediction stands out relative to the group. These scores then scale the individual loss terms dynamically during backpropagation. A sympathetic reader would care because this turns sample handling from static and equal into explicitly competitive and context-dependent, at the cost of only one extra forward pass per group. The result is claimed to hold across twelve datasets and four image classification tasks without task-specific tuning or extra model changes.
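The assemble-and-rescale step can be sketched as follows. Grid tiling and stride-based downscaling are assumptions here: the abstract specifies only that samples are stitched into one composite and rescaled to the original input size, not how.

```python
import numpy as np

def make_composite(images):
    """Tile a group of equally sized images into a square grid, then
    stride-sample the grid back down to one image's resolution
    (nearest-neighbour). The grid layout and resize method are
    assumptions; the paper only specifies stitching plus rescaling."""
    g = int(np.ceil(np.sqrt(len(images))))  # grid side length
    h, w, c = images[0].shape
    canvas = np.zeros((g * h, g * w, c), dtype=images[0].dtype)
    for i, img in enumerate(images):
        row, col = divmod(i, g)
        canvas[row * h:(row + 1) * h, col * w:(col + 1) * w] = img
    return canvas[::g, ::g]  # strided resize back to (h, w, c)
```

A single forward pass on the returned composite then yields the group-wise predictions from which each sample's score is derived, which is where the claimed one-extra-inference cost comes from.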

Core claim

By constructing composite images from multiple training samples, rescaling them to standard input size, and computing a natural selection score from the model's group-wise predictions, each sample receives a dynamic weight that reflects its relative competitive status; this weight then multiplies the sample's loss term so that stronger competitors in the artificial group contribute more to the gradient update while weaker ones are down-weighted, thereby injecting an explicit evolution-inspired competition mechanism into otherwise uniform optimization.
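One hypothetical instantiation of this claim: score each sample by how far its correct-class probability sits above the group mean, map scores to positive weights, and multiply the per-sample losses. The score formula, the exponential mapping, and the temperature are all assumptions; the abstract does not give the exact definition.

```python
import numpy as np

def natural_selection_weights(group_probs, labels, temperature=1.0):
    """Hypothetical NS score and weight for one group.
    group_probs: (G, C) class probabilities read off the composite.
    labels:      (G,)   ground-truth classes.
    Score = correct-class probability relative to the group mean;
    weights are normalized so the average weight is 1, keeping the
    overall loss scale unchanged."""
    correct = group_probs[np.arange(len(labels)), labels]
    score = correct - correct.mean()
    w = np.exp(score / temperature)
    return w * len(w) / w.sum()

def reweighted_loss(per_sample_losses, weights):
    """Competition-weighted mean loss for the backward pass."""
    return float(np.mean(weights * per_sample_losses))
```

Stronger competitors (higher relative correct-class probability) receive weights above 1 and pull harder on the gradient; weaker ones are down-weighted, which is the mechanism the core claim describes.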

What carries the argument

The natural selection score, obtained by comparing a sample's prediction against others inside a composite image and used to scale its loss contribution.

If this is right

  • Class-imbalanced datasets receive automatic emphasis on minority samples that still compete successfully in their groups.
  • Hard samples that produce distinct predictions within composites receive higher effective learning rates without separate mining logic.
  • Noisy or mislabeled samples that lose the artificial competition are down-weighted, reducing their distorting effect on gradients.
  • The same procedure applies unchanged to any image classifier backbone and any of the four tested classification tasks.
  • No extra hyperparameters beyond group size and the standard optimizer schedule are required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The composite-image trick could be extended to video or 3-D data by mixing frames or voxels, allowing competition-based reweighting in those domains.
  • Because the score depends only on relative prediction strength inside each group, it might serve as a diagnostic tool to identify which samples are currently driving the model's decisions.
  • Pairing NS with curriculum or self-paced learning schedules could test whether competition and difficulty ordering reinforce each other.
  • On very large web-scraped datasets the method might reduce the need for separate noise-cleaning pipelines.

Load-bearing premise

That a sample's relative prediction strength inside an artificially assembled composite image is a faithful measure of its usefulness to the model when it trains on real, separate images.

What would settle it

Training the identical network on the same data with NS turned off (uniform weights) versus on, then checking whether final accuracy and convergence speed remain statistically indistinguishable on a balanced clean dataset.
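The decisive comparison can be run as a paired test over random seeds. The function below computes the paired t statistic directly; seed counts and accuracy values are illustrative, not from the paper.

```python
import numpy as np

def paired_t(acc_ns, acc_uniform):
    """Paired t statistic over per-seed test accuracies for the same
    network trained with NS on vs. off (uniform weights). An |t|
    near 0 would mean the two settings are statistically
    indistinguishable, which is what the settling experiment probes."""
    d = np.asarray(acc_ns, dtype=float) - np.asarray(acc_uniform, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
```

Compare |t| against the t distribution with len(d) - 1 degrees of freedom; on a balanced, clean dataset the question is whether t is distinguishable from zero at all.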

Figures

Figures reproduced from arXiv: 2604.12568 by Lap-Pui Chau, Ying Zheng, Yi Wang, Yiyi Zhang.

Figure 1. (a) Natural selection in ecosystem evolution. (b) Illustration of the natural selection process in training samples. After performing image stitching and …
Figure 2. The distribution curves of sample weights under different settings.
Figure 3. Visualization of NS score distributions by category on the CIFAR-10 and FI datasets. Red points denote training samples, and the blue solid line …
Figure 4. Correlation analysis on the CIFAR-LT-10/100 and Office-Home datasets. Blue points represent the individual categories, while the red line depicts …
read the original abstract

Conventional deep network training generally optimizes all samples under a largely uniform learning paradigm, without explicitly modeling the heterogeneous competition among them. Such an oversimplified treatment can lead to several well-known issues, including bias under class imbalance, insufficient learning of hard samples, and the erroneous reinforcement of noisy samples. In this work, we present Natural Selection (NS), a novel evolution-inspired optimization method that explicitly incorporates competitive interactions into deep network training. Unlike conventional sample reweighting strategies that rely mainly on predefined heuristics or static criteria, NS estimates the competitive status of each sample in a group-wise context and uses it to adaptively regulate its training contribution. Specifically, NS first assembles multiple samples into a composite image and rescales it to the original input size for model inference. Based on the resulting predictions, a natural selection score is computed for each sample to characterize its relative competitive variation within the constructed group. These scores are then used to dynamically reweight the sample-wise loss, thereby introducing an explicit competition-driven mechanism into the optimization process. In this way, NS provides a simple yet effective means of moving beyond uniform sample treatment and enables more adaptive and balanced model optimization. Extensive experiments on 12 public datasets across four image classification tasks demonstrate the effectiveness of the proposed method. Moreover, NS is compatible with diverse network architectures and does not depend on task-specific assumptions, indicating its strong generality and practical potential. The code will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Natural Selection (NS), an evolution-inspired optimization technique for deep neural networks. It constructs composite images from multiple training samples, rescales them to the original input dimensions, infers predictions on these composites, derives a natural selection score for each sample based on its relative competitive variation within the group, and applies these scores to dynamically reweight individual sample losses. The authors assert that this introduces explicit competition into the training process, mitigating issues such as class imbalance, under-learning of hard samples, and reinforcement of noise, and report superior performance across 12 datasets in image classification tasks.

Significance. If the empirical results hold and the NS scores demonstrably improve over uniform weighting, the method could provide a practical heuristic for adaptive sample treatment in DNN training. Its claimed generality across architectures and datasets, combined with the promised public code release, would support reproducibility and allow the community to test whether the evolution-inspired framing yields benefits beyond existing reweighting strategies.

major comments (2)
  1. [Method and Experiments] The central claim rests on the assumption that predictions on artificially assembled and rescaled composite images produce a natural selection score that accurately reflects a sample's relative training utility or competitive status under standard SGD. The construction introduces spatial distortions and non-natural inputs, so it is unclear whether these scores correlate with per-sample loss, gradient magnitude, or hardness metrics. This link is load-bearing for the 'competition-driven' framing; without direct validation (e.g., correlation plots or ablation against random reweighting), NS reduces to another heuristic whose benefits may not stem from the claimed mechanism. (Method description and experimental validation sections.)
  2. [Abstract and Experiments] The abstract states effectiveness on 12 datasets across four tasks but the provided summary supplies no quantitative results, error bars, or statistical tests. If the full experiments section reports only point estimates without controls for the composite construction artifacts or comparisons to strong baselines (e.g., focal loss, self-paced learning), the superiority claim cannot be evaluated. (Abstract and §4 Experiments.)
minor comments (2)
  1. [Method] Clarify the exact procedure for assembling composites and computing the NS score (e.g., how many samples per composite, rescaling method, and the precise formula for the score from predictions) to enable reproduction.
  2. [Abstract] The abstract mentions 'four image classification tasks' without naming them; list the tasks and datasets explicitly in the introduction for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation and validation of our method. We address each major point below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Method and Experiments] The central claim rests on the assumption that predictions on artificially assembled and rescaled composite images produce a natural selection score that accurately reflects a sample's relative training utility or competitive status under standard SGD. The construction introduces spatial distortions and non-natural inputs, so it is unclear whether these scores correlate with per-sample loss, gradient magnitude, or hardness metrics. This link is load-bearing for the 'competition-driven' framing; without direct validation (e.g., correlation plots or ablation against random reweighting), NS reduces to another heuristic whose benefits may not stem from the claimed mechanism. (Method description and experimental validation sections.)

    Authors: We agree that the composite construction introduces spatial distortions by design, as it creates a shared input space to simulate inter-sample competition analogous to evolutionary group dynamics. The rescaling step ensures compatibility with standard network inputs while preserving relative feature interactions within each composite. To directly validate the link between NS scores and training utility, we have computed correlations between the derived scores and established hardness proxies (per-sample loss and gradient magnitude) across multiple datasets; these show consistent positive associations supporting the competitive interpretation. We will add these correlation plots and analyses to the Method and Experiments sections. We have also included an ablation replacing NS scores with random reweighting, which yields inferior results, indicating that the benefits arise from the competition-based mechanism rather than arbitrary reweighting. These additions will be incorporated in the revision. revision: yes

  2. Referee: [Abstract and Experiments] The abstract states effectiveness on 12 datasets across four tasks but the provided summary supplies no quantitative results, error bars, or statistical tests. If the full experiments section reports only point estimates without controls for the composite construction artifacts or comparisons to strong baselines (e.g., focal loss, self-paced learning), the superiority claim cannot be evaluated. (Abstract and §4 Experiments.)

    Authors: The full manuscript reports results on 12 datasets with comparisons to multiple baselines, including focal loss and self-paced learning, as well as other reweighting approaches. However, we acknowledge that the current presentation uses point estimates without error bars or formal statistical tests and does not explicitly control for potential composite artifacts. We will revise the abstract to include key quantitative performance highlights and update the Experiments section to report mean results with standard deviations across multiple runs, include statistical significance tests (e.g., paired t-tests), and add ablations isolating the effect of composite construction. These enhancements will allow clearer evaluation of the superiority claims. revision: yes
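The validation promised in the first response reduces to a correlation check between NS scores and a hardness proxy. A minimal sketch, with the choice of proxy (per-sample loss) and the sign convention as assumptions:

```python
import numpy as np

def score_loss_correlation(ns_scores, per_sample_losses):
    """Pearson correlation between NS scores and per-sample losses.
    Under the paper's framing, strong competitors should carry low
    loss, so a clearly negative value would support the claimed
    link between score and training utility; a value near zero
    would undercut it, as the referee warns."""
    return float(np.corrcoef(ns_scores, per_sample_losses)[0, 1])
```

The same function applied against gradient magnitudes or the random-reweighting ablation gives the rest of the evidence the rebuttal commits to.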

Circularity Check

0 steps flagged

No significant circularity; heuristic procedure with external experimental support

full rationale

The paper defines Natural Selection (NS) explicitly as a procedural heuristic: assemble samples into composites, rescale to input size, compute a score from current-model predictions on those composites, and use the score to reweight per-sample loss. No closed-form derivation, first-principles prediction, or mathematical chain is claimed that would reduce the final performance claim to the construction itself. Effectiveness is asserted via experiments across 12 datasets rather than by tautological equivalence. No self-citations, fitted parameters renamed as predictions, or uniqueness theorems are invoked in the provided text to bear the central load. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method rests on the unproven assumption that relative prediction strength inside an artificial composite image is a useful proxy for a sample's training value; no free parameters or new physical entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5559 in / 1145 out tokens · 40367 ms · 2026-05-10T14:46:27.408788+00:00 · methodology


Reference graph

Works this paper leans on

82 extracted references · 2 canonical work pages

  1. [1]

    On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life,

    C. Darwinet al., “On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life,”Oxford Text Archive Core Collection, 1859

  2. [2]

    Gradient-based learning applied to document recognition,

    Y . LeCun, L. Bottou, Y . Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,”Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 2002

  3. [3]

    Goodfellow, Y

    I. Goodfellow, Y . Bengio, A. Courville, and Y . Bengio,Deep learning. MIT press Cambridge, 2016

  4. [4]

    A survey on unbalanced classification: How can evolutionary computation help?

    W. Pei, B. Xue, M. Zhang, L. Shang, X. Yao, and Q. Zhang, “A survey on unbalanced classification: How can evolutionary computation help?” IEEE Transactions on Evolutionary Computation, vol. 28, no. 2, pp. 353–373, 2023

  5. [5]

    Focal loss for dense object detection,

    T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” inIEEE International Conference on Computer Vision, 2017, pp. 2980–2988

  6. [6]

    Learning from noisy labels with deep neural networks: A survey,

    H. Song, M. Kim, D. Park, Y . Shin, and J.-G. Lee, “Learning from noisy labels with deep neural networks: A survey,”IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 11, pp. 8135– 8153, 2022

  7. [7]

    Class-balanced loss based on effective number of samples,

    Y . Cui, M. Jia, T.-Y . Lin, Y . Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9268–9277

  8. [8]

    Training region-based object detectors with online hard example mining,

    A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” inIEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 761–769

  9. [9]

    Data optimization in deep learning: A survey,

    O. Wu and R. Yao, “Data optimization in deep learning: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2025

  10. [10]

    Deep semantic parsing of freehand sketches with homogeneous transformation, soft-weighted loss, and staged learning,

    Y . Zheng, H. Yao, and X. Sun, “Deep semantic parsing of freehand sketches with homogeneous transformation, soft-weighted loss, and staged learning,”IEEE Transactions on Multimedia, vol. 23, pp. 3590– 3602, 2020

  11. [11]

    Self-paced learning for latent variable models,

    M. Kumar, B. Packer, and D. Koller, “Self-paced learning for latent variable models,”Advances in Neural Information Processing Systems, vol. 23, 2010

  12. [12]

    Active bias: Training more accurate neural networks by emphasizing high variance samples,

    H.-S. Chang, E. Learned-Miller, and A. McCallum, “Active bias: Training more accurate neural networks by emphasizing high variance samples,”Advances in Neural Information Processing Systems, vol. 30, 2017

  13. [13]

    An instance selection assisted evolutionary method for high-dimensional feature selection,

    M. Xia, K. Li, J. Wan, Z. Huang, and F. Cheng, “An instance selection assisted evolutionary method for high-dimensional feature selection,” IEEE Transactions on Evolutionary Computation, 2025

  14. [14]

    Region purity-based local feature selec- tion: A multiobjective perspective,

    Y . Zhou, Y . Qiu, and S. Kwong, “Region purity-based local feature selec- tion: A multiobjective perspective,”IEEE Transactions on Evolutionary Computation, vol. 27, no. 4, pp. 787–801, 2022

  15. [15]

    Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,

    K. R. M. Fernando and C. P. Tsokos, “Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks,”IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 2940–2951, 2021

  16. [16]

    Learning to re-weight examples with optimal transport for imbalanced classification,

    D. Guo, Z. Li, H. Zhao, M. Zhou, H. Zhaet al., “Learning to re-weight examples with optimal transport for imbalanced classification,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 25 517– 25 530, 2022

  17. [17]

    Learning fast sample re-weighting without reward data,

    Z. Zhang and T. Pfister, “Learning fast sample re-weighting without reward data,” inIEEE/CVF International Conference on Computer Vision, 2021, pp. 725–734

  18. [18]

    Cosw: Conditional sample weighting for smoke segmentation with label noise,

    L. Yao, H. Zhao, Z. Wang, K. Zhao, and J. Peng, “Cosw: Conditional sample weighting for smoke segmentation with label noise,”Advances in Neural Information Processing Systems, vol. 37, pp. 106 743–106 767, 2024

  19. [19]

    Adversarial reweighting for partial domain adaptation,

    X. Gu, X. Yu, J. Sun, Z. Xuet al., “Adversarial reweighting for partial domain adaptation,”Advances in Neural Information Processing Systems, vol. 34, pp. 14 860–14 872, 2021

  20. [20]

    Evolutionary instance selection with multiple partial adaptive classifiers for domain adaptation,

    B. H. Nguyen, B. Xue, P. Andreae, and M. Zhang, “Evolutionary instance selection with multiple partial adaptive classifiers for domain adaptation,”IEEE Transactions on Evolutionary Computation, vol. 29, no. 1, pp. 46–60, 2023

  21. [21]

    A short survey on importance weighting for machine learning,

    M. Kimura and H. Hino, “A short survey on importance weighting for machine learning,”Transactions on Machine Learning Research, 2024

  22. [22]

    Optimizing importance weighting in the presence of sub-population shifts,

    F. Holstege, B. Wouters, N. Van Giersbergen, and C. Diks, “Optimizing importance weighting in the presence of sub-population shifts,” in International Conference on Learning Representations, 2025

  23. [23]

    Meta-weight-net: Learning an explicit mapping for sample weighting,

    J. Shu, Q. Xie, L. Yi, Q. Zhao, S. Zhou, Z. Xu, and D. Meng, “Meta-weight-net: Learning an explicit mapping for sample weighting,” Advances in Neural Information Processing Systems, vol. 32, 2019

  24. [24]

    Cmw-net: Learning a class-aware sample weighting mapping for robust deep learning,

    J. Shu, X. Yuan, D. Meng, and Z. Xu, “Cmw-net: Learning a class-aware sample weighting mapping for robust deep learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 10, pp. 11 521–11 539, 2023

  25. [25]

    Which samples should be learned first: Easy or hard?

    X. Zhou and O. Wu, “Which samples should be learned first: Easy or hard?”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 11, pp. 15 264–15 278, 2024

  26. [26]

    Reweighting augmented samples by minimizing the maximal expected loss,

    M. Yi, L. Hou, L. Shang, X. Jiang, Q. Liu, and Z.-M. Ma, “Reweighting augmented samples by minimizing the maximal expected loss,” in International Conference on Learning Representations, 2021

  27. [27]

    Umix: Improving importance weighting for subpopulation shift via uncertainty-aware mixup,

    Z. Han, Z. Liang, F. Yang, L. Liu, L. Li, Y . Bian, P. Zhao, B. Wu, C. Zhang, and J. Yao, “Umix: Improving importance weighting for subpopulation shift via uncertainty-aware mixup,”Advances in Neural Information Processing Systems, vol. 35, pp. 37 704–37 718, 2022

  28. [28]

    A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability,

    C. Cao, F. Zhou, Y . Dai, J. Wang, and K. Zhang, “A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability,”ACM Computing Surveys, vol. 57, no. 2, pp. 1–38, 2024

  29. [29]

    mixup: Beyond empirical risk minimization,

    H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” inInternational Conference on Learning Representations, 2018

  30. [30]

    Cutmix: Reg- ularization strategy to train strong classifiers with localizable features,

    S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y . Yoo, “Cutmix: Reg- ularization strategy to train strong classifiers with localizable features,” inIEEE/CVF International Conference on Computer Vision, 2019, pp. 6023–6032

  31. [31]

    Data augmentation using random image cropping and patching for deep cnns,

    R. Takahashi, T. Matsubara, and K. Uehara, “Data augmentation using random image cropping and patching for deep cnns,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 9, pp. 2917– 2931, 2019

  32. [32]

    Rankmixup: Ranking-based mixup training for network calibration,

    J. Noh, H. Park, J. Lee, and B. Ham, “Rankmixup: Ranking-based mixup training for network calibration,” inIEEE/CVF International Conference on Computer Vision, 2023, pp. 1358–1368

  33. [33]

    Tailoring mixup to data for calibration,

    Q. Bouniot, P. Mozharovskyi, and F. d’Alch ´e Buc, “Tailoring mixup to data for calibration,” inInternational Conference on Learning Repre- sentations, 2025

  34. [34]

    Evolutionary computation meets machine learning: A survey,

    J. Zhang, Z.-h. Zhan, Y . Lin, N. Chen, Y .-j. Gong, J.-h. Zhong, H. S. Chung, Y . Li, and Y .-h. Shi, “Evolutionary computation meets machine learning: A survey,”IEEE Computational Intelligence Magazine, vol. 6, no. 4, pp. 68–75, 2011

  35. [35]

    A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends,

    Y . Bi, B. Xue, P. Mesejo, S. Cagnoni, and M. Zhang, “A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends,”IEEE Transactions on Evolutionary Computation, vol. 27, no. 1, pp. 5–25, 2022

  36. [36]

    Bridging evolution- ary algorithms and reinforcement learning: A comprehensive survey on hybrid algorithms,

    P. Li, J. Hao, H. Tang, X. Fu, Y . Zhen, and K. Tang, “Bridging evolution- ary algorithms and reinforcement learning: A comprehensive survey on hybrid algorithms,”IEEE Transactions on Evolutionary Computation, 2024

  37. [37]

    Reputation-based interaction promotes cooper- ation with reinforcement learning,

    T. Ren and X.-J. Zeng, “Reputation-based interaction promotes cooper- ation with reinforcement learning,”IEEE Transactions on Evolutionary Computation, vol. 28, no. 4, pp. 1177–1188, 2023

  38. [38]

    Niching genetic programming to learn actions for deep reinforcement learning in dynamic flexible scheduling,

    M. Xu, Y . Mei, F. Zhang, and M. Zhang, “Niching genetic programming to learn actions for deep reinforcement learning in dynamic flexible scheduling,”IEEE Transactions on Evolutionary Computation, 2024

  39. [39]

    Genetic multi-armed bandits: a reinforcement learning inspired approach for simulation optimization,

    D. Preil and M. Krapp, “Genetic multi-armed bandits: a reinforcement learning inspired approach for simulation optimization,”IEEE Transac- tions on Evolutionary Computation, vol. 29, no. 2, pp. 360–374, 2024

  40. [40]

    Evolutionary search for complete neural network architectures with partial weight sharing,

    H. Zhang, Y . Jin, and K. Hao, “Evolutionary search for complete neural network architectures with partial weight sharing,”IEEE Transactions on Evolutionary Computation, vol. 26, no. 5, pp. 1072–1086, 2022

  41. [41]

    Comprehensive-forecast multi- objective genetic programming for neural architecture search,

    B. Cao, X. Luo, X. Liu, and Y . Li, “Comprehensive-forecast multi- objective genetic programming for neural architecture search,”IEEE Transactions on Evolutionary Computation, 2025

  42. [42]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky, “Learning multiple layers of features from tiny images,” Technical Report, 2009. EVOLUTION-INSPIRED SAMPLE COMPETITION FOR DEEP NEURAL NETWORK OPTIMIZATION 12

  43. [43]

    Imagenet large scale visual recognition challenge,

    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernsteinet al., “Imagenet large scale visual recognition challenge,”International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015

  44. [44]

    Robust image sentiment analysis using progressively trained and domain transferred deep networks,

    Q. You, J. Luo, H. Jin, and J. Yang, “Robust image sentiment analysis using progressively trained and domain transferred deep networks,” in AAAI Conference on Artificial Intelligence, 2015

  45. [45]

    Large-scale visual sentiment ontology and detectors using adjective noun pairs,

    D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale visual sentiment ontology and detectors using adjective noun pairs,” inACM International Conference on Multimedia, 2013, pp. 223–232

  46. [46]

    Image sentiment analysis using latent correlations among visual, textual, and sentiment views,

    M. Katsurai and S. Satoh, “Image sentiment analysis using latent correlations among visual, textual, and sentiment views,” inIEEE International Conference on Acoustics, Speech and Signal Processing, 2016, pp. 2837–2841

  47. [47]

    Building a large scale dataset for image emotion recognition: The fine print and the benchmark,

    Q. You, J. Luo, H. Jin, and J. Yang, “Building a large scale dataset for image emotion recognition: The fine print and the benchmark,” inAAAI Conference on Artificial Intelligence, 2016

  48. [48]

    Emoset: A large-scale visual emotion dataset with rich attributes,

    J. Yang, Q. Huang, T. Ding, D. Lischinski, D. Cohen-Or, and H. Huang, “Emoset: A large-scale visual emotion dataset with rich attributes,” in IEEE/CVF International Conference on Computer Vision, 2023, pp. 20 383–20 394

  49. [49]

    Fuzzy-aware loss for source-free domain adaptation in visual emotion recognition,

    Y . Zheng, Y . Zhang, Y . Wang, and L.-P. Chau, “Fuzzy-aware loss for source-free domain adaptation in visual emotion recognition,”IEEE Transactions on Fuzzy Systems, 2025

  50. [50]

    Deep hashing network for unsupervised domain adaptation,

    H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” inIEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5018–5027

  51. [51]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” inAdvances in Neural Infor- mation Processing Systems, 2012, pp. 1097–1105

  52. [52]

    Very deep convolutional networks for large-scale image recognition,

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” inInternational Conference on Learning Representations, 2015

  53. [53]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inIEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  54. [54]

    Aggregated residual transformations for deep neural networks,

    S. Xie, R. Girshick, P. Doll ´ar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” inIEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500

  55. [55]

    Densely connected convolutional networks,

    G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” inIEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2261–2269

  56. [56]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gellyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,” inInternational Conference on Learning Representations, 2020

  57. [57]

    Swin transformer: Hierarchical vision transformer using shifted windows,

    Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inIEEE/CVF International Conference on Computer Vision, 2021, pp. 10 012–10 022

  58. [58]

    Vmamba: Visual state space model,

    Y . Liu, Y . Tian, Y . Zhao, H. Yu, L. Xie, Y . Wang, Q. Ye, J. Jiao, and Y . Liu, “Vmamba: Visual state space model,”Advances in Neural Information Processing Systems, vol. 37, pp. 103 031–103 063, 2024

  [59]
    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.

  [60]
    J. Liang, D. Hu, and J. Feng, “Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation,” in International Conference on Machine Learning, 2020, pp. 6028–6039.

  [61]
    Y. Zheng, Y. Zhang, Y. Wang, and L.-P. Chau, “Procal: Probability calibration for neighborhood-guided source-free domain adaptation,” arXiv:2603.18764, 2026.

  [62]
    Z. Zhang and M. R. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” in Advances in Neural Information Processing Systems, 2018, pp. 8792–8802.

  [63]
    R. Müller, S. Kornblith, and G. E. Hinton, “When does label smoothing help?” Advances in Neural Information Processing Systems, vol. 32, 2019.

  [64]
    Y. Kim, J. Yim, J. Yun, and J. Kim, “Nlnl: Negative learning for noisy labels,” in IEEE/CVF International Conference on Computer Vision, 2019, pp. 101–110.

  [65]
    Y. Wang, X. Ma, Z. Chen, Y. Luo, J. Yi, and J. Bailey, “Symmetric cross entropy for robust learning with noisy labels,” in IEEE/CVF International Conference on Computer Vision, 2019, pp. 322–330.

  [66]
    X. Ma, H. Huang, Y. Wang, S. Romano, S. Erfani, and J. Bailey, “Normalized loss functions for deep learning with noisy labels,” in International Conference on Machine Learning, 2020, pp. 6543–6553.

  [67]
    C. Santiago, C. Barata, M. Sasdelli, G. Carneiro, and J. C. Nascimento, “Low: Training deep neural networks by learning optimal sample weights,” Pattern Recognition, vol. 110, p. 107585, 2021.

  [68]
    Z. Leng, M. Tan, C. Liu, E. D. Cubuk, J. Shi, S. Cheng, and D. Anguelov, “Polyloss: A polynomial expansion perspective of classification loss functions,” in International Conference on Learning Representations, 2022.

  [69]
    X. Ye, X. Li, T. Liu, Y. Sun, W. Tong et al., “Active negative loss functions for learning with noisy labels,” Advances in Neural Information Processing Systems, vol. 36, pp. 6917–6940, 2023.

  [70]
    X. Zhou, X. Liu, D. Zhai, J. Jiang, and X. Ji, “Asymmetric loss functions for noise-tolerant learning: Theory and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 8094–8109, 2023.

  [71]
    J. Wang, X. Zhou, D. Zhai, J. Jiang, X. Ji, and X. Liu, “ϵ-softmax: Approximating one-hot vectors for mitigating label noise,” Advances in Neural Information Processing Systems, vol. 37, pp. 32012–32038, 2024.

  [72]
    X. Ye, Y. Wu, W. Zhang, X. Li, Y. Chen, and C. Jin, “Optimized gradient clipping for noisy label learning,” in AAAI Conference on Artificial Intelligence, vol. 39, 2025, pp. 9463–9471.

  [73]
    M. R. Taesiri, G. Nguyen, S. Habchi, C.-P. Bezemer, and A. Nguyen, “Imagenet-hard: The hardest images remaining from a study of the power of zoom and spatial biases in image classification,” Advances in Neural Information Processing Systems, vol. 36, pp. 35878–35953, 2023.

  [74]
    A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142–150.

  [75]
    X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text classification,” Advances in Neural Information Processing Systems, vol. 28, 2015.

  [76]
    Y. Kim, “Convolutional neural networks for sentence classification,” in Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1746–1751.

  [77]
    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171–4186.

  [78]
    Y. Zhang, Z. Ying, Y. Zheng, C. Wu, N. Li, F. Wang, J. Wang, X. Feng, and X. Xu, “Leaf cultivar identification via prototype-enhanced learning,” Computer Vision and Image Understanding, vol. 250, p. 104221, 2025.

  [79]
    Y. Zhang, Y. Zheng, X. Xu, and J. Wang, “How well do self-supervised methods perform in cross-domain few-shot learning?” arXiv:2202.09014, 2022.

  [80]
    Y. Zheng, H. Yao, X. Sun, S. Zhao, and F. Porikli, “Distinctive action sketch for human action recognition,” Signal Processing, vol. 144, pp. 323–332, 2018.

Showing first 80 references.