Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition

Chenqi Li; Shuo Zhang; Tingting Zhu

arxiv: 2606.02526 · v1 · pith:YWP6SHMGnew · submitted 2026-06-01 · 💻 cs.CV · cs.AI

Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition

Shuo Zhang , Chenqi Li , Tingting Zhu This is my paper

Pith reviewed 2026-06-28 14:45 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords long-tailed recognitionnorm rescalingmonotonic normalizationPool Adjacent Violators Algorithmhyperparameter-free optimizationclassifier retrainingdeep neural networksimage classification

0 comments

The pith

Enforcing monotonicity on per-class weight norms via PAVA eliminates regularization hyperparameters in long-tailed recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the sensitivity of long-tailed recognition to hyperparameters in adaptive norm rescaling during classifier retraining. It provides a class-conditional distribution perspective to justify norm rescaling and proposes SAMN, which directly enforces monotonic per-class weight norms using the Pool Adjacent Violators Algorithm without any regularization parameters. This hyperparameter-friendly method integrates with other techniques and improves performance on standard benchmarks, often reaching state-of-the-art levels.

Core claim

By applying the Pool Adjacent Violators Algorithm to enforce monotonicity directly on per-class weight norms, SAMN achieves the benefits of adaptive norm rescaling in a way that requires no parameter regularization or post-hoc tuning, as supported by the class-conditional distribution view, and acts as a universal enhancement for long-tailed recognition.

What carries the argument

Self-Adaptive Monotonic Normalization (SAMN) that applies the Pool Adjacent Violators Algorithm to enforce monotonic per-class weight norms.

Load-bearing premise

That directly enforcing monotonic per-class weight norms via PAVA fully captures the benefits justified by the class-conditional distribution perspective without needing regularization.

What would settle it

An experiment demonstrating that SAMN fails to match or exceed the performance of carefully tuned regularization-based norm rescaling methods on multiple long-tailed datasets.

Figures

Figures reproduced from arXiv: 2606.02526 by Chenqi Li, Shuo Zhang, Tingting Zhu.

**Figure 2.** Figure 2: Loss of various methods during retraining with IF = [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Data and class-conditional distributions of the head class and the rare class. In the training set, the head and rare classes are [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: Accuracy (%) curves illustrate the performance sensitivity [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

Long-tailed recognition poses a significant challenge for deep learning. The two-stage decoupling paradigm, which separates representation learning from classifier retraining, offers a promising solution. During the classifier retraining stage, adaptive norm rescaling is a popular technique. It adjusts the per-class weight norms via parameter regularization, which inevitably introduces hyperparameters. However, many studies report that long-tailed recognition is sensitive to these hyperparameters, as their setup significantly impacts performance. In this paper, we first provide a class-conditional distribution perspective to support norm rescaling methods. Furthermore, we propose a simple but effective approach called Self-Adaptive Monotonic Normalization (SAMN). SAMN avoids the need for parameter regularization. It directly enforces monotonicity on per-class weight norms using the Pool Adjacent Violators Algorithm, making the method hyperparameter-friendly. SAMN is a universal strategy that integrates seamlessly with other methods for enhanced performance. Experiments on benchmark datasets demonstrate that our method significantly boosts long-tailed recognition performance, often achieving state-of-the-art results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SAMN replaces regularization with a PAVA projection onto monotonic norms, but the class-conditional motivation does not establish that any monotonic assignment preserves the intended benefit.

read the letter

The core move is to drop the regularization term that normally sets per-class weight norms and instead run the Pool Adjacent Violators Algorithm to force the norms into monotonic order. That is the concrete novelty relative to the norm-rescaling papers the abstract cites.

The paper does two things cleanly. It names the practical problem that long-tailed classifier retraining is sensitive to the regularization strength, and it offers a direct, parameter-free fix that can be dropped into existing two-stage pipelines. The claim that it composes with other methods is at least plausible on its face.

The soft spot is the justification. The class-conditional distribution view is invoked to support norm rescaling in general, yet the method then substitutes an isotonic projection without showing that the resulting magnitudes (not just the ordering) are close to what the regularized optimum would have been. If the benefit depends on the specific scale rather than the rank order, the substitution loses the original rationale while appearing hyperparameter-free. The abstract supplies no numbers, no ablation on the projection step, and no comparison of the actual norm values before and after PAVA, so that gap cannot be checked from the given text.

This is aimed at people already working on two-stage long-tailed recognition who want to reduce tuning. A reader who cares about that sub-area might want to see the full experiments and the norm-value tables. The work shows clear thinking about the hyperparameter issue and engages the existing literature on its own terms, so it is worth sending out for review even if the current justification needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that long-tailed recognition benefits from adaptive norm rescaling of per-class classifier weights during the second stage of the decoupling paradigm. It motivates this via a class-conditional distribution perspective, then replaces regularization-based rescaling (which requires hyperparameters) with Self-Adaptive Monotonic Normalization (SAMN): a direct projection onto the monotonic cone via the Pool Adjacent Violators Algorithm (PAVA). The resulting method is asserted to be hyperparameter-free, universal (integrates with other techniques), and empirically superior, often reaching SOTA on standard benchmarks.

Significance. If the central substitution is valid and the reported gains are reproducible, the contribution would be practically useful: it removes a known source of tuning sensitivity in long-tailed methods while preserving the norm-rescaling benefit. The universality claim, if substantiated by controlled ablations, would further increase impact.

major comments (2)

[class-conditional distribution perspective and SAMN definition] The class-conditional distribution perspective is invoked to justify norm rescaling, yet the manuscript does not derive that the specific magnitudes produced by regularization are recovered (or even approximated) by the PAVA monotonic projection; only the ordering constraint is enforced. If magnitudes matter for matching the distribution-derived optimum, the claimed performance equivalence does not follow.
[experimental section] No quantitative comparison is supplied showing that SAMN norms match or improve upon the regularization-derived norms on the same backbone and data; the experiments instead report end-to-end accuracy gains without isolating whether the monotonic projection alone accounts for the improvement or whether other factors (e.g., implicit regularization from PAVA) are at work.

minor comments (2)

[abstract] The abstract states that SAMN 'often achieving state-of-the-art results' but supplies no dataset names, metrics, or baseline comparisons; the full experimental tables should be referenced already in the abstract for clarity.
[method section] Notation for per-class weight norms and the precise input/output of the PAVA step should be defined with an equation before the algorithm description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below with clarifications and indicate planned revisions.

read point-by-point responses

Referee: [class-conditional distribution perspective and SAMN definition] The class-conditional distribution perspective is invoked to justify norm rescaling, yet the manuscript does not derive that the specific magnitudes produced by regularization are recovered (or even approximated) by the PAVA monotonic projection; only the ordering constraint is enforced. If magnitudes matter for matching the distribution-derived optimum, the claimed performance equivalence does not follow.

Authors: The class-conditional distribution perspective is used to motivate why norm rescaling is beneficial, specifically by establishing that optimal per-class weight norms should increase monotonically with class frequency. Regularization-based methods achieve monotonicity indirectly via hyperparameter tuning. SAMN directly projects onto the monotonic cone with PAVA to enforce this ordering without hyperparameters. The manuscript does not claim that PAVA recovers the exact magnitudes from regularization; the emphasis is on the ordering constraint as the essential property. We will revise the text to make this distinction explicit and avoid any implication of magnitude equivalence. revision: partial
Referee: [experimental section] No quantitative comparison is supplied showing that SAMN norms match or improve upon the regularization-derived norms on the same backbone and data; the experiments instead report end-to-end accuracy gains without isolating whether the monotonic projection alone accounts for the improvement or whether other factors (e.g., implicit regularization from PAVA) are at work.

Authors: We agree that isolating the effect of the monotonic projection would strengthen the experimental section. In the revision we will add a quantitative comparison of the per-class weight norms produced by SAMN versus regularization-based rescaling on identical backbones and datasets, including tables or plots of the norms themselves. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained against external benchmarks

full rationale

The paper motivates adaptive norm rescaling via a class-conditional distribution perspective, then substitutes a direct PAVA projection onto the monotonic cone for regularization. No quoted equation or step reduces the claimed optimum (or performance gain) by construction to a fitted parameter, self-cited prior result, or input quantity defined in terms of the output. The PAVA step is an independent algorithmic choice whose benefit is asserted via experiments on benchmarks, not definitional equivalence. Self-citations, if present, are not load-bearing for the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on a domain assumption about class-conditional distributions and on the algorithmic correctness of PAVA; no free parameters are introduced and no new entities are postulated.

axioms (1)

domain assumption Class-conditional distributions provide a justification for norm rescaling that can be realized by monotonicity enforcement alone.
Abstract states this perspective is first provided to support the method.

pith-pipeline@v0.9.1-grok · 5710 in / 1255 out tokens · 24949 ms · 2026-06-28T14:45:17.245221+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 6 canonical work pages · 1 internal anchor

[1]

CUDA: Curriculum of data augmentation for long-tailed recognition

Sumyeong Ahn, Jongwoo Ko, and Se-Young Yun. CUDA: Curriculum of data augmentation for long-tailed recognition. InThe Eleventh International Conference on Learning Repre- sentations, 2023. 1, 2, 7

2023
[2]

Long-tailed recognition via weight balancing

Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, and Shu Kong. Long-tailed recognition via weight balancing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2, 3, 4, 7

2022
[3]

Deep over-sampling frame- work for classifying imbalanced data

Shin Ando and Chun Yuan Huang. Deep over-sampling frame- work for classifying imbalanced data. InMachine Learning and Knowledge Discovery in Databases, 2017. 1, 2

2017
[4]

Miriam Ayer, H. D. Brunk, G. M. Ewing, W. T. Reid, and Edward Silverman. An empirical distribution function for sampling with incomplete information.The Annals of Mathe- matical Statistics, 26(4):641–647, 1955. 2, 4

1955
[5]

Learning imbalanced datasets with label- distribution-aware margin loss

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label- distribution-aware margin loss. InAdvances in Neural Infor- mation Processing Systems, 2019. 7

2019
[6]

Chawla, Kevin W

Nitesh V . Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. Smote: synthetic minority over- sampling technique.Journal of Artificial Intelligence Re- search, 16(1):321–357, 2002. 1, 2

2002
[7]

Area: Adaptive reweighting via effective area for long-tailed classification

Xiaohua Chen, Yucan Zhou, Dayan Wu, Chule Yang, Bo Li, Qinghua Hu, and Weiping Wang. Area: Adaptive reweighting via effective area for long-tailed classification. InProceed- ings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 7

2023
[8]

Fea- ture space augmentation for long-tailed data.arXiv preprint arXiv:2008.03673, 2020

Peng Chu, Xiao Bian, Shaopeng Liu, and Haibin Ling. Fea- ture space augmentation for long-tailed data.arXiv preprint arXiv:2008.03673, 2020. 1, 2

work page arXiv 2008
[9]

Class-balanced loss based on effective number of samples

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 1, 2

2019
[10]

Inverse weight-balancing for deep long-tailed learn- ing

Wenqi Dang, Zhou Yang, Weisheng Dong, Xin Li, and Guang- ming Shi. Inverse weight-balancing for deep long-tailed learn- ing. InProceedings of the AAAI Conference on Artificial Intelligence, 2024. 1, 2, 3, 4, 7

2024
[11]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. 12

2009
[12]

Global and local mixture consistency cumulative learning for long-tailed visual recognitions

Fei Du, Peng Yang, Qi Jia, Fengtao Nan, Xiaoting Chen, and Yun Yang. Global and local mixture consistency cumulative learning for long-tailed visual recognitions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 1, 2, 4, 7, 12

2023
[13]

Adaptive subgra- dient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgra- dient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, page 2121–2159,
[14]

Comparing biases for minimal network construction with back-propagation

Stephen Hanson and Lorien Pratt. Comparing biases for minimal network construction with back-propagation. In Advances in Neural Information Processing Systems, 1988. 5, 6

1988
[15]

Exploring weight bal- ancing on long-tailed recognition problem.arXiv preprint arXiv:2305.16573, 2024

Naoya Hasegawa and Issei Sato. Exploring weight bal- ancing on long-tailed recognition problem.arXiv preprint arXiv:2305.16573, 2024. 7

work page arXiv 2024
[16]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 5, 6, 7, 8, 12

2016
[17]

Distilling vir- tual examples for long-tailed recognition

Yin-Yin He, Jianxin Wu, and Xiu-Shen Wei. Distilling vir- tual examples for long-tailed recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 7

2021
[18]

Algorithms efficiency measurement on imbalanced data using geometric mean and cross vali- dation

Mustakim Al Helal, Mohammad Salman Haydar, and Seraj Al Mahmud Mostafa. Algorithms efficiency measurement on imbalanced data using geometric mean and cross vali- dation. In2016 International Workshop on Computational Intelligence (IWCI), 2016. 2, 3

2016
[19]

Decoupling representation and classifier for long-tailed recognition,

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. Decoupling representation and classifier for long-tailed recognition.arXiv preprint arXiv:1910.09217, 2019. 1, 2, 4, 7

work page arXiv 1910
[20]

Adjusting decision boundary for class imbalanced learning.IEEE Access, 8:81674–81685,

Byungju Kim and Junmo Kim. Adjusting decision boundary for class imbalanced learning.IEEE Access, 8:81674–81685,
[21]

Learning multiple lay- ers of features from tiny images.Master’s thesis, University of Tront, 2009

Alex Krizhevsky and Geoffrey Hinton. Learning multiple lay- ers of features from tiny images.Master’s thesis, University of Tront, 2009. 5, 8, 11

2009
[22]

Guillaume Lemaˆıtre, Fernando Nogueira, and Christos K. Aridas. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning.Journal of Machine Learning Research, 18(17):1–5, 2017. 1, 2

2017
[23]

Jinyan Li, Simon Fong, Sabah Mohammed, Jinan Fiaidhi, Qian Chen, and Zhen Tan. Solving the under-fitting problem for decision tree algorithms by incremental swarm optimiza- tion in rare-event healthcare classification.Journal of Medical Imaging and Health Informatics, 6(4):1102–1110, 2016. 2, 3

2016
[24]

Wong, and Victor W

Jinyan Li, Simon Fong, Raymond K. Wong, and Victor W. Chu. Adaptive multi-objective swarm fusion for imbalanced data classification.Information Fusion, 39:1–24, 2018. 2, 3

2018
[25]

Long-tailed visual recognition via gaussian clouded logit adjustment

Mengke Li, Yiu-ming Cheung, and Yang Lu. Long-tailed visual recognition via gaussian clouded logit adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2022. 1, 2

2022
[26]

Metasaug: Meta semantic augmentation for long-tailed visual recognition

Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, and Xinjing Cheng. Metasaug: Meta semantic augmentation for long-tailed visual recognition. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 1, 2

2021
[27]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. InPro- ceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 1, 2, 7 9

2017
[28]

Exploratory undersampling for class-imbalance learning.IEEE Transac- tions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2):539–550, 2009

Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. Exploratory undersampling for class-imbalance learning.IEEE Transac- tions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2):539–550, 2009. 1, 2

2009
[29]

Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Bo- qing Gong, and Stella X. Yu. Large-scale long-tailed recogni- tion in an open world. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR),
[30]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with restarts.arXiv preprint arXiv:1608.03983, 2016. 6, 12

work page internal anchor Pith review Pith/arXiv arXiv 2016
[31]

Long-tail learning via logit adjustment

Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. InInternational Con- ference on Learning Representations, 2021. 7

2021
[32]

The majority can help the minority: Context-rich minority oversampling for long-tailed classi- fication

Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, and Jin Young Choi. The majority can help the minority: Context-rich minority oversampling for long-tailed classi- fication. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2, 7

2022
[33]

Feature directions matter: Long-tailed learning via rotated balanced representa- tion

Gao Peifeng, Qianqian Xu, Peisong Wen, Zhiyong Yang, Huiyang Shao, and Qingming Huang. Feature directions matter: Long-tailed learning via rotated balanced representa- tion. InProceedings of the 40th International Conference on Machine Learning, 2023. 7

2023
[34]

Escaping saddle points for effective gen- eralization on class-imbalanced data

Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, and Venkatesh Babu R. Escaping saddle points for effective gen- eralization on class-imbalanced data. InAdvances in Neural Information Processing Systems, 2022. 1, 2, 7

2022
[35]

Balanced meta-softmax for long- tailed visual recognition

Jiawei Ren, Cunjun Yu, shunan sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, and hongsheng Li. Balanced meta-softmax for long- tailed visual recognition. InAdvances in Neural Information Processing Systems, 2020. 1, 2

2020
[36]

Diffult: How to make diffusion model useful for long-tail recognition

Jie Shao, Ke Zhu, Hanxiao Zhang, and Jianxin Wu. Diffult: How to make diffusion model useful for long-tail recognition. arXiv preprint arXiv:2403.05170, 2024. 1, 2, 7

work page arXiv 2024
[37]

How re-sampling helps for long-tail learning? InAdvances in Neural Information Processing Systems, 2023

Jiang-Xin Shi, Tong Wei, Yuke Xiang, and Yu-Feng Li. How re-sampling helps for long-tail learning? InAdvances in Neural Information Processing Systems, 2023. 1, 2, 7

2023
[38]

The inaturalist species classification and detection dataset

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 11

2018
[39]

Long-tailed recognition by routing diverse distribution-aware experts

Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, and Stella Yu. Long-tailed recognition by routing diverse distribution-aware experts. InInternational Conference on Learning Representations, 2021. 1, 2, 7

2021
[40]

Margin calibration for long-tailed visual recognition

Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jin- dong Wang, and Takahiro Shinozaki. Margin calibration for long-tailed visual recognition. InProceedings of The 14th Asian Conference on Machine Learning, 2023. 7

2023
[41]

A unified generalization analysis of re-weighting and logit-adjustment for imbalanced learning

Zitai Wang, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, and Qingming Huang. A unified generalization analysis of re-weighting and logit-adjustment for imbalanced learning. InAdvances in Neural Information Processing Systems, 2023. 7

2023
[42]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 7, 12

2017
[43]

Class-balanced regularization for long-tailed recognition.Neural Processing Letters, 56 (128):1–18, 2024

Yuge Xu and Chuanlong Lyu. Class-balanced regularization for long-tailed recognition.Neural Processing Letters, 56 (128):1–18, 2024. 1, 2, 3, 4

2024
[44]

Identifying and compensating for feature deviation in imbalanced deep learning.arXiv preprint arXiv:2001.01385,

Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, and Wei-Lun Chao. Identifying and compensating for feature deviation in imbalanced deep learning.arXiv preprint arXiv:2001.01385,

work page arXiv 2001
[45]

Procrustean training for imbalanced deep learning

Han-Jia Ye, De-Chuan Zhan, and Wei-Lun Chao. Procrustean training for imbalanced deep learning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 2, 3

2021
[46]

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

Yongshun Zhang, Xiu-Shen Wei, Boyan Zhou, and Jianxin Wu. Bag of tricks for long-tailed visual recognition with deep convolutional neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, 2021. 7

2021
[47]

Self-supervised aggregation of diverse experts for test- agnostic long-tailed recognition

Yifan Zhang, Bryan Hooi, Lanqing Hong, and Jiashi Feng. Self-supervised aggregation of diverse experts for test- agnostic long-tailed recognition. InAdvances in Neural In- formation Processing Systems, 2022. 1, 2

2022
[48]

Deep long-tailed learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10795–10816, 2023

Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10795–10816, 2023. 1

2023
[49]

Improv- ing calibration for long-tailed recognition

Zhisheng Zhong, Jiequan Cui, Shu Liu, and Jiaya Jia. Improv- ing calibration for long-tailed recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 1, 2, 4, 6, 7, 12

2021
[50]

Many” (nk > 100), “Medium

Boyan Zhou, Quan Cui, Xiu-Shen Wei, and Zhao-Min Chen. Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 1, 2, 7 10 Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Ta...

2020
[51]

backbone, employing GLMC for first-stage training and SAMN for second-stage enhancement. Following [12], on ImageNet-LT, the first stage is trained for 135 epochs (batch size = 128, weight decay = 2e-4, learning rate = 0.1), and the second stage is finetuned for 20 epochs (batch size = 512, learning rate = 1e-5). On iNaturalist2018, the first stage is tra...

[1] [1]

CUDA: Curriculum of data augmentation for long-tailed recognition

Sumyeong Ahn, Jongwoo Ko, and Se-Young Yun. CUDA: Curriculum of data augmentation for long-tailed recognition. InThe Eleventh International Conference on Learning Repre- sentations, 2023. 1, 2, 7

2023

[2] [2]

Long-tailed recognition via weight balancing

Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, and Shu Kong. Long-tailed recognition via weight balancing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2, 3, 4, 7

2022

[3] [3]

Deep over-sampling frame- work for classifying imbalanced data

Shin Ando and Chun Yuan Huang. Deep over-sampling frame- work for classifying imbalanced data. InMachine Learning and Knowledge Discovery in Databases, 2017. 1, 2

2017

[4] [4]

Miriam Ayer, H. D. Brunk, G. M. Ewing, W. T. Reid, and Edward Silverman. An empirical distribution function for sampling with incomplete information.The Annals of Mathe- matical Statistics, 26(4):641–647, 1955. 2, 4

1955

[5] [5]

Learning imbalanced datasets with label- distribution-aware margin loss

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label- distribution-aware margin loss. InAdvances in Neural Infor- mation Processing Systems, 2019. 7

2019

[6] [6]

Chawla, Kevin W

Nitesh V . Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. Smote: synthetic minority over- sampling technique.Journal of Artificial Intelligence Re- search, 16(1):321–357, 2002. 1, 2

2002

[7] [7]

Area: Adaptive reweighting via effective area for long-tailed classification

Xiaohua Chen, Yucan Zhou, Dayan Wu, Chule Yang, Bo Li, Qinghua Hu, and Weiping Wang. Area: Adaptive reweighting via effective area for long-tailed classification. InProceed- ings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 7

2023

[8] [8]

Fea- ture space augmentation for long-tailed data.arXiv preprint arXiv:2008.03673, 2020

Peng Chu, Xiao Bian, Shaopeng Liu, and Haibin Ling. Fea- ture space augmentation for long-tailed data.arXiv preprint arXiv:2008.03673, 2020. 1, 2

work page arXiv 2008

[9] [9]

Class-balanced loss based on effective number of samples

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 1, 2

2019

[10] [10]

Inverse weight-balancing for deep long-tailed learn- ing

Wenqi Dang, Zhou Yang, Weisheng Dong, Xin Li, and Guang- ming Shi. Inverse weight-balancing for deep long-tailed learn- ing. InProceedings of the AAAI Conference on Artificial Intelligence, 2024. 1, 2, 3, 4, 7

2024

[11] [11]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. 12

2009

[12] [12]

Global and local mixture consistency cumulative learning for long-tailed visual recognitions

Fei Du, Peng Yang, Qi Jia, Fengtao Nan, Xiaoting Chen, and Yun Yang. Global and local mixture consistency cumulative learning for long-tailed visual recognitions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 1, 2, 4, 7, 12

2023

[13] [13]

Adaptive subgra- dient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgra- dient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, page 2121–2159,

[14] [14]

Comparing biases for minimal network construction with back-propagation

Stephen Hanson and Lorien Pratt. Comparing biases for minimal network construction with back-propagation. In Advances in Neural Information Processing Systems, 1988. 5, 6

1988

[15] [15]

Exploring weight bal- ancing on long-tailed recognition problem.arXiv preprint arXiv:2305.16573, 2024

Naoya Hasegawa and Issei Sato. Exploring weight bal- ancing on long-tailed recognition problem.arXiv preprint arXiv:2305.16573, 2024. 7

work page arXiv 2024

[16] [16]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 5, 6, 7, 8, 12

2016

[17] [17]

Distilling vir- tual examples for long-tailed recognition

Yin-Yin He, Jianxin Wu, and Xiu-Shen Wei. Distilling vir- tual examples for long-tailed recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 7

2021

[18] [18]

Algorithms efficiency measurement on imbalanced data using geometric mean and cross vali- dation

Mustakim Al Helal, Mohammad Salman Haydar, and Seraj Al Mahmud Mostafa. Algorithms efficiency measurement on imbalanced data using geometric mean and cross vali- dation. In2016 International Workshop on Computational Intelligence (IWCI), 2016. 2, 3

2016

[19] [19]

Decoupling representation and classifier for long-tailed recognition,

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. Decoupling representation and classifier for long-tailed recognition.arXiv preprint arXiv:1910.09217, 2019. 1, 2, 4, 7

work page arXiv 1910

[20] [20]

Adjusting decision boundary for class imbalanced learning.IEEE Access, 8:81674–81685,

Byungju Kim and Junmo Kim. Adjusting decision boundary for class imbalanced learning.IEEE Access, 8:81674–81685,

[21] [21]

Learning multiple lay- ers of features from tiny images.Master’s thesis, University of Tront, 2009

Alex Krizhevsky and Geoffrey Hinton. Learning multiple lay- ers of features from tiny images.Master’s thesis, University of Tront, 2009. 5, 8, 11

2009

[22] [22]

Guillaume Lemaˆıtre, Fernando Nogueira, and Christos K. Aridas. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning.Journal of Machine Learning Research, 18(17):1–5, 2017. 1, 2

2017

[23] [23]

Jinyan Li, Simon Fong, Sabah Mohammed, Jinan Fiaidhi, Qian Chen, and Zhen Tan. Solving the under-fitting problem for decision tree algorithms by incremental swarm optimiza- tion in rare-event healthcare classification.Journal of Medical Imaging and Health Informatics, 6(4):1102–1110, 2016. 2, 3

2016

[24] [24]

Wong, and Victor W

Jinyan Li, Simon Fong, Raymond K. Wong, and Victor W. Chu. Adaptive multi-objective swarm fusion for imbalanced data classification.Information Fusion, 39:1–24, 2018. 2, 3

2018

[25] [25]

Long-tailed visual recognition via gaussian clouded logit adjustment

Mengke Li, Yiu-ming Cheung, and Yang Lu. Long-tailed visual recognition via gaussian clouded logit adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2022. 1, 2

2022

[26] [26]

Metasaug: Meta semantic augmentation for long-tailed visual recognition

Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, and Xinjing Cheng. Metasaug: Meta semantic augmentation for long-tailed visual recognition. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 1, 2

2021

[27] [27]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. InPro- ceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 1, 2, 7 9

2017

[28] [28]

Exploratory undersampling for class-imbalance learning.IEEE Transac- tions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2):539–550, 2009

Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou. Exploratory undersampling for class-imbalance learning.IEEE Transac- tions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2):539–550, 2009. 1, 2

2009

[29] [29]

Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Bo- qing Gong, and Stella X. Yu. Large-scale long-tailed recogni- tion in an open world. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR),

[30] [30]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with restarts.arXiv preprint arXiv:1608.03983, 2016. 6, 12

work page internal anchor Pith review Pith/arXiv arXiv 2016

[31] [31]

Long-tail learning via logit adjustment

Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. InInternational Con- ference on Learning Representations, 2021. 7

2021

[32] [32]

The majority can help the minority: Context-rich minority oversampling for long-tailed classi- fication

Seulki Park, Youngkyu Hong, Byeongho Heo, Sangdoo Yun, and Jin Young Choi. The majority can help the minority: Context-rich minority oversampling for long-tailed classi- fication. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2, 7

2022

[33] [33]

Feature directions matter: Long-tailed learning via rotated balanced representa- tion

Gao Peifeng, Qianqian Xu, Peisong Wen, Zhiyong Yang, Huiyang Shao, and Qingming Huang. Feature directions matter: Long-tailed learning via rotated balanced representa- tion. InProceedings of the 40th International Conference on Machine Learning, 2023. 7

2023

[34] [34]

Escaping saddle points for effective gen- eralization on class-imbalanced data

Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, and Venkatesh Babu R. Escaping saddle points for effective gen- eralization on class-imbalanced data. InAdvances in Neural Information Processing Systems, 2022. 1, 2, 7

2022

[35] [35]

Balanced meta-softmax for long- tailed visual recognition

Jiawei Ren, Cunjun Yu, shunan sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, and hongsheng Li. Balanced meta-softmax for long- tailed visual recognition. InAdvances in Neural Information Processing Systems, 2020. 1, 2

2020

[36] [36]

Diffult: How to make diffusion model useful for long-tail recognition

Jie Shao, Ke Zhu, Hanxiao Zhang, and Jianxin Wu. Diffult: How to make diffusion model useful for long-tail recognition. arXiv preprint arXiv:2403.05170, 2024. 1, 2, 7

work page arXiv 2024

[37] [37]

How re-sampling helps for long-tail learning? InAdvances in Neural Information Processing Systems, 2023

Jiang-Xin Shi, Tong Wei, Yuke Xiang, and Yu-Feng Li. How re-sampling helps for long-tail learning? InAdvances in Neural Information Processing Systems, 2023. 1, 2, 7

2023

[38] [38]

The inaturalist species classification and detection dataset

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 5, 11

2018

[39] [39]

Long-tailed recognition by routing diverse distribution-aware experts

Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, and Stella Yu. Long-tailed recognition by routing diverse distribution-aware experts. InInternational Conference on Learning Representations, 2021. 1, 2, 7

2021

[40] [40]

Margin calibration for long-tailed visual recognition

Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jin- dong Wang, and Takahiro Shinozaki. Margin calibration for long-tailed visual recognition. InProceedings of The 14th Asian Conference on Machine Learning, 2023. 7

2023

[41] [41]

A unified generalization analysis of re-weighting and logit-adjustment for imbalanced learning

Zitai Wang, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, and Qingming Huang. A unified generalization analysis of re-weighting and logit-adjustment for imbalanced learning. InAdvances in Neural Information Processing Systems, 2023. 7

2023

[42] [42]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 7, 12

2017

[43] [43]

Class-balanced regularization for long-tailed recognition.Neural Processing Letters, 56 (128):1–18, 2024

Yuge Xu and Chuanlong Lyu. Class-balanced regularization for long-tailed recognition.Neural Processing Letters, 56 (128):1–18, 2024. 1, 2, 3, 4

2024

[44] [44]

Identifying and compensating for feature deviation in imbalanced deep learning.arXiv preprint arXiv:2001.01385,

Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, and Wei-Lun Chao. Identifying and compensating for feature deviation in imbalanced deep learning.arXiv preprint arXiv:2001.01385,

work page arXiv 2001

[45] [45]

Procrustean training for imbalanced deep learning

Han-Jia Ye, De-Chuan Zhan, and Wei-Lun Chao. Procrustean training for imbalanced deep learning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 2, 3

2021

[46] [46]

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

Yongshun Zhang, Xiu-Shen Wei, Boyan Zhou, and Jianxin Wu. Bag of tricks for long-tailed visual recognition with deep convolutional neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, 2021. 7

2021

[47] [47]

Self-supervised aggregation of diverse experts for test- agnostic long-tailed recognition

Yifan Zhang, Bryan Hooi, Lanqing Hong, and Jiashi Feng. Self-supervised aggregation of diverse experts for test- agnostic long-tailed recognition. InAdvances in Neural In- formation Processing Systems, 2022. 1, 2

2022

[48] [48]

Deep long-tailed learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10795–10816, 2023

Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10795–10816, 2023. 1

2023

[49] [49]

Improv- ing calibration for long-tailed recognition

Zhisheng Zhong, Jiequan Cui, Shu Liu, and Jiaya Jia. Improv- ing calibration for long-tailed recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 1, 2, 4, 6, 7, 12

2021

[50] [50]

Many” (nk > 100), “Medium

Boyan Zhou, Quan Cui, Xiu-Shen Wei, and Zhao-Min Chen. Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 1, 2, 7 10 Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Ta...

2020

[51] [51]

backbone, employing GLMC for first-stage training and SAMN for second-stage enhancement. Following [12], on ImageNet-LT, the first stage is trained for 135 epochs (batch size = 128, weight decay = 2e-4, learning rate = 0.1), and the second stage is finetuned for 20 epochs (batch size = 512, learning rate = 1e-5). On iNaturalist2018, the first stage is tra...