Rethinking Loss Reweighting for Imbalance Learning as an Inverse Problem: A Neural Collapse Point of View

Jinping Wang; Zhiqiang Gao; Zhiwu Xie; Zixin Tong

arxiv: 2605.10047 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-12 02:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords loss reweighting · long-tailed classification · neural collapse · inverse problem · imbalanced learning · equiangular tight frame · class imbalance

The pith

Loss reweighting for long-tailed classification is reframed as an inverse problem that infers dynamic class weights to equalize per-class losses using neural collapse geometry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that existing reweighting methods for imbalanced data rely on heuristics without a clear target. Drawing on the terminal geometry that appears under neural collapse, it identifies equal average loss across classes as a well-specified objective. Formulating the search for weights that achieve this equality as an inverse problem produces a dynamic inference procedure for the weights. A sympathetic reader would care because the approach replaces ad-hoc adjustments with a geometry-driven objective that reduces measured loss imbalance and raises accuracy on standard long-tailed benchmarks.

Core claim

Based on the ideal equal loss objective suggested by the simplex Equiangular Tight Frame terminal geometry of Neural Collapse, the authors formulate loss reweighting as an inverse problem and introduce a strategy that dynamically infers class weights to achieve this objective, resulting in reduced loss imbalance and improved performance on long-tailed datasets.

What carries the argument

Inverse-view reweighting strategy that solves for class weights to match the equal per-class average loss target derived from Neural Collapse geometry.
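As a rough illustration of what "solving for class weights" could look like, the sketch below evaluates the closed form quoted in the Lean-theorem section of this page, w⋆_c = (L̄ L_c + α w0_c) / (L_c² + α). The function name `infer_class_weights`, the choice of L̄ as the unweighted mean of the per-class losses, and the example numbers are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def infer_class_weights(class_losses, prior_weights, alpha):
    """Evaluate w_c = (L_bar * L_c + alpha * w0_c) / (L_c**2 + alpha) per class.

    Assumption: L_bar is the unweighted mean of the per-class average losses.
    With alpha == 0 a class loss of exactly zero would divide by zero.
    """
    L = np.asarray(class_losses, dtype=float)
    w0 = np.asarray(prior_weights, dtype=float)
    L_bar = L.mean()
    return (L_bar * L + alpha * w0) / (L ** 2 + alpha)

# Hypothetical per-class average losses for three classes.
class_losses = np.array([0.5, 1.0, 2.0])
prior = np.ones(3)

# No regularization: the weighted losses w_c * L_c all equal L_bar.
w_free = infer_class_weights(class_losses, prior, alpha=0.0)
weighted_free = w_free * class_losses

# Very strong regularization: the weights fall back to the prior.
w_prior = infer_class_weights(class_losses, prior, alpha=1e12)
```

Read this way, α interpolates between exact equalization of the weighted per-class losses (α = 0) and the prior weights (α large), which matches the role the Figure 4 caption gives to the Tikhonov term.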

Load-bearing premise

The equal per-class average loss implied by Neural Collapse's ideal simplex Equiangular Tight Frame geometry is both a desirable and attainable target for reweighting.
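The premise can be checked numerically in an idealized setting. The sketch below builds a standard simplex ETF, places every feature on its class mean (NC1), aligns the classifier with the class means (NC3), and computes the cross-entropy per class. The helper names, the number of classes, and the classifier scale of 3.0 are illustrative assumptions; the paper's own setup may differ.

```python
import numpy as np

def simplex_etf(num_classes):
    """Return a C x C matrix whose columns are unit-norm simplex ETF vectors."""
    C = num_classes
    return np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

def per_class_cross_entropy(class_means, classifier):
    """Cross-entropy per class when every feature equals its class mean."""
    logits = classifier @ class_means            # column c: logits for class c
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -np.diag(log_probs)                   # loss of class c on its own label

C = 5
M = simplex_etf(C)          # class means as columns
W = 3.0 * M.T               # classifier rows aligned with class means (assumed scale)
gram = M.T @ M              # pairwise cosines between class means
losses = per_class_cross_entropy(M, W)
```

In this collapsed configuration every entry of `losses` is identical, which is the equal-loss property the premise relies on. The sketch says nothing about whether that configuration is reachable under long-tailed training.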

What would settle it

If experiments on standard long-tailed datasets show that the inferred weights fail to equalize per-class average losses or do not outperform existing baselines on accuracy, the inverse-problem formulation would be falsified.
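A check of this kind needs a number for "how unequal are the per-class losses". The page does not define the paper's loss imbalance coefficient, so the sketch below substitutes a simple stand-in, the coefficient of variation of the per-class average losses. Both helper names and the toy data are hypothetical.

```python
import numpy as np

def per_class_mean_loss(sample_losses, labels, num_classes):
    """Average the per-sample losses within each class (empty classes give 0)."""
    sums = np.bincount(labels, weights=sample_losses, minlength=num_classes)
    counts = np.bincount(labels, minlength=num_classes)
    return sums / np.maximum(counts, 1)

def imbalance_coefficient(class_losses):
    """Stand-in metric (an assumption): std / mean of per-class losses."""
    class_losses = np.asarray(class_losses, dtype=float)
    return class_losses.std() / class_losses.mean()

# Toy long-tailed batch: class 0 has 4 samples, class 1 has 2, class 2 has 1.
labels = np.array([0, 0, 0, 0, 1, 1, 2])
sample_losses = np.array([0.2, 0.4, 0.3, 0.3, 1.0, 1.4, 2.4])

class_losses = per_class_mean_loss(sample_losses, labels, num_classes=3)
coeff = imbalance_coefficient(class_losses)
```

A value of zero means the per-class average losses are exactly equal, so tracking this quantity during training with and without the inferred weights would be one way to run the test described above.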

Figures

Figures reproduced from arXiv: 2605.10047 by Jinping Wang, Zhiqiang Gao, Zhiwu Xie, Zixin Tong.

**Figure 1.** Toy examples with 2-dimensional features and 3 classes to illustrate the feature learning of classification. Black crosses are the class means, and gray lines are the classifier weights.

**Figure 2.** The evolution of NC1-NC3 metrics and class-wise imbalance coefficients across different loss functions on the CIFAR100-LT dataset with an imbalance rate of 100.

**Figure 3.** t-SNE visualizations of the cross-entropy baseline and the proposed reweighting on CIFAR-10-LT with an imbalance factor of 50. Dots denote the features; ⋆ and ▲ mark the class means and classifier weights for each class, respectively.

**Figure 4.** Sensitivity analysis of hyper-parameters. (Left) Different γ with fixed α = 0 to control the macro-level compensation across mini-batches. (Right) Different α with fixed γ = 1 to control the strength of the Tikhonov regularization towards the prior weights w^(0) in the inverse reweighting.

Original abstract

Loss reweighting is a widely used strategy for long-tailed classification, but existing reweighting strategies often rely on heuristics and rarely define a well-specified target. Inspired by Neural Collapse (NC), the ideal simplex Equiangular Tight Frame (ETF) terminal geometry suggests equal per-class average loss as a reasonable target for reweighting. Based on the ideal equal loss objective, we consider loss reweighting as an inverse problem and propose an inverse-view reweighting strategy that infers class weights dynamically to match this ideal objective. Empirically, NC metrics suggest our method can effectively reduce the loss imbalance coefficient and achieve closer alignment with NC geometry while consistently outperforming strong long-tailed baselines on different datasets.
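One way to read "reweighting as an inverse problem" is as a per-class regularized least-squares fit. The objective below is an assumption, reverse-engineered so that its minimizer reproduces the closed form quoted in the Lean-theorem section of this page; here $L_c$ is the average loss of class $c$, $\bar{L}$ the target loss level, $w^{(0)}_c$ a prior weight, and $\alpha \ge 0$ the Tikhonov strength named in the Figure 4 caption.

```latex
\begin{align}
  J_c(w) &= \bigl(w L_c - \bar{L}\bigr)^2 + \alpha \bigl(w - w^{(0)}_c\bigr)^2, \\
  \frac{\partial J_c}{\partial w}
         &= 2 L_c \bigl(w L_c - \bar{L}\bigr) + 2 \alpha \bigl(w - w^{(0)}_c\bigr) = 0, \\
  w^\star_c &= \frac{\bar{L} L_c + \alpha w^{(0)}_c}{L_c^2 + \alpha}.
\end{align}
```

Under this reading, $\alpha \to 0$ gives $w^\star_c L_c = \bar{L}$ for every class with $L_c > 0$, and $\alpha \to \infty$ returns the prior $w^{(0)}_c$.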

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that loss reweighting for long-tailed classification can be reframed as an inverse problem whose target is equal per-class average loss, motivated by the ideal simplex Equiangular Tight Frame geometry from Neural Collapse. It proposes an inverse-view reweighting strategy that dynamically infers class weights to achieve this objective, with the abstract stating that the approach reduces the loss imbalance coefficient, improves alignment with Neural Collapse geometry, and outperforms strong baselines on different datasets.

Significance. If the inverse formulation can be rigorously derived and the reported gains hold with full experimental validation, the work would supply a principled, NC-grounded alternative to heuristic reweighting methods, potentially improving theoretical understanding and practical performance in imbalanced classification tasks.

major comments (2)

[Abstract] No derivation, algorithm, or optimization procedure is supplied for solving the inverse problem or for dynamically inferring the class weights that match the equal-loss target; this absence is load-bearing because the central claim rests on the correctness and non-circularity of that solver with respect to the NC geometry used to define the target.
[Abstract] The statements of reduced loss imbalance coefficient, closer NC alignment, and consistent outperformance are presented without datasets, baselines, quantitative metrics, error bars, or exclusion criteria, preventing evaluation of whether the empirical support for the method is adequate.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We agree that the abstract, as currently written, lacks sufficient detail on the method and empirical support, and we will revise it in the next version of the manuscript to address these issues.

Referee: [Abstract] No derivation, algorithm, or optimization procedure is supplied for solving the inverse problem or for dynamically inferring the class weights that match the equal-loss target; this absence is load-bearing because the central claim rests on the correctness and non-circularity of that solver with respect to the NC geometry used to define the target.

Authors: We acknowledge that the abstract does not contain the derivation, algorithm, or optimization procedure. The full manuscript develops the inverse formulation and the dynamic weight inference procedure, with discussion of its alignment to the NC simplex ETF target. To improve the abstract, we will add a brief description of the inverse-view reweighting strategy and the solver. revision: yes

Referee: [Abstract] The statements of reduced loss imbalance coefficient, closer NC alignment, and consistent outperformance are presented without datasets, baselines, quantitative metrics, error bars, or exclusion criteria, preventing evaluation of whether the empirical support for the method is adequate.

Authors: We agree that the abstract summarizes the empirical claims without the supporting specifics. The full paper reports experiments on standard long-tailed datasets with comparisons to strong baselines, including the loss imbalance coefficient, NC metrics, accuracy improvements, and error bars. We will revise the abstract to include key quantitative results and the evaluation setup. revision: yes

standing simulated objections not resolved

The full derivation, algorithm, optimization procedure, and specific experimental details (datasets, baselines, metrics, error bars) are not present in the provided manuscript text, which contains only the abstract; therefore we cannot supply those details in this response.

Circularity Check

0 steps flagged

No circularity detectable from abstract alone

Only the abstract is available, providing no equations, derivations, or internal steps that could be inspected for reduction to inputs by construction. The equal per-class loss target is motivated by citing established Neural Collapse results on simplex ETF geometry from prior external literature, which constitutes independent support rather than a self-definition or a load-bearing self-citation. The inverse-view reweighting strategy is described at a high level as a proposal to match this target, with no indication that the solver itself is fitted to or equivalent to the NC geometry by definition. No self-citation chains, ansatz smuggling, or renaming of known results appear in the text. The derivation chain therefore cannot be shown to collapse, warranting a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on one domain assumption imported from Neural Collapse literature; no explicit free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: the ideal simplex ETF terminal geometry from Neural Collapse implies equal per-class average loss as a reasonable target for reweighting.
Explicitly stated as the inspiration for the equal-loss objective.

pith-pipeline@v0.9.0 · 5395 in / 1079 out tokens · 58614 ms · 2026-05-12T02:11:11.905513+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

we consider loss reweighting as an inverse problem and propose an inverse-view reweighting strategy that infers class weights dynamically to match this ideal objective... $w^\star_c(W) = \frac{\bar{L} L_c + \alpha w^{(0)}_c}{L_c^2 + \alpha}$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Theorem 3.1... every class has the same class-wise average loss $L_1(W) = \dots = L_C(W)$ under NC1-NC3 simplex ETF

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Cui, Y., Jia, M., Lin, T.-Y., Song, Y., and Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9268–9277, 2019a.
[2] Heydari, A. A., Thompson, C. A., and Mehmood, A. SoftAdapt: Techniques for adaptive loss weighting of neural networks with multi-part loss functions. arXiv preprint arXiv:1912.12355.
[3] Holtz, C., Weng, T.-W., and Mishne, G. Learning sample reweighting for accuracy and adversarial robustness. arXiv preprint arXiv:2210.11513.
[4] Hong, W. and Ling, S. Neural collapse for unconstrained feature model under cross-entropy loss with imbalanced data. arXiv preprint arXiv:2309.09725.
[5] Kang, B., Xie, S., Rohrbach, M., Yan, Z., Gordo, A., Feng, J., and Kalantidis, Y. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217.
[6] Li, S., Xu, Q., Yang, Z., Wang, Z., Zhang, L., Cao, X., and Huang, Q. Focal-SAM: Focal sharpness-aware minimization for long-tailed classification. arXiv preprint arXiv:2505.01660.
[7] Lin, F. and Yuan, X. Long-tailed recognition via information-preservable two-stage learning. arXiv preprint arXiv:2510.08836.
[8] Menon, A. K., Jayasumana, S., Rawat, A. S., Jain, H., Veit, A., and Kumar, S. Long-tail learning via logit adjustment. arXiv preprint arXiv:2007.07314.
[9] Wang, X., Lian, L., Miao, Z., Liu, Z., and Yu, S. X. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv:2010.01809.
[10] Yan, S., Li, Z., Wu, C., Pang, M., Lu, Y., Yan, Y., and Wang, H. You are your own best teacher: Achieving centralized-level performance in federated learning under heterogeneous and long-tailed data. arXiv preprint arXiv:2503.06916.
[11] (2021, extracted passage) aims to down-weight training samples that have a disproportionately large influence on the decision boundary. Following the notation we defined before, for a sample $(x_{i,c}, y_{i,c})$ from class $c$, let $h(x_{i,c}) \in \mathbb{R}^p$ denote the last-layer feature, and let $p(x_{i,c}; W) = (p_1(x_{i,c}; W), \dots, p_C(x_{i,c}; W))^\top$ be the softmax probability vector. We denote by $y^{(c)} \in \{0,1\}^C$ the on...
[12] (2017, extracted passage) The inter-class range loss encourages class centers to be well separated by enforcing a margin on this minimum distance: $L_{\text{inter}} = \max(M - D_{\text{center}}, 0)$, where $M > 0$ is a margin hyper-parameter. Overall Loss. Combining both terms yields the Range Loss: $L_{\text{Range}} = \alpha L_{\text{intra}} + \beta L_{\text{inter}}$, where $\alpha$ and $\beta$ control the relative importance of the two components. Following (Zhang...
[13] (2020, extracted passage) aims to minimize the expected risk under a target distribution that is more class-balanced than the long-tailed training distribution. By importance weighting, the target error can be written as an expectation under the training distribution with sample-wise weights. TCR decomposes this weight into a class-wise component and an instance-wise component. ...
