arxiv: 2509.09926 · v5 · submitted 2025-09-12 · 💻 cs.LG · cs.CV

LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

Pith reviewed 2026-05-18 17:58 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords long-tailed semi-supervised learning · foundation models · parameter-efficient fine-tuning · open-world scenarios · generalization bounds · balanced posterior error · out-of-distribution samples

The pith

Fine-tuning foundation models reduces hypothesis complexity and tightens generalization bounds for long-tailed semi-supervised learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that moving long-tailed semi-supervised learning from training from scratch to fine-tuning a foundation model reduces hypothesis complexity, which tightens the generalization bound and lowers the balanced posterior error. It further argues that the resulting features are more compact, which shrinks the region in which outliers are accepted and gives a geometric basis for robustness to noise and OOD data. Prior methods that train from scratch often suffer from overconfidence and low-quality pseudo-labels; the paper's position is that starting from rich pre-trained representations mitigates both. The work proposes the LoFT framework for parameter-efficient fine-tuning and LoFT-OW to handle open-world unlabeled data that may include out-of-distribution samples. If the claims hold, the practical payoff would be better accuracy on imbalanced datasets, particularly for tail classes with few labeled examples.

Core claim

Utilizing a foundation model significantly reduces the hypothesis complexity, which tightens the generalization bound and in turn minimizes the Balanced Posterior Error (BPE). The feature compactness of foundation models strictly compresses the acceptance region for outliers, providing a geometric guarantee for robustness. This insight leads to the LoFT framework for parameter-efficient fine-tuning in long-tailed semi-supervised learning and an extension LoFT-OW for open-world scenarios with potential OOD samples in unlabeled data.

What carries the argument

Theoretical proofs that foundation models reduce hypothesis complexity and compress outlier acceptance regions, which motivate the parameter-efficient fine-tuning in the LoFT method.
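
For orientation, the kind of bound this argument appeals to can be written in a standard textbook form. The block below is a generic Rademacher-complexity bound for a loss with values in [0, 1], not the paper's own theorem; the symbols R, \widehat{R}_n, \mathfrak{R}_n, \mathcal{H}_{\mathrm{scratch}} and \mathcal{H}_{\mathrm{FM}} are notation assumed here for illustration.

```latex
% With probability at least 1 - \delta over n i.i.d. samples, for every h in \mathcal{H}:
R(h) \;\le\; \widehat{R}_n(h) \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}

% Assumption for illustration: fine-tuning searches a subclass of the from-scratch class.
\mathcal{H}_{\mathrm{FM}} \subseteq \mathcal{H}_{\mathrm{scratch}}
\;\Longrightarrow\;
\mathfrak{R}_n(\ell \circ \mathcal{H}_{\mathrm{FM}}) \;\le\; \mathfrak{R}_n(\ell \circ \mathcal{H}_{\mathrm{scratch}})
```

Under that subset assumption the complexity term can only shrink, since Rademacher complexity is monotone in the hypothesis class. Whether the empirical risk term stays small on noisy, long-tailed pseudo-labels is a separate question that this generic form does not address.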

Load-bearing premise

The proofs assume that foundation models provide feature compactness and reduced hypothesis complexity that improve the bounds in long-tailed semi-supervised learning even in the presence of pseudo-label noise.

What would settle it

Finding that the balanced posterior error does not decrease or that the outlier acceptance region does not shrink when using a foundation model instead of training from scratch on long-tailed datasets would falsify the central claim.
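
One way to run such a check empirically is to compare a class-balanced error between a model trained from scratch and a fine-tuned foundation model. The sketch below assumes a simple proxy, the unweighted mean of per-class error rates; the paper's BPE is defined on posteriors and may differ. The function name and the toy labels are hypothetical.

```python
from collections import defaultdict


def balanced_error(y_true, y_pred):
    """Unweighted mean of per-class error rates (an assumed proxy, not the paper's BPE)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        errors[t] += int(t != p)
    # Average over the classes that actually appear in y_true.
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Toy long-tailed labels: class 0 is the head, class 2 is the tail.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_error = balanced_error(y_true, y_pred)
print(f"plain error: {plain_error:.3f}, balanced error: {bal_error:.3f}")
```

On this toy data the plain error looks small because the head class dominates, while the balanced error exposes the failure on the single tail sample. That gap is why a class-balanced metric is the relevant one for a long-tailed comparison.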

Figures

Figures reproduced from arXiv: 2509.09926 by Bing Su, Jiahao Chen, Zhiyuan Huang.

Figure 1: Differences among supervised learning, semi…
Figure 2: The reliability diagrams on (a) ImageNet-LT and (b) Places365-LT based on training from scratch and PEFT, respec…
Figure 3: Illustration of the proposed LoFT-OW. H(p, q) denotes the cross-entropy.
    … where a weakly augmented view is used to generate pseudo-labels, and a strongly augmented view is used to obtain logits for optimization. To better handle uncertain predictions, we partition unlabeled samples into high-confidence and low-confidence subsets based on their Maximum Softmax Probability (MSP), and apply different optimiza…
Figure 4: Visualizations of unlabeled samples and their pre…
Figure 5: Ablation studies on hyper-parameter cu. The horizontal axis represents the value of cu, and the vertical axis represents the accuracy.
    Compared to PEFT, LoFT consistently achieves higher accuracy with both CLIP and OpenCLIP backbones, reaching 73.3% and 73.9%, respectively. These improvements over strong baselines and prior methods (e.g., FixMatch+CCL at 67.8%) highlight LoFT’s effectiveness beyond small…
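
The reliability diagrams in Figure 2 compare per-bin accuracy with per-bin confidence. A minimal sketch of that binning, and of the Expected Calibration Error that summarizes it, is given below. It assumes equal-width confidence bins; the bin count, the function name, and the toy predictions are hypothetical and not taken from the paper.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Left-closed equal-width bins; a confidence of 1.0 falls in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy predictions: an overconfident model that is right only half of the time.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 0, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(f"ECE: {ece:.4f}")
```

Each (mean confidence, accuracy) pair per bin is one bar of a reliability diagram. A well-calibrated model keeps the two close, which drives this score toward zero.
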
Original abstract

Long-tailed semi-supervised learning (LTSSL) presents a formidable challenge where models must overcome the scarcity of tail samples while mitigating the noise from unreliable pseudo-labels. Most prior LTSSL methods are designed to train models from scratch, which often leads to issues such as overconfidence and low-quality pseudo-labels. To address this problem, we first theoretically prove that utilizing a foundation model significantly reduces the hypothesis complexity, which tightens the generalization bound and in turn minimizes the Balanced Posterior Error (BPE). Furthermore, we demonstrate that the feature compactness of foundation models strictly compresses the acceptance region for outliers, providing a geometric guarantee for robustness. Motivated by these theoretical insights, we extend LTSSL into the foundation model fine-tuning paradigm and propose a novel framework: LoFT (Long-tailed semi-supervised learning via parameter-efficient Fine-Tuning). Furthermore, we explore a more practical setting by investigating semi-supervised learning under open-world conditions, where the unlabeled data may include out-of-distribution (OOD) samples. To handle this problem, we propose LoFT-OW (LoFT under Open-World scenarios) to improve the discriminative ability. Experimental results on multiple benchmarks demonstrate that our method achieves superior performance. Code is available: https://github.com/games-liker/LoFT
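
The pseudo-labeling step that the abstract alludes to, and that the Figure 3 caption describes, can be sketched as follows. This is a minimal illustration assuming a FixMatch-style setup in which logits from the weakly augmented view supply both the pseudo-label and the Maximum Softmax Probability (MSP). The threshold of 0.95, the function names, and the toy logits are hypothetical, and the sketch does not reproduce the separate objectives LoFT applies to each subset.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partition_by_msp(weak_logits, threshold=0.95):
    """Split unlabeled samples into high- and low-confidence subsets by MSP.

    Returns two lists of (sample_index, pseudo_label, msp) tuples.
    The threshold is an assumed value, not one taken from the paper.
    """
    high, low = [], []
    for idx, logits in enumerate(weak_logits):
        probs = softmax(logits)
        msp = max(probs)
        pseudo_label = probs.index(msp)
        (high if msp >= threshold else low).append((idx, pseudo_label, msp))
    return high, low


# Toy logits from the weakly augmented view of three unlabeled samples.
weak_logits = [
    [8.0, 0.5, 0.2],  # confident in class 0
    [1.0, 1.2, 0.9],  # nearly uniform, so low confidence
    [0.1, 0.3, 6.0],  # confident in class 2
]
high, low = partition_by_msp(weak_logits)
print("high-confidence:", [(i, y) for i, y, _ in high])
print("low-confidence:", [(i, y) for i, y, _ in low])
```

In a FixMatch-style pipeline the high-confidence pseudo-labels would supervise the logits of the strongly augmented view through cross-entropy. How the low-confidence subset and potential OOD samples are treated is specific to LoFT and LoFT-OW and is left out of this sketch.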

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper addresses long-tailed semi-supervised learning (LTSSL) in open-world settings by leveraging foundation models via parameter-efficient fine-tuning. It claims to theoretically prove that foundation models reduce hypothesis complexity (tightening generalization bounds and minimizing Balanced Posterior Error or BPE) and that their feature compactness strictly compresses the outlier acceptance region for geometric robustness. Motivated by this, it proposes the LoFT framework and its open-world extension LoFT-OW, with experiments showing superior performance on benchmarks; code is released.

Significance. If the theoretical claims hold after accounting for pseudo-label noise and tail imbalance, the work could meaningfully advance LTSSL by providing a foundation-model-based paradigm with explicit generalization and robustness guarantees. The release of code supports reproducibility, which strengthens the contribution if the experiments are detailed and the bounds are made rigorous.

major comments (2)
  1. [Theoretical Analysis / Abstract] Abstract and theoretical section: the central claim that foundation models reduce hypothesis complexity and tighten the generalization bound to minimize BPE does not incorporate a pseudo-label error term or tail-class imbalance. The derivations appear to treat the foundation-model properties as invariant to the semi-supervised fine-tuning objective; this is load-bearing because the actual regime uses noisy pseudo-labels on long-tailed data, so the claimed tightening may not transfer (see stress-test concern).
  2. [Theoretical Analysis] Geometric argument: the demonstration that feature compactness strictly compresses the acceptance region for outliers lacks any analysis of how this guarantee persists after parameter-efficient fine-tuning on imbalanced data that may contain OOD samples. Without an explicit error term or invariance proof, the robustness claim for LoFT-OW is not yet supported.
minor comments (2)
  1. [Experiments] Experiments section: the abstract states superior performance but provides no specific metrics, baselines, or error bars; these should be added with clear comparisons to prior LTSSL methods to substantiate the claims.
  2. [Notation / Theory] Notation: ensure BPE and related quantities are defined consistently and that any equations in the theoretical section are numbered for easy reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We address each of the major comments below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Theoretical Analysis / Abstract] Abstract and theoretical section: the central claim that foundation models reduce hypothesis complexity and tighten the generalization bound to minimize BPE does not incorporate a pseudo-label error term or tail-class imbalance. The derivations appear to treat the foundation-model properties as invariant to the semi-supervised fine-tuning objective; this is load-bearing because the actual regime uses noisy pseudo-labels on long-tailed data, so the claimed tightening may not transfer (see stress-test concern).

    Authors: We appreciate this observation. Our theoretical analysis focuses on the reduction in hypothesis complexity provided by the foundation model itself, which serves as a foundation for the subsequent fine-tuning. The parameter-efficient fine-tuning approach is designed to maintain proximity to the pre-trained model, thereby preserving the benefits in the generalization bound. However, we acknowledge that explicitly incorporating a pseudo-label error term and accounting for tail-class imbalance would provide a more complete picture. In the revised manuscript, we will extend the theoretical section to include a discussion of these factors and their impact on the balanced posterior error. We will also conduct additional stress tests to demonstrate the robustness of the bound under noisy pseudo-labels. revision: yes

  2. Referee: [Theoretical Analysis] Geometric argument: the demonstration that feature compactness strictly compresses the acceptance region for outliers lacks any analysis of how this guarantee persists after parameter-efficient fine-tuning on imbalanced data that may contain OOD samples. Without an explicit error term or invariance proof, the robustness claim for LoFT-OW is not yet supported.

    Authors: Thank you for highlighting this gap. The geometric argument is based on the inherent feature compactness of foundation models. To address how this persists after parameter-efficient fine-tuning, particularly under imbalance and potential OOD samples in the open-world setting, we will add an analysis in the revised version. This will include an invariance argument showing that the updates from fine-tuning do not substantially alter the compressed acceptance region, supported by an error term. This will bolster the theoretical support for LoFT-OW. revision: yes

Circularity Check

0 steps flagged

Theoretical proofs presented as self-contained first-principles results with no reduction to inputs or self-citations

full rationale

The paper states it 'first theoretically prove[s]' that foundation models reduce hypothesis complexity (tightening the generalization bound and minimizing BPE) and that feature compactness 'strictly compresses the acceptance region for outliers'. These are framed as derivations internal to the manuscript rather than outputs of a fit, a renamed empirical pattern, or a load-bearing self-citation chain. No equations or sections in the provided text exhibit a step where the claimed result is equivalent to its own inputs by construction, nor is any prior work by the same authors invoked to justify uniqueness or an ansatz. The derivation chain therefore remains independent of the target claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about foundation model properties rather than new free parameters or invented entities; full details of the proofs and any implicit assumptions are not visible in the abstract.

axioms (2)
  • domain assumption: Foundation models provide feature compactness that compresses outlier acceptance regions in LTSSL.
    Invoked to provide geometric guarantee for robustness against unreliable pseudo-labels and OOD samples.
  • domain assumption: Utilizing foundation models reduces hypothesis complexity and tightens generalization bounds for LTSSL.
    Central to minimizing Balanced Posterior Error as stated in the theoretical analysis.

pith-pipeline@v0.9.0 · 5761 in / 1374 out tokens · 39499 ms · 2026-05-18T17:58:37.294759+00:00 · methodology
