Decouple then Converge: Handling Unknown Unlabeled Distributions in Long-Tailed Semi-Supervised Learning
Pith reviewed 2026-05-23 23:55 UTC · model grok-4.3
The pith
Decoupling training into head-focused and tail-focused branches that converge handles unknown unlabeled distributions in long-tailed semi-supervised learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeCon decouples learning into two specialized branches: a standard branch that focuses on head classes and a balanced branch that focuses on tail classes. During training, the two branches interact and gradually converge, allowing them to complement each other and ultimately achieve strong performance across all classes.
What carries the argument
Two-branch architecture in which a standard branch and a balanced branch interact and converge during training.
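A rough sketch of how such a two-branch objective could be wired together follows. It is not the authors' implementation: this page does not state DeCon's loss terms, so the balanced branch is assumed to use a logit-adjusted cross-entropy, and the branch interaction is assumed to be a symmetric KL term weighted by a hypothetical coefficient `lam`. All function names are illustrative.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def two_branch_loss(std_logits, bal_logits, label, class_prior, lam=1.0):
    """Per-example loss for a standard and a balanced branch (assumed form)."""
    # Standard branch: plain cross-entropy, which tends to favour head classes.
    loss_std = cross_entropy(std_logits, label)

    # Balanced branch (ASSUMPTION): logit-adjusted cross-entropy. Adding the
    # log class prior during training pushes the branch to produce larger raw
    # logits for rare classes. Every prior entry must be strictly positive.
    adjusted = [z + math.log(p) for z, p in zip(bal_logits, class_prior)]
    loss_bal = cross_entropy(adjusted, label)

    # Interaction term (ASSUMPTION): symmetric KL between the two branches'
    # predictions; increasing `lam` over training would pull them together.
    p_std, p_bal = softmax(std_logits), softmax(bal_logits)
    agreement = 0.5 * (kl_divergence(p_std, p_bal) + kl_divergence(p_bal, p_std))

    return loss_std + loss_bal + lam * agreement
```

With `lam=0` the branches train independently; raising `lam` penalizes disagreement, which is one simple way to model branches that "gradually converge".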
If this is right
- When labeled and unlabeled class distributions mismatch, average test accuracy rises by 2.7 percentage points over existing algorithms.
- The method still outperforms many prior LTSSL algorithms even when labeled and unlabeled distributions are identical.
- Ablation results identify the branch interaction and convergence as the main drivers of the observed gains.
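To make "mismatched" concrete, the sketch below builds long-tailed class counts with the exponential profile that is a common convention in long-tailed benchmarks, then compares a labeled distribution with a reversed unlabeled one. This page does not say which protocol the paper uses, so the profile, the helper names, and the example values (1500 samples in the largest class, imbalance ratio 100) are assumptions for illustration only.

```python
def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Class sizes decaying exponentially from n_max to n_max / imbalance_ratio."""
    if num_classes == 1:
        return [n_max]
    return [
        round(n_max * imbalance_ratio ** (-k / (num_classes - 1)))
        for k in range(num_classes)
    ]

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def total_variation(p, q):
    """Total variation distance between two class distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Illustrative values, not taken from the paper.
labeled = long_tailed_counts(1500, 10, 100)
unlabeled_same = list(labeled)            # matched setting
unlabeled_reversed = labeled[::-1]        # one possible mismatched setting

print(labeled)
print(total_variation(normalize(labeled), normalize(unlabeled_same)))
print(total_variation(normalize(labeled), normalize(unlabeled_reversed)))
```

A distance of zero corresponds to the matched setting; the reversed case is one extreme form of mismatch, where the labeled head classes are the unlabeled tail classes.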
Where Pith is reading between the lines
- The same decoupling-plus-convergence pattern could be tested on other semi-supervised tasks that involve distribution shift between labeled and unlabeled sets.
- If the convergence step is removed, performance would likely drop most sharply on the most imbalanced classes.
- The method suggests that explicit branch specialization may be simpler than refining pseudo-labeling rules for handling unknown imbalance.
Load-bearing premise
The interaction between the two branches produces complementary gains without one branch dominating or destabilizing training.
What would settle it
On standard LTSSL benchmarks with mismatched labeled and unlabeled distributions, DeCon would be falsified if it failed to produce higher test accuracy than prior methods.
Original abstract
While long-tailed semi-supervised learning (LTSSL) has attracted growing attention in many real-world classification tasks, existing LTSSL algorithms typically assume that labeled and unlabeled data share nearly identical class distributions. When this assumption is violated, these methods can perform poorly because they rely on biased model-generated pseudo-labels. To address this issue, we propose a simple yet effective approach called DeCon for LTSSL with unknown unlabeled class distributions. Specifically, DeCon decouples learning into two specialized branches: a standard branch that focuses on head classes and a balanced branch that focuses on tail classes. During training, the two branches interact and gradually converge, allowing them to complement each other and ultimately achieve strong performance across all classes. Despite its simplicity, we show that DeCon achieves state-of-the-art performance on a variety of standard LTSSL benchmarks, e.g., an averaged 2.7% absolute increase in test accuracy against existing algorithms when the class distributions of labeled and unlabeled data are mismatched. Even when the class distributions are identical, DeCon consistently outperforms many sophisticated LTSSL algorithms. Furthermore, we conduct extensive ablation analyses to tease apart the factors that are the most important to the success of DeCon. The source code is available at https://github.com/Gank0078/DeCon.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DeCon for long-tailed semi-supervised learning (LTSSL) under mismatched labeled/unlabeled class distributions. It decouples training into a standard branch (head-class focus) and a balanced branch (tail-class focus); the branches interact during training and converge to produce complementary predictions across all classes. The central empirical claim is state-of-the-art accuracy on standard LTSSL benchmarks, including a 2.7% average absolute gain versus prior methods on mismatched distributions and consistent outperformance even when distributions match. The manuscript supplies code and reports extensive ablations on interaction factors.
Significance. If the empirical results hold under the reported controls, the work is significant because it directly targets a practical failure mode of existing LTSSL methods (distribution mismatch) that is common in real data yet rarely handled explicitly. The two-branch decoupling-plus-convergence design is simple, the code release supports reproducibility, and the ablations provide evidence that the interaction mechanism is load-bearing rather than incidental.
minor comments (3)
- §4 (Experiments): the abstract states an 'averaged 2.7% absolute increase' but the main text should explicitly list the per-benchmark deltas, the number of random seeds, and whether the gains are statistically significant (e.g., via paired t-tests or reported standard deviations) so readers can assess robustness without consulting the code.
- §3.2 (Interaction mechanism): while the high-level description of branch interaction is clear, a short pseudocode block or explicit loss-term equation showing how gradients from the two branches are combined would eliminate any ambiguity about the precise coupling before convergence.
- Tables 1 and 2: ensure that the 'DeCon' rows are visually distinguished (e.g., bold or shaded) from the baselines so the claimed improvements are immediately readable.
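The paired t-test mentioned in the first comment can be computed from per-seed (or per-benchmark) accuracies with the standard library alone. The sketch below is only an illustration of that check: the function name is hypothetical, and the numbers in the usage lines are made-up placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for paired accuracy measurements.

    acc_a[i] and acc_b[i] must come from the same seed or benchmark.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Placeholder accuracies for two methods over three seeds (NOT real results).
method_a = [3.0, 5.0, 7.0]
method_b = [2.0, 3.0, 4.0]
t, dof = paired_t_statistic(method_a, method_b)
print(t, dof)
```

The resulting statistic is compared against a Student's t distribution with the returned degrees of freedom; with only a handful of seeds, reporting the standard deviations alongside it keeps the comparison honest.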
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive recommendation for minor revision. We are encouraged that the practical importance of handling distribution mismatch in LTSSL is recognized, along with the value of the two-branch design and code release. Since no specific major comments were listed in the report, we provide a general response below and stand ready to incorporate any additional feedback.
Circularity Check
No significant circularity
The paper describes an algorithmic procedure (decoupling into standard and balanced branches that interact during training) for long-tailed semi-supervised learning, with performance claims resting entirely on empirical benchmark results, ablations, and released code rather than any derivation chain, equations, or fitted parameters presented as predictions. No load-bearing steps reduce to self-definition, self-citation, or renaming; the central claims are externally falsifiable via the reported experiments on mismatched and matched distributions.