Recognition: unknown
A₃B₂: Adaptive Asymmetric Adapter for Alleviating Branch Bias in Vision-Language Image Classification with Few-Shot Learning
Pith reviewed 2026-05-14 19:11 UTC · model grok-4.3
The pith
An adaptive asymmetric adapter suppresses image-branch updates in vision-language models when uncertainty is high, improving few-shot classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that branch bias in vision-language image classification can be alleviated by an adaptive asymmetric adapter called A3B2, which uses uncertainty-aware adapter dampening to suppress image-branch adaptation when prediction uncertainty is high. This is implemented through a lightweight design inspired by mixture-of-experts with load balancing regularization. Experiments confirm it outperforms baselines across three few-shot tasks on 11 datasets.
What carries the argument
Uncertainty-Aware Adapter Dampening (UAAD), which automatically reduces the influence of image-branch adaptations based on prediction uncertainty to balance the branches.
Load-bearing premise
Prediction uncertainty reliably signals when image-branch adaptation should be reduced, without creating new errors or needing per-dataset adjustments.
What would settle it
A dataset where high uncertainty predictions still benefit from full image-branch adaptation, or where the dampening mechanism reduces accuracy compared to fixed adaptation.
Figures
read the original abstract
Efficient transfer learning methods for large-scale vision-language models ($e.g.$, CLIP) enable strong few-shot transfer, yet existing adaptation methods follow a fixed fine-tuning paradigm that implicitly assumes a uniform importance of the image and text branches, which has not been systematically studied in image classification. Through extensive analysis, we reveal a Branch Bias issue in vision-language image classification: adapting the image encoder does not always improve performance under out-of-distribution settings. Motivated by this observation, we propose A$_3$B$_2$, an Adaptive Asymmetric Adapter that alleviates Branch Bias in few-shot learning. A$_3$B$_2$ introduces Uncertainty-Aware Adapter Dampening (UAAD), which automatically suppresses image-branch adaptation when prediction uncertainty is high, enabling soft and data-driven control without manual intervention. Architecturally, A$_3$B$_2$ adopts a lightweight asymmetric design inspired by mixture-of-experts with Load Balancing Regularization. Extensive experiments on three few-shot image classification tasks across 11 datasets demonstrate that A$_3$B$_2$ consistently outperforms 11 competitive prompt- and adapter-based baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that vision-language models exhibit a 'Branch Bias' in few-shot image classification, where image-encoder adaptation does not always improve performance under out-of-distribution conditions. Motivated by this, it introduces A₃B₂, an adaptive asymmetric adapter that uses Uncertainty-Aware Adapter Dampening (UAAD) to automatically suppress image-branch adaptation when prediction uncertainty is high. The design incorporates a lightweight mixture-of-experts-inspired asymmetry and load-balancing regularization. Experiments across three few-shot tasks on 11 datasets show consistent outperformance over 11 prompt- and adapter-based baselines.
Significance. If the branch-bias observation holds and UAAD provides a reliable, dataset-agnostic control without new failure modes, the work would strengthen few-shot adaptation for CLIP-style models by replacing fixed fine-tuning paradigms with a data-driven branch-balancing mechanism. The scale of the evaluation (11 datasets, 11 baselines) is a clear strength that would support adoption if the uncertainty proxy is shown to be robust.
major comments (3)
- [§3.2] §3.2 (UAAD definition): the claim that prediction uncertainty serves as a faithful proxy for branch bias is load-bearing for the 'no manual intervention' guarantee, yet the manuscript provides no ablation or diagnostic showing that high uncertainty correlates specifically with image-branch harm rather than label noise, class imbalance, or text-branch issues; without this, suppression could degrade in-distribution performance.
- [§4] §4 (Experiments): the abstract states 'consistent outperformance' across 11 datasets, but no error bars, statistical significance tests, or exact few-shot sampling protocols (e.g., number of seeds, class-balanced splits) are reported; this prevents assessment of whether reported gains exceed variance and undermines the cross-dataset claim.
- [§2] §2 (Branch Bias Analysis): the motivation depends on an 'extensive analysis' revealing when image adaptation hurts, but the specific figures, tables, or quantitative thresholds linking uncertainty to performance drop are not shown; this leaves the UAAD design choice under-motivated relative to its centrality.
minor comments (2)
- [Figure 1] Figure 1 or 2 (architecture diagram): the asymmetric MoE routing and dampening factor should be annotated with the exact mathematical form of the uncertainty-based gate to improve reproducibility.
- [Table 1] Table 1 (baseline comparison): ensure all 11 baselines include their original citation and hyper-parameter settings used in the re-implementation.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment below, outlining the specific revisions we will implement in the next version of the paper.
read point-by-point responses
-
Referee: [§3.2] §3.2 (UAAD definition): the claim that prediction uncertainty serves as a faithful proxy for branch bias is load-bearing for the 'no manual intervention' guarantee, yet the manuscript provides no ablation or diagnostic showing that high uncertainty correlates specifically with image-branch harm rather than label noise, class imbalance, or text-branch issues; without this, suppression could degrade in-distribution performance.
Authors: We agree that additional diagnostics are needed to confirm that high uncertainty specifically signals image-branch harm rather than confounding factors. In the revised manuscript, we will add a dedicated ablation subsection with new experiments and plots that measure performance change when forcing image-branch adaptation at varying uncertainty levels, while controlling for label noise and class balance. We will also report in-distribution results to verify that UAAD does not degrade performance when uncertainty is low. revision: yes
-
Referee: [§4] §4 (Experiments): the abstract states 'consistent outperformance' across 11 datasets, but no error bars, statistical significance tests, or exact few-shot sampling protocols (e.g., number of seeds, class-balanced splits) are reported; this prevents assessment of whether reported gains exceed variance and undermines the cross-dataset claim.
Authors: We acknowledge that the current presentation lacks the necessary statistical details. The revised version will report standard deviation error bars over 5 random seeds, specify the exact few-shot protocol (class-balanced random sampling of k examples per class with no overlap across seeds), and include paired t-test p-values comparing A₃B₂ against each baseline on every dataset. These additions will appear in Section 4 and the corresponding tables. revision: yes
-
Referee: [§2] §2 (Branch Bias Analysis): the motivation depends on an 'extensive analysis' revealing when image adaptation hurts, but the specific figures, tables, or quantitative thresholds linking uncertainty to performance drop are not shown; this leaves the UAAD design choice under-motivated relative to its centrality.
Authors: Section 2 presents the branch-bias observation, but we agree that more explicit quantitative support would strengthen the motivation. We will expand Section 2 with new figures and a table that report performance deltas as a function of uncertainty bins, along with concrete thresholds (e.g., uncertainty > 0.7 correlates with >3% drop when image adaptation is applied). These will directly link the observed bias to the UAAD design. revision: yes
Circularity Check
No significant circularity; empirical design with external validation
full rationale
The paper motivates A3B2 from an observed Branch Bias phenomenon and introduces UAAD as an empirical, uncertainty-driven suppression mechanism without any shown equations, derivations, or fitted parameters that reduce to the inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text. The method is presented as a lightweight asymmetric adapter with load-balancing regularization, validated through experiments on 11 datasets against 11 baselines. This keeps the central claim independent of its own fitted values or prior self-citations, qualifying as self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Flamingo: a visual language model for few-shot learning.Advances in neural information pro- cessing systems, 35:23716–23736,
[Alayracet al., 2022 ] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning.Advances in neural information pro- cessing systems, 35:23716–23736,
2022
-
[2]
Food-101–mining discriminative com- ponents with random forests
[Bossardet al., 2014 ] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101–mining discriminative com- ponents with random forests. InComputer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part VI 13, pages 446–461. Springer,
2014
-
[3]
Language models are few-shot learners.Advances in neural information processing sys- tems, 33:1877–1901,
[Brownet al., 2020 ] Tom Brown, Benjamin Mann, Nick Ry- der, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari- wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing sys- tems, 33:1877–1901,
2020
-
[4]
Markov chains.Springer- Verlag, New York,
[Chung, 1967] Kai Lai Chung. Markov chains.Springer- Verlag, New York,
1967
-
[5]
Describing textures in the wild
[Cimpoiet al., 2014 ] Mircea Cimpoi, Subhransu Maji, Ia- sonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. InProceedings of the IEEE conference on computer vision and pattern recog- nition, pages 3606–3613,
2014
-
[6]
Imagenet: A large-scale hierarchical image database
[Denget al., 2009 ] Jia Deng, Wei Dong, Richard Socher, Li- Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee,
2009
-
[7]
Switch transformers: Scaling to trillion param- eter models with simple and efficient sparsity.Journal of Machine Learning Research, 23(120):1–39,
[Feduset al., 2022 ] William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion param- eter models with simple and efficient sparsity.Journal of Machine Learning Research, 23(120):1–39,
2022
-
[8]
Learning generative visual models from few train- ing examples: An incremental bayesian approach tested on 101 object categories
[Fei-Feiet al., 2004 ] Li Fei-Fei, Rob Fergus, and Pietro Per- ona. Learning generative visual models from few train- ing examples: An incremental bayesian approach tested on 101 object categories. In2004 conference on computer vision and pattern recognition workshop, pages 178–178. IEEE,
2004
-
[9]
Hidden in plain sight: Vlms overlook their visual representations.arXiv preprint arXiv:2506.08008,
[Fuet al., 2025 ] Stephanie Fu, Tyler Bonnen, Devin Guil- lory, and Trevor Darrell. Hidden in plain sight: Vlms overlook their visual representations.arXiv preprint arXiv:2506.08008,
-
[10]
Higher layers need more lora experts.arXiv preprint arXiv:2402.08562,
[Gaoet al., 2024a ] Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, and VS Subrahmanian. Higher layers need more lora experts.arXiv preprint arXiv:2402.08562,
-
[11]
[Gonget al., 2025 ] Shizhan Gong, Yankai Jiang, Qi Dou, and Farzan Farnia. Kernel-based unsupervised embedding alignment for enhanced visual representation in vision- language models.arXiv preprint arXiv:2506.02557,
-
[12]
[Guo and Gu, 2025a] Yuncheng Guo and Xiaodong Gu. Mmrl: Multi-modal representation learning for vision- language models.arXiv preprint arXiv:2503.08497,
-
[13]
[Guo and Gu, 2025b] Yuncheng Guo and Xiaodong Gu. Mmrl++: Parameter-efficient and interaction-aware rep- resentation learning for vision-language models.arXiv preprint arXiv:2505.10088,
-
[14]
[Helberet al., 2019 ] Patrick Helber, Benjamin Bischke, An- dreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification.IEEE Journal of Selected Top- ics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226,
2019
-
[15]
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
[Hendrycks and Gimpel, 2016] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of- distribution examples in neural networks.arXiv preprint arXiv:1610.02136,
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[16]
[Jianget al., 2025 ] Zhonghua Jiang, Kunxi Li, Yiyun Zhou, Sihao Liu, Zhaode Wang, Shengyu Zhang, et al. Purekv: Plug-and-play kv cache optimization with spatial-temporal sparse attention for vision-language large models.arXiv preprint arXiv:2510.25600,
-
[17]
Acckv: Towards efficient audio-video llms inference via adaptive-focusing and cross-calibration kv cache optimization
[Jianget al., 2026 ] Zhonghua Jiang, Kui Chen, Kunxi Li, Keting Yin, Yiyun Zhou, Zhaode Wang, Chengfei Lv, and Shengyu Zhang. Acckv: Towards efficient audio-video llms inference via adaptive-focusing and cross-calibration kv cache optimization. InProceedings of the AAAI Con- ference on Artificial Intelligence, volume 40, pages 5494– 5502,
2026
-
[18]
Maple: Multi-modal prompt learning
[Khattaket al., 2023 ] Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, and Fa- had Shahbaz Khan. Maple: Multi-modal prompt learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19113–19122,
2023
-
[19]
Shifts in selective visual attention: towards the un- derlying neural circuitry
[Koch and Ullman, 1987] Christof Koch and Shimon Ull- man. Shifts in selective visual attention: towards the un- derlying neural circuitry. InMatters of intelligence: Con- ceptual structures in cognitive neuroscience, pages 115–
1987
-
[20]
3d object representations for fine- grained categorization
[Krauseet al., 2013 ] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine- grained categorization. InProceedings of the IEEE inter- national conference on computer vision workshops, pages 554–561,
2013
-
[21]
Read-only prompt optimization for vision-language few- shot learning
[Leeet al., 2023 ] Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyeong Choi, Sanghyeok Lee, and Hyunwoo J Kim. Read-only prompt optimization for vision-language few- shot learning. InProceedings of the IEEE/CVF interna- tional conference on computer vision, pages 1401–1411,
2023
-
[22]
Language-driven semantic segmentation,
[Liet al., 2022 ] Boyi Li, Kilian Q Weinberger, Serge Be- longie, Vladlen Koltun, and Ren´e Ranftl. Language-driven semantic segmentation.arXiv preprint arXiv:2201.03546,
-
[23]
Scaling language-image pre-training via masking
[Liet al., 2023 ] Yanghao Li, Haoqi Fan, Ronghang Hu, Christoph Feichtenhofer, and Kaiming He. Scaling language-image pre-training via masking. InProceedings of the IEEE/CVF conference on computer vision and pat- tern recognition, pages 23390–23400,
2023
-
[24]
[Liet al., 2024 ] Ming Li, Jike Zhong, Chenxin Li, Li- uzhuozheng Li, Nie Lin, and Masashi Sugiyama. Vision- language model fine-tuning via simple parameter-efficient modification.arXiv preprint arXiv:2409.16718,
-
[25]
[Liet al., 2025a ] Kunxi Li, Yufan Xiong, Zhonghua Jiang, Yiyun Zhou, Zhaode Wang, Chengfei Lv, and Shengyu Zhang. Flowmm: Cross-modal information flow guided kv cache merging for efficient multimodal context infer- ence.arXiv preprint arXiv:2511.05534,
-
[26]
Open-vocabulary se- mantic segmentation with mask-adapted clip
[Lianget al., 2023 ] Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, and Diana Marculescu. Open-vocabulary se- mantic segmentation with mask-adapted clip. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7061–7070,
2023
-
[27]
Fine-Grained Visual Classification of Aircraft
[Majiet al., 2013 ] Subhransu Maji, Esa Rahtu, Juho Kan- nala, Matthew Blaschko, and Andrea Vedaldi. Fine- grained visual classification of aircraft.arXiv preprint arXiv:1306.5151,
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[28]
arXiv preprint arXiv:2503.07137 , year=
[Mu and Lin, 2025] Siyuan Mu and Sen Lin. A comprehen- sive survey of mixture-of-experts: Algorithms, theory, and applications.arXiv preprint arXiv:2503.07137,
-
[29]
Automated flower classification over a large number of classes
[Nilsback and Zisserman, 2008] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In2008 Sixth Indian conference on computer vision, graphics & image processing, pages 722–729. IEEE,
2008
-
[30]
Cats and dogs
[Parkhiet al., 2012 ] Omkar M Parkhi, Andrea Vedaldi, An- drew Zisserman, and CV Jawahar. Cats and dogs. In2012 IEEE conference on computer vision and pattern recogni- tion, pages 3498–3505. IEEE,
2012
-
[31]
Understanding fine-tuning clip for open- vocabulary semantic segmentation in hyperbolic space
[Penget al., 2025 ] Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, and Wei Shen. Understanding fine-tuning clip for open- vocabulary semantic segmentation in hyperbolic space. In Proceedings of the Computer Vision and Pattern Recogni- tion Conference, pages 4562–4572,
2025
-
[32]
Learning transferable visual models from nat- ural language supervision
[Radfordet al., 2021 ] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from nat- ural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR,
2021
-
[33]
Do imagenet classi- fiers generalize to imagenet? InInternational conference on machine learning, pages 5389–5400
[Rechtet al., 2019 ] Benjamin Recht, Rebecca Roelofs, Lud- wig Schmidt, and Vaishaal Shankar. Do imagenet classi- fiers generalize to imagenet? InInternational conference on machine learning, pages 5389–5400. PMLR,
2019
-
[34]
Inter-module credit assignment in modular reinforcement learning.Neural Networks, 16(7):985–994,
[Samejimaet al., 2003 ] Kazuyuki Samejima, Kenji Doya, and Mitsuo Kawato. Inter-module credit assignment in modular reinforcement learning.Neural Networks, 16(7):985–994,
2003
-
[35]
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
[Shazeeret al., 2017 ] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hin- ton, and Jeff Dean. Outrageously large neural net- works: The sparsely-gated mixture-of-experts layer.arXiv preprint arXiv:1701.06538,
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[36]
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
[Soomroet al., 2012 ] Khurram Soomro, Amir Roshan Za- mir, and Mubarak Shah. Ucf101: A dataset of 101 human actions classes from videos in the wild.arXiv preprint arXiv:1212.0402,
work page internal anchor Pith review Pith/arXiv arXiv 2012
-
[37]
Hydralora: An asymmet- ric lora architecture for efficient fine-tuning.Advances in Neural Information Processing Systems, 37:9565–9584,
[Tianet al., 2024 ] Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, and Cheng-Zhong Xu. Hydralora: An asymmet- ric lora architecture for efficient fine-tuning.Advances in Neural Information Processing Systems, 37:9565–9584,
2024
-
[38]
Attention is all you need.Advances in neural information processing systems, 30,
[Vaswaniet al., 2017 ] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30,
2017
-
[39]
Learning robust global repre- sentations by penalizing local predictive power.Advances in neural information processing systems, 32,
[Wanget al., 2019 ] Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global repre- sentations by penalizing local predictive power.Advances in neural information processing systems, 32,
2019
-
[40]
Sun database: Large-scale scene recognition from abbey to zoo
[Xiaoet al., 2010 ] Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In2010 IEEE computer society conference on computer vision and pattern recognition, pages 3485–3492. IEEE,
2010
-
[41]
Side adapter network for open- vocabulary semantic segmentation
[Xuet al., 2023 ] Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, and Xiang Bai. Side adapter network for open- vocabulary semantic segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2945–2954,
2023
-
[42]
Go wider instead of deeper
[Xueet al., 2022 ] Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, and Yang You. Go wider instead of deeper. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 36, pages 8779–8787,
2022
-
[43]
Mma: Multi-modal adapter for vision-language models
[Yanget al., 2024 ] Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang, and Xiaohua Xie. Mma: Multi-modal adapter for vision-language models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23826–23837,
2024
-
[44]
Language-image alignment with fixed text en- coders.arXiv preprint arXiv:2506.04209,
[Yanget al., 2025 ] Jingfeng Yang, Ziyang Wu, Yue Zhao, and Yi Ma. Language-image alignment with fixed text en- coders.arXiv preprint arXiv:2506.04209,
-
[45]
Visual-language prompt tuning with knowledge- guided context optimization
[Yaoet al., 2023 ] Hantao Yao, Rui Zhang, and Changsheng Xu. Visual-language prompt tuning with knowledge- guided context optimization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6757–6767,
2023
-
[46]
Tcp: Textual-based class-aware prompt tuning for visual-language model
[Yaoet al., 2024 ] Hantao Yao, Rui Zhang, and Changsheng Xu. Tcp: Textual-based class-aware prompt tuning for visual-language model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23438–23448,
2024
-
[47]
Spurious correla- tions in machine learning: A survey.arXiv preprint arXiv:2402.12715,
[Yeet al., 2024 ] Wenqian Ye, Guangtao Zheng, Xu Cao, Yunsheng Ma, and Aidong Zhang. Spurious correla- tions in machine learning: A survey.arXiv preprint arXiv:2402.12715,
-
[48]
Vision-language models for vision tasks: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence,
[Zhanget al., 2024 ] Jingyi Zhang, Jiaxing Huang, Sheng Jin, and Shijian Lu. Vision-language models for vision tasks: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence,
2024
-
[49]
[Zhanget al., 2025 ] Dacao Zhang, Kun Zhang, Shimao Chu, Le Wu, Xin Li, and Si Wei. More: A mixture of low-rank experts for adaptive multi-task learning.arXiv preprint arXiv:2505.22694,
-
[50]
Multimodal graph-based variational mixture of experts network for zero-shot multi- modal information extraction
[Zhouet al., 2025a ] Baohang Zhou, Ying Zhang, Yu Zhao, Xuhui Sui, and Xiaojie Yuan. Multimodal graph-based variational mixture of experts network for zero-shot multi- modal information extraction. InProceedings of the ACM on Web Conference 2025, pages 4823–4831,
2025
-
[51]
Disentangled knowledge tracing for alleviating cognitive bias
[Zhouet al., 2025c ] Yiyun Zhou, Zheqi Lv, Shengyu Zhang, and Jingyuan Chen. Disentangled knowledge tracing for alleviating cognitive bias. InProceedings of the ACM on Web Conference 2025, pages 2633–2645,
2025
-
[52]
Cola: Collaborative low-rank adaptation
[Zhouet al., 2025d ] Yiyun Zhou, Chang Yao, and Jingyuan Chen. Cola: Collaborative low-rank adaptation. InFind- ings of the Association for Computational Linguistics: ACL 2025, pages 14115–14130,
2025
-
[53]
[Zhouet al., 2026a ] Yiyun Zhou, Jingwei Shi, Mingjing Xu, Zhonghua Jiang, and Jingyuan Chen. Beyond student: An asymmetric network for neural network inheritance.arXiv preprint arXiv:2602.09509,
-
[54]
This strongly demonstrates the effectiveness of the proposed fixed asymmetric design
From these results, we observe thatA3 generally performs worse than A3 across different tasks. This strongly demonstrates the effectiveness of the proposed fixed asymmetric design. In the following, we analyze the underlying reasons behind this outcome. A.2 Theoretical Support We build upon the theoretical analysis developed in our previ- ous work [Zhouet...
1987
-
[55]
one-down-many-ups
Theoretical Analysis.The one-down-many-ups architec- ture imposes a single shared bottleneck: all information fromXtoYmust pass through the same low-dimensional Z. This meansZmust serve as the representation for the entire mixtureofHexperts. Consequently, to maximize the predictive informationI(Z;Y),Zis forced to encode only those features ofXthat are sal...
2024
-
[56]
The second term penalizes large updates and is non-negative, hence: ∥∇V(x) ℓ′∥ ≤ ∥∇ V(x) ℓ∥
Then: ∇V(x) ℓ′ =∇ V(x) ℓ+ (1−κ(x))∇ V(x) ∥∆v(x)∥2. The second term penalizes large updates and is non-negative, hence: ∥∇V(x) ℓ′∥ ≤ ∥∇ V(x) ℓ∥. Taking expectation: Ceff V (T)≤C V (T). Method ImageNetCaltech101OxfordPetsStanfordCarsFlowers102Food101FGVCAircraftSUN397DTDEuroSATUCF101Average CoOpOp 70.62 94.52 90.47 65.91 71.92 86.02 23.34 66.54 45.51 44.43 ...
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.