BioMedVR: Confusion-Aware Mixture-of-Prompt Experts for Biomedical Visual Reprogramming
Pith reviewed 2026-06-26 00:33 UTC · model grok-4.3
The pith
Visual reprogramming adapts vision-language models to biomedical images by suppressing confusion between similar classes with a mixture of prompt experts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BioMedVR is the first VR-based framework for biomedical imaging that enables few-shot adaptation of pretrained VLMs through compact learnable VR modules, using a Confusion Minimization Mechanism that leverages LLM-generated confusion-aware attributes together with a Confusion-Suppression Loss to reduce false-positive alignment, and a Mixture-of-Prompt Experts that combines a positive expert for main-class discrimination and a negative expert for confusion suppression balanced via adaptive gating, achieving superior accuracy and generalization on 18 datasets including 11 biomedical ones.
What carries the argument
Mixture-of-Prompt Experts that combines a positive expert for main-class discrimination and a negative expert for confusion suppression, balanced via adaptive gating, supported by the Confusion Minimization Mechanism using LLM-generated confusion-aware attributes and Confusion-Suppression Loss.
If this is right
- Few-shot adaptation of vision-language models becomes feasible for biomedical tasks without full-model fine-tuning.
- False-positive alignments decrease in fine-grained medical scenarios that have subtle inter-class differences.
- The method improves generalization on both biomedical datasets and natural image benchmarks.
- Parameter-efficient input perturbations replace the need for extensive labeled medical data.
Where Pith is reading between the lines
- The same confusion-suppression structure could apply to other domains with high visual similarity between classes, such as satellite or industrial images.
- Success of the LLM-generated attributes points to potential for automating prompt design in other specialized imaging fields.
- Adaptive gating between positive and negative experts may generalize to other multi-prompt or multi-expert vision systems.
Load-bearing premise
LLM-generated confusion-aware attributes will accurately identify confusing negatives and the suppression loss will reduce false-positive alignments without introducing new biases in fine-grained biomedical scenarios.
What would settle it
On one of the 11 biomedical datasets, BioMedVR shows no reduction in false-positive alignments between similar classes compared to standard visual reprogramming without the confusion mechanism.
Figures
read the original abstract
Recent advances in vision-language models (VLMs) such as CLIP have demonstrated strong generalization across natural-image domains. However, adapting these models to biomedical imaging is non-trivial: full-model fine-tuning is computationally expensive, while medical data are often scarce and exhibit subtle, fine-grained inter-class differences, making parameter-efficient adaptation particularly critical. Visual Reprogramming (VR) offers a parameter-efficient alternative by injecting learnable perturbations into the input space, but existing VR approaches for VLMs mainly focus on positive class prompts and overlook confusing negatives, leading to miscalibrated predictions in fine-grained medical scenarios. We present BioMedVR, the first VR-based framework for biomedical imaging, enabling few-shot adaptation of pretrained VLMs through compact learnable VR modules. To mitigate class confusion, we introduce a Confusion Minimization Mechanism that leverages LLM-generated confusion-aware attributes together with a Confusion-Suppression Loss to explicitly reduce false-positive alignment. Moreover, the designed Mixture-of-Prompt Experts combines a positive expert for main-class discrimination and a negative expert for confusion suppression, balanced via adaptive gating. Extensive experiments on 18 datasets, including 11 biomedical datasets and 7 natural image benchmarks, demonstrate that BioMedVR achieves superior accuracy and generalization, effectively bridging VR and VLMs in biomedical domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes BioMedVR, the first visual reprogramming (VR) framework for biomedical imaging that adapts pretrained VLMs in a parameter-efficient manner. It introduces a Confusion Minimization Mechanism using LLM-generated confusion-aware attributes paired with a Confusion-Suppression Loss to reduce false-positive alignments, and a Mixture-of-Prompt Experts architecture with positive and negative experts balanced by adaptive gating. The central claim is superior accuracy and generalization on 18 datasets (11 biomedical + 7 natural-image benchmarks) compared to existing VR and adaptation methods.
Significance. If the empirical results hold under rigorous validation, the work would offer a practical, low-parameter route for fine-grained biomedical VLM adaptation where data scarcity and class confusion are acute. The explicit handling of negative-class confusion via LLM attributes and MoE-style gating is a targeted extension of prior VR literature.
major comments (2)
- [Abstract] Abstract: the claim of 'superior accuracy and generalization' on 18 datasets is presented without any mention of baselines, number of shots, error bars, statistical significance tests, or ablation studies, rendering the central empirical claim unverifiable from the provided text.
- [Abstract] The weakest assumption—that LLM-generated confusion-aware attributes plus the proposed loss and gating will reliably suppress false positives in fine-grained biomedical settings without introducing new biases—receives no supporting analysis or counter-example testing in the visible sections.
minor comments (1)
- Notation for the adaptive gating function and the exact form of the Confusion-Suppression Loss should be defined with equations in the main text rather than left at a high-level description.
Simulated Author's Rebuttal
We thank the referee for the comments. The abstract is necessarily concise, but the full manuscript contains the requested experimental details and analyses; we address each point below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim of 'superior accuracy and generalization' on 18 datasets is presented without any mention of baselines, number of shots, error bars, statistical significance tests, or ablation studies, rendering the central empirical claim unverifiable from the provided text.
Authors: The abstract prioritizes brevity. Sections 4 and 5 of the full manuscript report comparisons against multiple VR and adaptation baselines, results across 1/2/4/8/16-shot settings, error bars from repeated runs, statistical significance testing, and ablation studies. We will revise the abstract to include a concise qualifier referencing these elements. revision: yes
-
Referee: [Abstract] The weakest assumption—that LLM-generated confusion-aware attributes plus the proposed loss and gating will reliably suppress false positives in fine-grained biomedical settings without introducing new biases—receives no supporting analysis or counter-example testing in the visible sections.
Authors: The manuscript supplies supporting evidence through loss ablations (Section 4.2) that quantify false-positive reduction on biomedical data and gating analysis (Section 4.3). Qualitative examples and failure-case discussion appear in the supplement. We agree that an explicit limitations paragraph on potential biases would strengthen the work and will add it. revision: partial
Circularity Check
No significant circularity detected
full rationale
The paper proposes an empirical framework (BioMedVR) for visual reprogramming of VLMs in biomedical imaging, introducing components such as LLM-generated confusion-aware attributes, a Confusion-Suppression Loss, and a Mixture-of-Prompt Experts architecture with adaptive gating. No mathematical derivation chain, uniqueness theorems, fitted parameters renamed as predictions, or self-citation load-bearing arguments are present in the abstract or described method. The work is a standard parameter-efficient adaptation pipeline augmented with domain-specific mechanisms; all elements are independently specified and evaluated on external datasets rather than reducing to inputs by construction. This is the expected outcome for an applied empirical method paper without theoretical derivations.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Data in brief28, 104863 (2020) 9
Al-Dhabyani, W., Gomaa, M., Khaled, H., Fahmy, A.: Dataset of breast ultrasound images. Data in brief28, 104863 (2020) 9
2020
-
[2]
arXiv (2022) 2, 4, 10, 11
Bahng, H., Jahanian, A., Sankaranarayanan, S., Isola, P.: Exploring visual prompts for adapting large-scale models. arXiv (2022) 2, 4, 10, 11
2022
- [3]
-
[4]
In: ECCV (2014) 9
Bossard, L., Guillaumin, M., Van Gool, L.: Food-101–mining discriminative components with random forests. In: ECCV (2014) 9
2014
-
[5]
In: ICML (2024) 2
Cai, C., Ye, Z., Feng, L., Qi, J., Liu, F.: Sample-specific masks for visual reprogramming-based prompting. In: ICML (2024) 2
2024
-
[6]
In: The Thirteenth International Confer- ence on Learning Representations (2025) 2, 4, 6, 7, 9, 10, 11
Cai, C., Ye, Z., Feng, L., Qi, J., Liu, F.: Attribute-based visual reprogram- ming for vision-language models. In: The Thirteenth International Confer- ence on Learning Representations (2025) 2, 4, 6, 7, 9, 10, 11
2025
-
[7]
In: CVPR (2023) 2, 4, 10, 11
Chen, A., Yao, Y., Chen, P.Y., Zhang, Y., Liu, S.: Understanding and im- proving visual prompting: A label-mapping perspective. In: CVPR (2023) 2, 4, 10, 11
2023
-
[8]
In: ICLR (2024) 2
Chen, H., Wang, J., Shah, A., Tao, R., Wei, H., Xie, X., Sugiyama, M., Raj, B.: Understanding and mitigating the label noise in pre-training on downstream tasks. In: ICLR (2024) 2
2024
-
[9]
In: AAAI (2024) 2
Chen,P.Y.:Modelreprogramming:Resource-efficientcross-domainmachine learning. In: AAAI (2024) 2
2024
-
[10]
org/10.17632/56rmx5bjcr.1,https://www.kaggle.com/ds/35059919
Chen,P.:Kneeosteoarthritisseveritygradingdataset(2018).https://doi. org/10.17632/56rmx5bjcr.1,https://www.kaggle.com/ds/35059919
-
[11]
In: ICLR 2025 Workshop on Human-AI Coevolution 2
Chen, X., Lai, Z., Ruan, K., Chen, S., Liu, J., Liu, Z.: R-llava: Improving med-vqa understanding through visual region of interest. In: ICLR 2025 Workshop on Human-AI Coevolution 2
2025
-
[12]
In: CVPR (2014) 9
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: CVPR (2014) 9
2014
-
[13]
Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al.: Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the inter- national skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368 (2019) 9
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[14]
arXiv preprint arXiv:2508.10528 (2025) 4
Deng, Z., He, R., Liu, J., Wang, Y., Meng, Z., Jiang, S., Xie, Y., Liu, Z.: Med-glip: Advancing medical language-image pre-training with large-scale grounded dataset. arXiv preprint arXiv:2508.10528 (2025) 4
-
[15]
In: ICLR (2019) 4
Elsayed, G.F., Goodfellow, I., Sohl-Dickstein, J.: Adversarial reprogram- ming of neural networks. In: ICLR (2019) 4
2019
-
[16]
In: CVPR workshop (2004) 9 BioMedVR 17
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In: CVPR workshop (2004) 9 BioMedVR 17
2004
-
[17]
In: ACL-IJCNLP (2021) 2
Hambardzumyan, K., Khachatrian, H., May, J.: Warp: Word-level adver- sarial reprogramming. In: ACL-IJCNLP (2021) 2
2021
-
[18]
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2019) 9
Helber, P., Bischke, B., Dengel, A., Borth, D.: Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2019) 9
2019
-
[19]
In: International conference on machine learning
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., Gelly, S.: Parameter-efficient transfer learn- ing for nlp. In: International conference on machine learning. pp. 2790–2799. PMLR (2019) 4
2019
-
[20]
ICLR 1(2), 3 (2022) 4
Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: Low-rank adaptation of large language models. ICLR 1(2), 3 (2022) 4
2022
-
[21]
In: ICASSP (2023) 2
Hung, Y.N., Yang, C.H.H., Chen, P.Y., Lerch, A.: Low-resource music genre classification with cross-modal neural model reprogramming. In: ICASSP (2023) 2
2023
-
[22]
Scientific Reports12(1), 1–14 (2022) 9
Islam, M.N., Hasan, M., Hossain, M.K., Alam, M.G.R., Uddin, M.Z., Soylu, A.: Vision transformer and explainable transfer learning models for auto detection of kidney cyst, stone and tumor from ct-radiography. Scientific Reports12(1), 1–14 (2022) 9
2022
-
[23]
In: International conference on machine learning
Jia, C., Yang, Y., Xia, Y., Chen, Y.T., Parekh, Z., Pham, H., Le, Q., Sung, Y.H., Li, Z., Duerig, T.: Scaling up visual and vision-language representa- tion learning with noisy text supervision. In: International conference on machine learning. pp. 4904–4916. PMLR (2021) 4
2021
-
[24]
In: CVPR (2023) 2
Jing, Y., Yuan, C., Ju, L., Yang, Y., Wang, X., Tao, D.: Deep graph repro- gramming. In: CVPR (2023) 2
2023
-
[25]
Scientific reports6(1), 1–11 (2016) 9
Kather, J.N., Weis, C.A., Bianconi, F., Melchers, S.M., Schad, L.R., Gaiser, T., Marx, A., Zöllner, F.G.: Multi-class texture analysis in colorectal cancer histology. Scientific reports6(1), 1–11 (2016) 9
2016
-
[26]
Cell172(5), 1122 – 1131.e9 (2018) 9
Kermany, D.S., Goldbaum, M., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell172(5), 1122 – 1131.e9 (2018) 9
2018
-
[27]
In: CVPR (2023) 2, 4
Khattak, M.U., Rasheed, H., Maaz, M., Khan, S., Khan, F.S.: Maple: Multi- modal prompt learning. In: CVPR (2023) 2, 4
2023
-
[28]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Koleilat, T., Asgariandehkordi, H., Rivaz, H., Xiao, Y.: Biomedcoop: Learn- ing to prompt for biomedical vision-language models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 14766–14776 (2025) 2, 4, 10, 11
2025
-
[29]
Köhler, T., Budai, A., Kraus, M., Odstrcilik, J., Michelson, G., Hornegger, J.: Automatic no-reference quality assessment for retinal fundus images us- ing vessel segmentation (06 2013).https://doi.org/10.1109/CBMS.2013. 66277719
-
[30]
Liu et al
Li, Z., Li, X., Fu, X., Zhang, X., Wang, W., Chen, S., Yang, J.: Promp- tkd:Unsupervisedpromptdistillationforvision-languagemodels.In:CVPR (2024) 2, 4 18 J. Liu et al
2024
-
[31]
In: Proceedings of the AAAI Conference on Artificial Intelligence
Liu, J., Hu, T., Du, J., Zhang, R., Zhou, J.T., Liu, Z.: Kpl: Training-free medical knowledge mining of vision-language models. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 18852–18860 (2025) 4
2025
-
[32]
In: Findings of the Association for Computational Linguistics: EMNLP 2024
Liu, J., Hu, T., Xiong, H., Du, J., Feng, Y., Wu, J., Zhou, J.T., Liu, Z.: Vpl: Visual proxy learning framework for zero-shot medical image diagnosis. In: Findings of the Association for Computational Linguistics: EMNLP 2024. pp. 9978–9992 (2024) 4
2024
-
[33]
IEEE Transactions on Emerging Topics in Computational Intelligence8(4), 2816– 2826 (2023) 4
Liu, J., Hu, T., Zhang, Y., Feng, Y., Hao, J., Lv, J., Liu, Z.: Parameter- efficient transfer learning for medical visual question answering. IEEE Transactions on Emerging Topics in Computational Intelligence8(4), 2816– 2826 (2023) 4
2023
-
[34]
34740/KAGGLE/DSV/2645886,https://www.kaggle.com/dsv/26458869
Nickparvar, M.: Brain tumor mri dataset (2021).https://doi.org/10. 34740/KAGGLE/DSV/2645886,https://www.kaggle.com/dsv/26458869
-
[35]
In: Indian conference on computer vision, graphics & image processing (2008) 9
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Indian conference on computer vision, graphics & image processing (2008) 9
2008
-
[36]
In: CVPR (2023) 2
Oh, C., Hwang, H., Lee, H.y., Lim, Y., Jung, G., Jung, J., Choi, H., Song, K.: Blackvip: Black-box visual prompting for robust transfer learning. In: CVPR (2023) 2
2023
-
[37]
In: CVPR (2012) 9
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.: Cats and dogs. In: CVPR (2012) 9
2012
-
[38]
Proceedings of the 8th ACM on Multimedia Systems Conference , pages =
Pogorelov, K., Randel, K.R., Griwodz, C., Eskeland, S.L., de Lange, T., Johansen, D., Spampinato, C., Dang-Nguyen, D.T., Lux, M., Schmidt, P.T., Riegler, M., Halvorsen, P.: Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM on Multimedia Systems Conference. pp. 164–169. MMSys’17, ACM, ...
-
[39]
Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sa- hasrabuddhe,V.,Meriaudeau,F.:Indiandiabeticretinopathyimagedataset (idrid) (2018).https://doi.org/10.21227/H25W98,https://dx.doi. org/10.21227/H25W989
-
[40]
In: ICML (2021) 2, 4, 5, 10
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021) 2, 4, 5, 10
2021
-
[41]
Center for Research in Computer Vision (2012) 9
Soomro, K., Zamir, A.R., Shah, M.: A dataset of 101 human action classes from videos in the wild. Center for Research in Computer Vision (2012) 9
2012
-
[42]
Tahir, A.M., Chowdhury, M.E., Khandakar, A., Rahman, T., Qiblawey, Y., Khurshid, U., Kiranyaz, S., Ibtehaz, N., Rahman, M.S., Al-Maadeed, S., Mahmud, S., Ezeddin, M., Hameed, K., Hamid, T.: Covid-19 infection local- ization and severity grading from chest x-ray images. Computers in Biology and Medicine139, 105002 (2021).https://doi.org/https://doi.org/ 10...
-
[43]
In: ICML (2020) 2, 4, 11
Tsai,Y.Y.,Chen,P.Y.,Ho,T.Y.:Transferlearningwithoutknowing:Repro- gramming black-box machine learning models with scarce data and limited resources. In: ICML (2020) 2, 4, 11
2020
-
[44]
In: ICLR (2024) 2
Tsao, H.A., Hsiung, L., Chen, P.Y., Liu, S., Ho, T.Y.: Autovp: An auto- mated visual prompting framework and benchmark. In: ICLR (2024) 2
2024
-
[45]
Scientific data p
Tschandl, P., Rosendahl, C., Kittler, H.: The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data p. 180161 (2018) 9
2018
-
[46]
In: NeurIPS (2020) 2
Vinod, R., Chen, P.Y., Das, P.: Reprogramming language models for molec- ular representation learning. In: NeurIPS (2020) 2
2020
-
[47]
In: International Conference on Medical Image Computing and Computer-Assisted Interven- tion
Wang, P., Tong, L., Wu, J., Liu, J., Liu, Z.: Fair-moe: Medical fairness- oriented mixture of experts in vision-language models. In: International Conference on Medical Image Computing and Computer-Assisted Interven- tion. pp. 186–196. Springer (2025) 2
2025
-
[48]
In: ICML (2024) 2
Wang, Z., Liang, J., He, R., Wang, Z., Tan, T.: Connecting the dots: Collab- orative fine-tuning for black-box vision-language models. In: ICML (2024) 2
2024
-
[49]
In: ICLR (2024) 2
Xu, Z., Shi, Z., Wei, J., Mu, F., Li, Y., Liang, Y.: Towards few-shot adap- tation of foundation models via multitask finetuning. In: ICLR (2024) 2
2024
-
[50]
In: ICASSP (2023) 4
Yang, C.H.H., Li, B., Zhang, Y., Chen, N., Prabhavalkar, R., Sainath, T.N., Strohman, T.: From english to more languages: Parameter-efficient model reprogramming for cross-lingual speech recognition. In: ICASSP (2023) 4
2023
-
[51]
In: ICML (2021) 2, 4
Yang, C.H.H., Tsai, Y.Y., Chen, P.Y.: Voice2series: Reprogramming acous- tic models for time series classification. In: ICML (2021) 2, 4
2021
-
[52]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Zanella, M., Ben Ayed, I.: Low-rank few-shot adaptation of vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1593–1603 (2024) 4
2024
-
[53]
NEJM AI2(1), AIoa2400640 (2025) 10, 11
Zhang, S., Xu, Y., Usuyama, N., Xu, H., Bagga, J., Tinn, R., Preston, S., Rao, R., Wei, M., Valluri, N., et al.: A multimodal biomedical founda- tion model trained from fifteen million image–text pairs. NEJM AI2(1), AIoa2400640 (2025) 10, 11
2025
-
[54]
In: CVPR (2022) 2, 4, 10, 11
Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Conditional prompt learning for vision-language models. In: CVPR (2022) 2, 4, 10, 11
2022
-
[55]
IJCV (2022) 2, 4, 10, 11
Zhou, K., Yang, J., Loy, C.C., Liu, Z.: Learning to prompt for vision- language models. IJCV (2022) 2, 4, 10, 11
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.