pith. machine review for the scientific record.

arxiv: 2605.08174 · v1 · submitted 2026-05-05 · 💻 cs.LG · cs.AI · cs.CV

Recognition: unknown

CERSA: Cumulative Energy-Retaining Subspace Adaptation for Memory-Efficient Fine-Tuning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:07 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords parameter-efficient fine-tuning · low-rank adaptation · singular value decomposition · memory-efficient fine-tuning · subspace adaptation · large models · weight updates · fine-tuning
0 comments

The pith

CERSA identifies the main energy directions in full fine-tuning weight changes via SVD and adapts models only inside that reduced subspace.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that low-rank updates in methods like LoRA miss key directions present in full-parameter fine-tuning, and that storing all frozen weights wastes memory. It claims that singular value decomposition can isolate the principal components holding 90 to 95 percent of the spectral energy in those weight modifications, creating a smaller subspace where low-rank fine-tuning can then occur. This yields lower memory use than prior parameter-efficient approaches while matching or exceeding their accuracy on image recognition, text-to-image, and language tasks. A reader would care because large models become adaptable on hardware with tight memory limits without the usual performance trade-off. The approach treats the energy-retaining subspace as the essential carrier of adaptation information.

Core claim

CERSA applies singular value decomposition to the weight modifications observed in full fine-tuning, retains only the principal components that account for 90 to 95 percent of the spectral energy, derives low-rank representations from this subspace, and performs fine-tuning inside it, thereby reducing memory consumption while outperforming existing PEFT methods across models of varying scales and domains.
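
To make the mechanics concrete, here is a minimal sketch of that pipeline in PyTorch, under two assumptions the page does not guarantee: that the weight update from a reference full fine-tuning run is available up front, and that adaptation then happens through a small trainable core between fixed subspace bases. The names (`build_subspace`, `SubspaceAdapter`) are illustrative, not the authors' code, and the exact parameterization inside the subspace may differ from the paper's.

```python
import torch

def build_subspace(delta_w: torch.Tensor, threshold: float = 0.95):
    """SVD of a full fine-tuning update; keep the smallest set of leading
    singular directions whose squared singular values reach `threshold`
    of the total spectral energy."""
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    energy = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    r = int((energy < threshold).sum().item()) + 1
    return u[:, :r], s[:r], vh[:r, :]          # (m, r), (r,), (r, n)

class SubspaceAdapter(torch.nn.Module):
    """Frozen base weight plus a trainable update confined to the retained subspace."""
    def __init__(self, w0: torch.Tensor, u_r: torch.Tensor, vh_r: torch.Tensor):
        super().__init__()
        r = u_r.shape[1]
        # Kept in full here for clarity; the paper's memory claim implies a
        # compressed storage scheme for the frozen weight that this page does not spell out.
        self.register_buffer("w0", w0)
        self.register_buffer("u_r", u_r)        # fixed output directions
        self.register_buffer("vh_r", vh_r)      # fixed input directions
        self.core = torch.nn.Parameter(torch.zeros(r, r))  # only r*r trainable values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.u_r @ self.core @ self.vh_r  # update never leaves the subspace
        return x @ (self.w0 + delta).T

# Usage sketch: derive the subspace once from a reference update, then train only `core`.
# delta_w_ref = w_fully_finetuned - w_pretrained   # requires a full fine-tuning pass (see referee point 1)
# u_r, s_r, vh_r = build_subspace(delta_w_ref, threshold=0.95)
# layer = SubspaceAdapter(w_pretrained, u_r, vh_r)
```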

What carries the argument

The cumulative energy-retaining subspace from SVD of full-parameter weight update matrices, which supplies the low-rank directions used for adaptation.

If this is right

  • Memory footprint drops because only the reduced subspace and its low-rank factors need to be stored and updated instead of the full frozen weights (a rough accounting sketch follows this list).
  • Performance gap to full fine-tuning narrows because the retained subspace captures the dominant rank characteristics that standard low-rank methods overlook.
  • The same SVD-plus-low-rank procedure applies uniformly to vision, generation, and language models without task-specific redesign.
  • Empirical tests confirm outperformance over state-of-the-art PEFT baselines at multiple model scales.
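
As a back-of-the-envelope for the first bullet, the sketch below counts stored scalars per layer under three schemes. The dimensions and ranks are illustrative assumptions, and the "subspace" row assumes the full frozen weight really can be dropped, which is exactly the dependency the referee questions further down.

```python
def stored_values(d_out: int, d_in: int, r_lora: int, r_cersa: int) -> dict:
    """Rough per-layer storage counts (number of scalars), ignoring optimizer state and activations."""
    return {
        # LoRA-style: full frozen weight plus two rank-r factors
        "lora": d_out * d_in + r_lora * (d_out + d_in),
        # Subspace scheme as described in the pith: fixed bases plus a small trainable core,
        # assuming the full frozen weight no longer needs to be kept
        "subspace": r_cersa * (d_out + d_in) + r_cersa * r_cersa,
        # Full fine-tuning: every weight is trainable
        "full": d_out * d_in,
    }

# Example: a 1024x1024 projection with LoRA rank 8 and a retained subspace of rank 128
print(stored_values(1024, 1024, r_lora=8, r_cersa=128))
# -> {'lora': 1064960, 'subspace': 278528, 'full': 1048576}
```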

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the energy threshold varies by layer or task, further memory savings or accuracy gains may appear without changing the core method.
  • The same energy-retaining idea could extend to other matrix-compression steps such as pruning or quantization of the update matrix.
  • Success on this subspace implies that fine-tuning updates are often concentrated in a small number of dominant singular directions, which may inform initialization strategies for new adapters (see the sketch after this list).
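
On the last point, one hypothetical use, not something the paper proposes, would be to seed new low-rank adapter factors from the dominant singular directions of a previously observed update rather than from random noise. A short illustrative sketch:

```python
import torch

def spectral_adapter_init(delta_w_ref: torch.Tensor, rank: int):
    """Seed LoRA-style factors B (d_out x r) and A (r x d_in) from the top singular
    directions of a reference update, so a new adapter starts inside the dominant subspace.
    Purely illustrative; in practice one might rescale or zero one factor so the
    adapter does not immediately reapply the old update."""
    u, s, vh = torch.linalg.svd(delta_w_ref, full_matrices=False)
    scale = torch.sqrt(s[:rank])
    b = u[:, :rank] * scale              # columns scaled by sqrt of singular values
    a = scale[:, None] * vh[:rank, :]    # rows scaled by sqrt of singular values
    return b, a                          # b @ a is the best rank-`rank` approximation of delta_w_ref
```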

Load-bearing premise

The principal components that retain 90 to 95 percent of the spectral energy in the full fine-tuning weight changes contain enough rank information that low-rank adaptation on this subspace recovers any lost performance.
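
A standard linear-algebra reading of this premise, supplied here for context rather than taken from the paper: by the Eckart-Young theorem the rank-r truncation is optimal in Frobenius norm, and the retained energy fraction fixes the relative reconstruction error of the update. The premise above is strictly stronger, since a small reconstruction error need not translate into recovered task performance.

```latex
\[
\Delta W = \sum_{i} \sigma_i\, u_i v_i^{\top}, \qquad
\Delta W_r = \sum_{i \le r} \sigma_i\, u_i v_i^{\top},
\]
\[
\frac{\lVert \Delta W - \Delta W_r \rVert_F^2}{\lVert \Delta W \rVert_F^2}
= 1 - \frac{\sum_{i \le r} \sigma_i^2}{\sum_{i} \sigma_i^2}
\;\le\; 0.05\text{--}0.10
\quad \text{when } r \text{ is chosen at the 90--95\% energy threshold.}
\]
```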

What would settle it

Direct head-to-head runs on the same models and benchmarks, with the energy threshold fixed at 90-95 percent, in which CERSA either uses more memory than LoRA or achieves lower task accuracy.

Figures

Figures reproduced from arXiv: 2605.08174 by Bharadwaj Veeravalli, Jingze Ge, Min Wu, Ngai-Man Cheung, Wang Zhe Mark, Wanqi Dong, Xue Geng, Xulei Yang, Yun Liu.

Figure 1
Figure 1. Figure 1: Memory footprint comparison for fine-tuning ViT-Large (Dosovitskiy, 2021). Axes: Total Memory (MB) vs. Average Accuracy (%); legend: LoRA, PiSSA, SVFit, SVFT, and CERSA at several energy-retention settings between 0.80 and 0.95. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Preserved singular value indices in ViT-Large (Dosovitskiy, 2021) (pre-trained on ImageNet [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison among LoRA (Hu et al., 2022), SVFit (Sun et al., 2024), SVFT (Lingam et al., [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Training process of CERSA. The trainable [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison on ViT compression rates across various cumulative energy retention rates.
  Method                      | CIFAR-100 | RESISC45 | DTD  | Average | Total Memory
  CERSA(Q, V)                 | 94.0      | 95.8     | 82.1 | 90.6    | 1194.5 MB
  CERSA(Q, K, V)              | 94.4      | 96.1     | 82.5 | 91.0    | 1232.9 MB
  CERSA(Q, K, V, P)           | 94.5      | 96.0     | 82.6 | 91.0    | 1279.5 MB
  CERSA(Q, K, V, P, UP, DN)   | 93.8      | 94.9     | 81.6 | 90.1    | 1433.1 MB
  [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Results of visual comparison generated by the subject-driven fine-tuned diffusion model [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Training throughput and training time of fine-tuning ViT-Large (Dosovitskiy, 2021) on the [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Out-of-distribution evaluation on various tasks. [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: The similarity between the principal output subspace [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: The similarity between the principal input subspace [PITH_FULL_IMAGE:figures/full_fig_p021_12.png] view at source ↗
read the original abstract

To mitigate the memory constraints associated with fine-tuning large pre-trained models, existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, rely on low-rank updates. However, such updates fail to fully capture the rank characteristics of the weight modifications observed in full-parameter fine-tuning, resulting in a performance gap. Furthermore, LoRA and other existing PEFT methods still require substantial memory to store the full set of frozen weights, limiting their efficiency in resource-constrained settings. To addres these limitations, we introduce Cumulative Energy-Retaining Subspace Adaptation (CERSA), a novel fine-tuning paradigm that leverages singular value decomposition (SVD) to retain only the principal components responsible for 90% to 95% of the spectral energy. By fine-tuning low-rank representations derived from this principal subspace, CERSA significantly reduces memory consumption. We conduct extensive evaluations of CERSA across models of varying scales and domains, including image recognition, text-to-image generation, and natural language understanding. Empirical results demonstrate that CERSA consistently outperforms state-of-the-art PEFT methods while achieving substantially lower memory requirements. The code will be publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Cumulative Energy-Retaining Subspace Adaptation (CERSA), a PEFT approach that first obtains weight modifications via full-parameter fine-tuning, applies SVD to retain the principal subspace capturing 90-95% of spectral energy, and then performs low-rank adaptation within that subspace. The central claims are that this closes the performance gap with full fine-tuning better than LoRA-style methods while using substantially less memory, with supporting evaluations across image recognition, text-to-image generation, and NLU tasks on models of varying scales.

Significance. If the SVD-derived subspace can be obtained without a full fine-tuning pass (or via a low-cost proxy whose fidelity is demonstrated), the method would offer a principled way to identify dominant adaptation directions and could narrow the gap between PEFT and full fine-tuning. The energy-thresholding idea is a clear, falsifiable design choice that distinguishes it from purely heuristic low-rank updates. However, the memory-efficiency claim as currently framed appears to rest on an unaddressed dependency that limits immediate practical significance.

major comments (3)
  1. [Abstract] Abstract: The memory-efficiency claim ('substantially lower memory requirements') is load-bearing yet directly contradicted by the stated procedure. The principal subspace is derived from SVD of 'full fine-tuning weight modifications'; executing full fine-tuning to produce those modifications requires storing activations and gradients for all parameters, incurring the peak memory cost that CERSA is advertised to avoid. No proxy computation, pre-training on a related task, or one-shot approximation is described that would allow the subspace to be obtained without this cost.
  2. [Abstract] Abstract and method description: The paper asserts that low-rank fine-tuning in the retained subspace recovers performance without loss, but provides no derivation or bound showing that the 90-95% energy threshold preserves the necessary rank characteristics of the full update. The reader's weakest assumption (that the truncated subspace is sufficient) therefore remains untested in the provided text; an ablation varying the threshold and reporting the resulting performance-memory trade-off is required to support the central claim.
  3. [Abstract] Abstract: No quantitative tables, specific metrics (e.g., accuracy deltas, memory in GB, parameter counts), error bars, or baseline comparisons appear in the abstract despite the strong empirical claims ('consistently outperforms state-of-the-art PEFT methods'). The full manuscript must include these to allow verification of the outperformance and memory results.
minor comments (2)
  1. [Abstract] Abstract: Typo in 'To addres these limitations' (should be 'address').
  2. [Abstract] Abstract: The phrase 'low-rank representations derived from this principal subspace' is ambiguous; clarify whether the low-rank factors are initialized from the SVD singular vectors or learned from scratch within the projected subspace.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications where needed and committing to revisions that strengthen the presentation of our method and results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The memory-efficiency claim ('substantially lower memory requirements') is load-bearing yet directly contradicted by the stated procedure. The principal subspace is derived from SVD of 'full fine-tuning weight modifications'; executing full fine-tuning to produce those modifications requires storing activations and gradients for all parameters, incurring the peak memory cost that CERSA is advertised to avoid. No proxy computation, pre-training on a related task, or one-shot approximation is described that would allow the subspace to be obtained without this cost.

    Authors: We acknowledge the validity of this observation. The current abstract and method description do not sufficiently distinguish the one-time cost of the initial full fine-tuning pass (used solely to derive the SVD-based subspace) from the memory-efficient low-rank adaptation performed thereafter within the retained subspace. This initial pass is intended as a preprocessing step to identify dominant adaptation directions, after which CERSA operates with reduced memory. However, without an explicit low-cost proxy or reuse mechanism described, the practical memory savings are limited to the adaptation phase. In the revised manuscript, we will clarify this distinction in both the abstract and method sections, discuss potential reuse of the subspace across related tasks, and note the dependency as a current limitation while outlining directions for low-cost approximations. revision: yes

  2. Referee: [Abstract] Abstract and method description: The paper asserts that low-rank fine-tuning in the retained subspace recovers performance without loss, but provides no derivation or bound showing that the 90-95% energy threshold preserves the necessary rank characteristics of the full update. The reader's weakest assumption (that the truncated subspace is sufficient) therefore remains untested in the provided text; an ablation varying the threshold and reporting the resulting performance-memory trade-off is required to support the central claim.

    Authors: The 90-95% cumulative energy threshold is selected based on the rapid decay of singular values in the weight update matrices, which empirically concentrates the essential adaptation information in the leading components. While the manuscript does not include a formal theoretical derivation or bound, the multi-task empirical evaluations support that this range recovers performance close to full fine-tuning. To directly address the request for testing, we will add a dedicated ablation study in the revised version. This will vary the energy retention threshold across a range (e.g., 80%, 85%, 90%, 95%, 99%) and report the resulting performance metrics alongside memory usage to demonstrate the trade-off and confirm the sufficiency of the chosen thresholds. revision: yes

  3. Referee: [Abstract] Abstract: No quantitative tables, specific metrics (e.g., accuracy deltas, memory in GB, parameter counts), error bars, or baseline comparisons appear in the abstract despite the strong empirical claims ('consistently outperforms state-of-the-art PEFT methods'). The full manuscript must include these to allow verification of the outperformance and memory results.

    Authors: We agree that the abstract would be strengthened by including concrete quantitative highlights to support the empirical claims. In the revised manuscript, we will update the abstract to incorporate key results, such as specific performance deltas over baselines like LoRA, memory consumption figures in GB, parameter efficiency metrics, and brief baseline comparisons, while preserving the abstract's length and readability. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

full rationale

The paper introduces CERSA as an algorithmic procedure that applies SVD to retain principal components capturing 90-95% spectral energy from weight modifications, followed by low-rank adaptation within the resulting subspace. No equations or steps reduce the claimed performance gains or memory savings to a fitted quantity by construction, nor does any central premise collapse into a self-citation, self-definition, or renamed empirical pattern. Evaluations consist of independent empirical comparisons on image, text-to-image, and NLU tasks rather than tautological derivations. The method remains self-contained against external benchmarks with no load-bearing self-referential elements.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard linear-algebra facts about SVD and the empirical claim that a fixed energy threshold suffices; the 90-95% cutoff functions as a tunable hyperparameter whose justification is not derived from first principles.

free parameters (1)
  • energy retention threshold = 90% to 95%
    Percentage of spectral energy (90-95%) used to select principal components; it directly controls the subspace dimensionality and is presented as a design choice rather than derived (see the sketch below).
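
To illustrate how this single free parameter drives subspace size, here is a small sweep over retention thresholds on a synthetic, rapidly decaying spectrum; the spectrum and the threshold grid are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Synthetic singular value spectrum with fast decay (illustrative assumption only).
rng = np.random.default_rng(0)
sigma = np.sort(np.exp(-0.05 * np.arange(1024)) * rng.uniform(0.5, 1.5, 1024))[::-1]

energy = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)
for threshold in (0.80, 0.85, 0.90, 0.95, 0.99):
    rank = int(np.searchsorted(energy, threshold)) + 1
    print(f"threshold={threshold:.2f} -> retained rank {rank} of {sigma.size}")
```
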
axioms (1)
  • standard math: Singular value decomposition decomposes any matrix into orthogonal principal components ordered by energy contribution
    Invoked to identify the subspace that retains most of the weight-update information.

pith-pipeline@v0.9.0 · 5541 in / 1224 out tokens · 63676 ms · 2026-05-12T01:07:50.202310+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 1 internal anchor

  1. [1] Zhuang, Zhan and Zhang, Yulong and Wang, Xuehao and Lu, Jiangang and Wei, Ying and Zhang, Yu. Time-Varying… NIPS.
  2. [2] The approximation of one matrix by another of lower rank. Psychometrika, 1936.
  3. [3] Tian, Chunlin and Shi, Zhan and Guo, Zhijiang and Li, Li and Xu, Chengzhong. Hydra… NIPS.
  4. [4] Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning.
  5. [5] Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung. ICML.
  6. [6] Wang, Runqian and Ghosh, Soumya and Cox, David and Antognini, Diego and Oliva, Aude and Feris, Rogerio and Karlinsky, Leonid. Trans-… NIPS.
  7. [7] Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models. ICLR, 2022.
  8. [8] Sun, Chengwei and Wei, Jiwei and Wu, Yujia and Shi, Yiming and He, Shiyuan and Ma, Zeyu and Xie, Ning and Yang, Yang.
  9. [9] Meng, Fanxu and Wang, Zhaohui and Zhang, Muhan. NIPS.
  10. [10] Lingam, Vijay Chandra and Neerkaje, Atula and Vavre, Aditya and Shetty, Aneesh and Gudur, Gautham Krishna and Ghosh, Joydeep and Choi, Eunsol and Dimakis, Alex and Bojchevski, Aleksandar and Sanghavi, Sujay. NIPS.
  11. [11] Zi, Bojia and Qi, Xianbiao and Wang, Lingzhi and Wang, Jianan and Wong, Kam-Fai and Zhang, Lei. Delta-…
  12. [12] Kopiczko, Dawid J and Blankevoort, Tijmen and Asano, Yuki M. ICLR.
  13. [13] Gu, Yuxian and Han, Xu and Liu, Zhiyuan and Huang, Minlie. ACL.
  14. [14] Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning. arXiv:2402.17263.
  15. [15] Ruiz, Nataniel and Li, Yuanzhen and Jampani, Varun and Pritch, Yael and Rubinstein, Michael and Aberman, Kfir. CVPR.
  16. [16] Multi-Concept Customization of Text-to-Image Diffusion.
  17. [17] Valipour, Mojtaba and Rezagholizadeh, Mehdi and Kobyzev, Ivan and Ghodsi, Ali.
  18. [18] Zhang, Longteng and Zhang, Lin and Shi, Shaohuai and Chu, Xiaowen and Li, Bo.
  19. [19] Learning multiple visual domains with residual adapters.
  20. [20] Li, Xiang Lisa and Liang, Percy. Prefix-…
  21. [21] The Power of Scale for Parameter-Efficient Prompt Tuning.
  22. [22] Zhao, Jiawei and Zhang, Zhenyu and Chen, Beidi and Wang, Zhangyang and Anandkumar, Anima and Tian, Yuandong. ICML.
  23. [23] Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models.
  24. [24] Hameed, Marawan Gamal Abdel and Milios, Aristides and Reddy, Siva and Rabusseau, Guillaume.
  25. [25] Principal component analysis for special types of data. 2002.
  26. [26] Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li. CVPR.
  27. [27] Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll… Microsoft…
  28. [28] Learning multiple layers of features from tiny images. Master's thesis, University of Toronto.
  29. [29] Cats and dogs.
  30. [30] Describing textures in the wild.
  31. [31] Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian. 2019.
  32. [32] Krause, Jonathan and Stark, Michael and Deng, Jia and Fei-Fei, Li.
  33. [33] Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 2017.
  34. [34] Fine-grained visual classification of aircraft. arXiv:1306.5151.
  35. [35] An image is worth 16x16 words: Transformers for image recognition at scale.
  36. [36] He, Pengcheng and Gao, Jianfeng and Chen, Weizhu. ICLR.
  37. [37] Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others. Py… NIPS.
  38. [38] Transformers: State-of-the-Art Natural Language Processing.
  39. [39] Decoupled Weight Decay Regularization.
  40. [40] High-resolution image synthesis with latent diffusion models.
  41. [41] Shuttleworth, Reece and Andreas, Jacob and Torralba, Antonio and Sharma, Pratyusha.
  42. [42] Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R. ICLR.
  43. [43] A broad-coverage challenge corpus for sentence understanding through inference. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  44. [44] Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy. EMNLP.
  45. [45] Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 2004.
  46. [46] A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks.
  47. [47] Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution.
  48. [48] Ba… arXiv:2405.17604.
  49. [49] Orthogonal Subspace Learning for Language Model Continual Learning.
  50. [50] Yang, Yibo and Li, Xiaojie and Zhou, Zhongzhu and Song, Shuaiwen Leon and Wu, Jianlong and Nie, Liqiang and Ghanem, Bernard. NIPS.
  51. [51] Wang, Hanqing and Li, Yixia and Wang, Shuo and Chen, Guanhua and Chen, Yun.
  52. [52] Azizi, Seyedarmin and Kundu, Souvik and Pedram, Massoud. EMNLP.
  53. [53] Wang, Shaowen and Yu, Linxi and Li, Jian. NIPS.
  54. [54] Wang, Zhengbo and Liang, Jian and He, Ran and Wang, Zilei and Tan, Tieniu.
  55. [55] One initialization to rule them all: Fine-tuning via explained variance adaptation. arXiv:2410.07170.
  56. [56] Han, Ligong and Li, Yinxiao and Zhang, Han and Milanfar, Peyman and Metaxas, Dimitris and Yang, Feng. ICCV.
  57. [57] Jaiswal, Ajay and Yin, Lu and Zhang, Zhenyu and Liu, Shiwei and Zhao, Jiawei and Tian, Yuandong and Wang, Zhangyang. From…
  58. [58] U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015): 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III. 2015.
  59. [59] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.
  60. [60] CLIPScore: A Reference-free Evaluation Metric for Image Captioning. 2022.