DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing
Pith reviewed 2026-05-10 17:35 UTC · model grok-4.3
The pith
Decomposing VLM representation spaces into orthogonal subspaces enables precise lifelong concept editing without interference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DSCA decomposes the joint vision-language representation space into a set of orthogonal semantic subspaces obtained through incremental clustering and PCA. Surgical edits are performed only in these transformed spaces, which structurally isolates concepts and prevents cross-interference. A multi-term loss maintains task fidelity, edit locality, and cross-modal alignment. With the base model frozen, this yields 98 percent single-edit success, keeps success above 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and produces the best backward-transfer scores on continual instruction-tuning benchmarks.
What carries the argument
Dynamic Subspace Concept Alignment (DSCA), which decomposes representations into orthogonal subspaces via incremental clustering and PCA so that edits target isolated concept regions without affecting others.
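A minimal sketch of what this pipeline could look like, assuming scikit-learn-style clustering and PCA; the function names, cluster count, and rank below are illustrative and not taken from the paper.

```python
# Hedged sketch: per-concept subspaces via incremental clustering + PCA.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import PCA

def build_concept_subspaces(features: np.ndarray, n_concepts: int = 16,
                            rank: int = 32):
    """Cluster joint vision-language features, then fit one PCA basis per
    cluster. Returns the clusterer and a dict: concept id -> (r x d) basis."""
    km = MiniBatchKMeans(n_clusters=n_concepts, random_state=0).fit(features)
    bases = {}
    for k in range(n_concepts):
        members = features[km.labels_ == k]
        r = max(1, min(rank, len(members) - 1))
        # PCA rows are orthonormal *within* a cluster; orthogonality across
        # clusters is not guaranteed and must be checked separately.
        bases[k] = PCA(n_components=r).fit(members).components_
    return km, bases

def edit_in_subspace(h: np.ndarray, basis: np.ndarray,
                     delta: np.ndarray) -> np.ndarray:
    """Apply an edit confined to one concept subspace: h' = h + R_k^T delta,
    so the component of h orthogonal to R_k is untouched."""
    return h + basis.T @ delta
```

The comment inside the loop marks the paper's load-bearing step: within-cluster orthonormality comes for free from PCA, but cross-cluster isolation is a property that has to be measured, not assumed.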
If this is right
- Edits stay localized and do not degrade performance on unrelated concepts or tasks.
- The frozen base model retains cross-modal alignment across many sequential updates.
- Hallucination rates fall by 3 to 5 percent relative to prior editing approaches.
- Best-in-class backward transfer scores indicate strong retention on continual instruction-tuning benchmarks.
Where Pith is reading between the lines
- The same structural separation might extend to editing other multimodal architectures if clustering remains stable on different feature types.
- Independent subspaces could in principle support simultaneous edits to multiple concepts without added interference.
- This approach might reduce reliance on heavy regularization terms in other lifelong-learning settings.
- One could verify subspace quality by measuring residual correlations between subspaces after large numbers of edits.
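The last suggestion is directly checkable. A minimal sketch, assuming each concept subspace is stored as a matrix with orthonormal rows: the singular values of R_i R_j^T are the cosines of the principal angles between two subspaces, so values near zero indicate genuine isolation.

```python
import numpy as np

def max_cross_subspace_cosine(R_i: np.ndarray, R_j: np.ndarray) -> float:
    """R_i (r_i x d) and R_j (r_j x d) have orthonormal rows. The singular
    values of R_i @ R_j.T are the cosines of the principal angles between
    the two subspaces: ~0 means near-orthogonal, ~1 means heavy leakage."""
    s = np.linalg.svd(R_i @ R_j.T, compute_uv=False)
    return float(s.max()) if s.size else 0.0
```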
Load-bearing premise
Incremental clustering and PCA on joint vision-language representations will create subspaces that isolate distinct concepts without meaningful information loss or leakage between subspaces.
What would settle it
A drop below 95 percent edit success or measurable interference with non-target concepts after 1000 sequential edits would show the subspaces fail to deliver the claimed isolation.
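A sketch of how such an audit could be run; apply_edit and evaluate are placeholder callables, not a real API.

```python
def sequential_edit_audit(model, edits, probes, apply_edit, evaluate,
                          success_floor=0.95, drift_tol=0.01):
    """Apply edits one at a time; flag any point where the running edit
    success drops below the floor or held-out (non-target) probe accuracy
    drifts from its pre-edit baseline."""
    baseline = evaluate(model, probes)
    hits = 0
    for t, edit in enumerate(edits, start=1):
        model = apply_edit(model, edit)
        hits += int(evaluate(model, [edit]) == 1.0)
        drift = baseline - evaluate(model, probes)
        if hits / t < success_floor or drift > drift_tol:
            print(f"edit {t}: success={hits / t:.3f}, probe drift={drift:.3f}")
    return hits / len(edits)
```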
Original abstract
Model editing aims to update knowledge to add new concepts and change relevant information without retraining. Lifelong editing is a challenging task, prone to disrupting previously learned concepts, especially for Vision-Language Models (VLMs), because sequential edits can lead to degraded reasoning and cross-modal misalignment. Existing VLM knowledge-editing methods based on gated adapters, activation edits, and parameter-merging techniques address the catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with other, non-relevant concepts. We hypothesize that this instability persists because current methods algorithmically control edits via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which by design mitigates this limitation by decomposing the representation space into a set of orthogonal semantic subspaces and proposing edits only in those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function for maintaining task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98 percent single-edit success, remains over 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and achieves the best backward-transfer (BWT) scores on continual instruction-tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge-retention capability in continual lifelong editing across various datasets and benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dynamic Subspace Concept Alignment (DSCA) for lifelong editing of Vision-Language Models. It decomposes the joint vision-language representation space into a set of orthogonal semantic subspaces obtained via incremental clustering and PCA, then performs surgical edits only within the relevant transformed subspaces while keeping the base model frozen. A multi-term loss maintains task fidelity, edit locality, and cross-modal alignment. The central claim is that this structural separation (rather than optimization-based control) prevents interference and catastrophic forgetting, yielding 98% single-edit success, >95% success after 1000 sequential edits, 3-5% lower hallucination, and the best backward transfer (BWT) scores on continual instruction tuning benchmarks.
Significance. If the claimed subspace isolation can be shown to hold with negligible cross-subspace leakage, DSCA would offer a promising architectural route to stable lifelong VLM editing that sidesteps the entanglement problems of shared representation spaces. The reported retention of performance over 1000 edits and superior BWT would constitute a notable empirical advance over existing gated-adapter, activation-edit, and parameter-merging baselines.
major comments (2)
- [Abstract] The assertion that incremental clustering plus PCA 'structurally isolates concepts' and converts isolation from a training objective into an architectural property is load-bearing for the non-interference claim, yet no quantitative verification (e.g., measured inter-subspace orthogonality, residual cross-subspace norms, or linear separability tests on held-out concepts) is referenced. PCA on per-cluster variance does not guarantee orthogonality across clusters or absence of leakage in the entangled VLM feature space, directly undermining the guarantee that edits remain confined even with the base model frozen.
- [Abstract] The multi-term loss is described only at a high level ('maintaining task fidelity, edit locality, and cross-modal alignment') with no explicit formulation, weighting schedule, or ablation of individual terms. Without these details it is impossible to assess whether the reported stability after 1000 edits is attributable to the subspace mechanism or to the loss design, and whether the orthogonality assumption was ever stress-tested. A generic sketch of such an objective follows this list.
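The shape of the second objection is easier to see with a concrete stand-in. The sketch below is a generic three-term objective of the kind the abstract gestures at; every term and weight here is a placeholder invented for illustration, not the paper's formulation (which, as noted, is not given).

```python
import torch.nn.functional as F

def multi_term_edit_loss(edit_logits, edit_targets,      # task fidelity
                         loc_repr, loc_repr_frozen,      # edit locality
                         img_emb, txt_emb,               # cross-modal alignment
                         lam_loc=1.0, lam_align=0.1):
    # Fidelity: the edited fact must be produced.
    l_fid = F.cross_entropy(edit_logits, edit_targets)
    # Locality: out-of-scope representations should match the frozen model.
    l_loc = F.mse_loss(loc_repr, loc_repr_frozen.detach())
    # Alignment: paired image/text embeddings should stay close.
    l_align = 1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1).mean()
    return l_fid + lam_loc * l_loc + lam_align * l_align
```

The referee's point is that without the real formulation and per-term ablations, one cannot tell whether the reported stability comes from the subspace mechanism or from tuning weights like lam_loc and lam_align.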
minor comments (1)
- [Abstract] The phrase 'lowers hallucination by 3 to 5 percent' should specify the exact metric (e.g., hallucination rate on which benchmark) and the baseline against which the reduction is measured.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of how our claims are presented in the abstract. We have revised the manuscript to strengthen the substantiation of the subspace isolation mechanism and to provide clearer details on the loss function.
Point-by-point responses
- Referee: [Abstract] The assertion that incremental clustering plus PCA 'structurally isolates concepts' and converts isolation from a training objective into an architectural property is load-bearing for the non-interference claim, yet no quantitative verification (e.g., measured inter-subspace orthogonality, residual cross-subspace norms, or linear separability tests on held-out concepts) is referenced. PCA on per-cluster variance does not guarantee orthogonality across clusters or absence of leakage in the entangled VLM feature space, directly undermining the guarantee that edits remain confined even with the base model frozen.
Authors: We agree that the abstract would benefit from explicit quantitative verification to support the structural isolation claim. While the incremental clustering combined with per-cluster PCA is designed to produce separated subspaces (with intra-cluster orthogonality guaranteed by the PCA step itself), we acknowledge that cross-cluster leakage metrics were not quantified there. In the revised manuscript we will add these measurements—inter-subspace orthogonality, residual cross-subspace norms, and linear separability on held-out concepts—and reference them in the abstract to directly address the concern about potential leakage in the original VLM space. revision: yes
- Referee: [Abstract] The multi-term loss is described only at a high level ('maintaining task fidelity, edit locality, and cross-modal alignment') with no explicit formulation, weighting schedule, or ablation of individual terms. Without these details it is impossible to assess whether the reported stability after 1000 edits is attributable to the subspace mechanism or to the loss design, and whether the orthogonality assumption was ever stress-tested.
Authors: The referee correctly observes that the abstract summarizes the loss at a high level. The full manuscript contains the explicit multi-term loss formulation, weighting schedule, and corresponding ablations. To improve accessibility, we have expanded the abstract to briefly describe the loss terms and now include a direct reference to the detailed formulation and ablation studies in the main text. This revision clarifies the respective roles of the subspace architecture and the loss in achieving the reported stability. revision: yes
Circularity Check
No circularity: architectural method with empirical validation
full rationale
The paper defines DSCA via incremental clustering and PCA on joint representations to create orthogonal subspaces, then measures performance empirically (98% single-edit success, >95% after 1000 edits, improved BWT). No equations, derivations, or claims reduce by construction to fitted parameters or prior self-citations; isolation is presented as a design choice whose benefits are tested externally rather than assumed tautologically. The argument chain is validated against external benchmarks rather than against its own constructions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: incremental clustering and PCA on joint vision-language representations produce subspaces that structurally isolate semantic concepts without significant cross-concept interference.
invented entities (1)
- Dynamic Subspace Concept Alignment (DSCA) mechanism (no independent evidence)
Reference graph
Works this paper leans on
- [1] Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, et al. PaliGemma: A versatile 3B VLM for transfer. arXiv, 2024.
- [2] Cheng Chen, Junchen Zhu, Xu Luo, Heng Tao Shen, Jingkuan Song, and Lianli Gao. CoIN: A benchmark of continual instruction tuning for multimodal large language models. arXiv:2403.08350, 2024.
- [4]
- [5] Qizhou Chen, Taolin Zhang, Chengyu Wang, Xiaofeng He, Dakan Wang, and Tingting Liu. Attribution analysis meets model editing: Advancing knowledge correction in vision language models with VisEdit. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
- [6] Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, and Ningyu Zhang. Can we edit multimodal large language models? arXiv:2310.08475, 2023.
- [7] Emanuele Frascaroli, Aniello Panariello, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, and Simone Calderara. CGIL: CLIP with generative latent replay, a strong baseline for incremental learning. In Proceedings of the British Machine Vision Conference (BMVC), 2024.
- [8] Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, et al. MME: A comprehensive evaluation benchmark for multimodal large language models. arXiv:2306.13394, 2023.
- [9] Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, and Weiran Huang. Enhanced continual learning of vision-language models with model fusion. Workshop paper at SCOPE, ICLR 2025.
- [10] Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [11] Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. 2021.
- [12] Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. VLKEB: A large vision-language model knowledge editing benchmark. 2024.
- [13] Saurav Jha, Dong Gong, and Lina Yao. CLAP4CLIP: Continual learning with probabilistic finetuning for vision-language models. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [14] Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, and Wei Wang. Learning to edit: Aligning LLMs with knowledge editing. 2024.
- [15] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision, ECCV 2014, pages 740–755. Springer International Publishing, Cham, 2014.
- [17] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. 2023.
- [18] Tianci Liu, Ruirui Li, Yunzhe Qi, Hui Liu, Xianfeng Tang, Tianqi Zheng, Qingyu Yin, Monica Cheng, Jun Huan, Haoyu Wang, and Jing Gao. Unlocking efficient, scalable, and continual knowledge editing with basis-level representation fine-tuning. In International Conference on Learning Representations (ICLR), 2025.
- [19] Yuyang Liu, Qiuhe Hong, Linlan Huang, Alexandra Gomez-Villa, Dipam Goswami, Xialei Liu, Joost van de Weijer, and Yonghong Tian. Continual learning for VLMs: A survey and taxonomy beyond forgetting. arXiv:2508.04227, 2025.
- [20] Yiyang Liu, James Liang, Ruixiang Tang, Yugyung Lee, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Lifu Huang, Dongfang Liu, Qifan Wang, and Cheng Han. Re-imagining multimodal instruction tuning: A representation view. In International Conference on Learning Representations (ICLR), 2025.
- [21] Ziyang Liu, Yichen Wu, Zhiyi Shi, Binjie Wang, Junsik Kim, and Hanspeter Pfister. C-CLIP: Contrastive learning improves knowledge editing in large vision-language models. In International Conference on Learning Representations (ICLR).
- [22] David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6470–6479. Curran Associates Inc., Red Hook, NY, USA, 2017.
- [24] Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, and Dong Gong. Adaptive rank, reduced forgetting: Knowledge retention in continual learning vision-language models with dynamic rank-selective LoRA. 2024.
- [25] Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzciński, and Sebastian Cygert. MagMax: Leveraging model merging for seamless continual learning. 2024.
- [26] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 2022. arXiv:2202.05262.
- [27] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
- [28] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. Fast model editing at scale. In International Conference on Learning Representations (ICLR), 2022.
- [29] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. Memory-based model editing at scale. In International Conference on Machine Learning (ICML), 2022.
- [30] Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, and Qi Tian. Continual vision-language representation learning with off-diagonal information. In Proceedings of the 40th International Conference on Machine Learning, pages 26129–26149. PMLR, 2023.
- [31] Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, and Kate Saenko. Object hallucination in image captioning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4035–4045. Association for Computational Linguistics, 2018.
- [32] Youxu Shi, Suorong Yang, and Dong Liu. Exposing hallucinations to suppress them: VLMs representation editing with generative anchors. arXiv:2509.21997, 2025.
- [33] Zhiyi Shi, Binjie Wang, Chongjie Si, Yichen Wu, Junsik Kim, and Hanspeter Pfister. DualEdit: Dual editing for knowledge updating in vision-language models. In Proceedings of the Conference on Language Modeling (COLM).
- [34] Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. Towards VQA models that can read. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8317–8326, 2019.
- [35] Ghada Sokar. Continual learning in vision-language models via aligned model merging. 2025.
- [36] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. 2018.
- [37] Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. CIDEr: Consensus-based image description evaluation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4566–4575, 2015.
- [38] Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, and Christopher Potts. ReFT: Representation finetuning for language models. 2024.
- [39] Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, and Xuming He. Generative negative text replay for continual vision-language pretraining. In Computer Vision, ECCV 2022, pages 22–38. Springer, 2022.
- [40] Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, and You He. Boosting continual learning of vision-language models via mixture-of-experts adapters. arXiv:2403.11549, 2024.
- [41] Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, and Lijuan Wang. MM-Vet: Evaluating large multimodal models for integrated capabilities. 2023.
- [42] Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, and Yu-Chiang Frank Wang. Select and distill: Selective dual-teacher knowledge transfer for continual learning on vision-language models. In European Conference on Computer Vision (ECCV), 2024.
- [43] Xi Zhang, Feifei Zhang, and Changsheng Xu. VQACL: A novel visual question answering continual learning setting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19102–19112, 2023.
- [44] Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, and Yang You. Preventing zero-shot transfer degradation in continual learning of vision-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19125–19136, 2023.
Supplementary material
- Contents: Theoretical Analysis of Non-Interference in DSCA; Additional Methodology Details; Evaluation Metrics; Implementation Details and Hyperparameters; Extended Experimental Results.
- Theoretical analysis of non-interference, preliminaries (Sec. 8.1): the frozen VLM encoder produces fused representations h_f ∈ ℝ^{d_f} (as defined in Sec. 3.1). For each discovered concept C_k, DSCA maintains a low-dimensional semantic subspace with basis matrix R_k ∈ ℝ^{r_k × d_f}, where r_k ≪ d_f (Sec. 3.3). The rows of R_k are viewed as an orthonormal basis for the concept subspace.
- Gating implementation details (Sec. 9.1): the component-wise gating vector γ_k(h_f) ∈ [0,1]^{d_f} is implemented via a lightweight neural layer, γ_k(h_f) = σ(W_{g,k} h_f + b_{g,k}), where σ is the element-wise sigmoid. To avoid quadratic parameter growth in d_f, W_{g,k} is factorized as a low-rank bottleneck, W_{g,k} = U_k V_k, with U_k… The same section distinguishes "edit" samples (D_e) from "out-of-scope"…
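The factorized gate above translates directly to code. A minimal PyTorch sketch, assuming only what the fragment states; the bottleneck width is illustrative.

```python
import torch
import torch.nn as nn

class LowRankGate(nn.Module):
    """gamma_k(h_f) = sigmoid(U_k V_k h_f + b_{g,k}), with W_{g,k} = U_k V_k
    factorized through a bottleneck to avoid d_f x d_f parameters."""
    def __init__(self, d_f: int, bottleneck: int = 16):
        super().__init__()
        self.V = nn.Linear(d_f, bottleneck, bias=False)  # V_k
        self.U = nn.Linear(bottleneck, d_f, bias=True)   # U_k plus b_{g,k}

    def forward(self, h_f: torch.Tensor) -> torch.Tensor:
        # Component-wise gate in [0, 1]^{d_f}.
        return torch.sigmoid(self.U(self.V(h_f)))
```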
- Evaluation metrics (formal definitions of all metrics referenced in Sec. 4.2 of the main paper): let f_{θ_0} denote the original (unedited) model and f_{θ_t} the model after t sequential edits. Each edit request is a tuple e = (v, p, o), consisting of a visual input v, a textual prompt p, and a desired target output o.
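From these definitions the two standard editing metrics follow mechanically. A sketch, with f_theta0 and f_thetat as placeholder callables mapping (v, p) to an output:

```python
def reliability(f_thetat, edit_requests):
    """Fraction of edit tuples e = (v, p, o) the edited model now satisfies."""
    return sum(f_thetat(v, p) == o for v, p, o in edit_requests) / len(edit_requests)

def locality(f_theta0, f_thetat, out_of_scope):
    """Agreement with the unedited model on out-of-scope queries."""
    return sum(f_thetat(v, p) == f_theta0(v, p)
               for v, p in out_of_scope) / len(out_of_scope)
```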
- Implementation details and hyperparameters: all experiments were conducted in PyTorch on 8× NVIDIA A100 (80 GB) GPUs with mixed-precision training. DSCA is applied to two distinct vision-language architect…
- Extended experimental results: expanded comparisons against a wider range of baselines appear in Tables 8, 9, and 10. Table 8 provides a comprehensive single-edit success comparison on the E-VQA and E-IC benchmarks; all baseline numbers, including standard fine-tuning variants and retrieval-based methods, are so… Sequential-editing baselines are taken directly from the benchmarks reported in LiveEdit [3]. Table 10 reports results on the CoIN benchmark using the PaliGemma-3B backbone [1]; all baseline numbers are sourced from PAM [32]. To establish performance bounds, three foundational setups defined in that work are included, beginning with zero-shot…