Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

Chaochao Chen; Haoyuan Wang; Jiajie Su; Jianmao Xiao; Xiaohao Liu

arxiv: 2605.23780 · v1 · pith:KFRBOJGLnew · submitted 2026-05-22 · 💻 cs.AI

Pith reviewed 2026-05-25 03:57 UTC · model grok-4.3

classification 💻 cs.AI

keywords multimodal knowledge editing · adversarial robustification · subspace alignment · knowledge units · generalization · MLLMs · LAR · RCSL

The pith

Adversarial subspace alignment makes multimodal knowledge edits generalize across equivalent inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multimodal large language models often update knowledge reliably for a given sample yet fail when the same fact appears in varied visual or linguistic forms. The paper formalizes this as a lack of generality over knowledge units, which are groups of semantically equivalent multimodal inputs. It introduces Latent Adversarial Robustification to create challenging yet coherent variants in the joint latent space and Rank-Constrained Subspace Learning to enforce low-rank alignment of those variants at the edit layer. The combined approach, called ASAM, is presented as a way to strengthen fragile semantic regions without explicit supervision on every variant. If successful, edits would propagate consistently inside each knowledge unit while preserving reliability and locality.

Core claim

Robust intrinsic multimodal knowledge editing is achieved by defining generality as consistent predictions within knowledge units and using Latent Adversarial Robustification (LAR) to generate adversarial yet semantically coherent variants together with Rank-Constrained Subspace Learning (RCSL) to enforce low-rank alignment of adversarial representations via a singular-value objective.
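
The definition of generality as consistent predictions within a knowledge unit can be made concrete with a small scoring routine. The following is a minimal sketch, not the paper's evaluation code: the names `unit_consistency` and `generality`, the `predict` callable, and the choice of a simple mean over units are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence


def unit_consistency(predict: Callable[[object], Hashable],
                     unit: Sequence[object],
                     target: Hashable) -> float:
    """Fraction of inputs in one knowledge unit that yield the edited target.

    `unit` is a group of semantically equivalent inputs (hypothetical format);
    `predict` stands in for the edited model.
    """
    if not unit:
        raise ValueError("a knowledge unit needs at least one input")
    hits = sum(1 for x in unit if predict(x) == target)
    return hits / len(unit)


def generality(predict: Callable[[object], Hashable],
               units: Sequence[Sequence[object]],
               targets: Sequence[Hashable]) -> float:
    """Mean per-unit consistency across all knowledge units."""
    if len(units) != len(targets):
        raise ValueError("each unit needs exactly one target")
    scores = [unit_consistency(predict, u, t) for u, t in zip(units, targets)]
    return sum(scores) / len(scores)
```

Under this reading, an edit that succeeds only on the original sample scores low as soon as the unit contains paraphrases or alternative images, which is the failure mode the paper describes.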

What carries the argument

Latent Adversarial Robustification (LAR) combined with Rank-Constrained Subspace Learning (RCSL), which together perform adversarial subspace alignment to expose and correct fragile regions in the joint latent space.
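
One way to picture a singular-value objective for low-rank alignment is to stack the edit-layer representations of all variants in a unit and measure how much spectral mass falls outside the leading directions. The sketch below assumes NumPy and a hypothetical function name `tail_singular_energy`; the abstract does not state the paper's actual objective, so this is an illustration of the general idea only.

```python
import numpy as np


def tail_singular_energy(reps: np.ndarray, rank: int) -> float:
    """Sum of singular values beyond the top `rank` of a representation matrix.

    reps: (n_variants, hidden_dim) array, one row per variant of a unit
          (hypothetical layout).
    A value near zero means the rows lie close to a rank-`rank` subspace.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # np.linalg.svd returns singular values in descending order.
    singular_values = np.linalg.svd(reps, compute_uv=False)
    return float(singular_values[rank:].sum())
```

Minimizing a quantity like this during the edit would push variant representations toward a shared low-dimensional subspace; whether the paper penalizes the tail, rewards the head, or uses another spectral function is not recoverable from the abstract.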

If this is right

Edits apply consistently to all members of a knowledge unit rather than anchoring to single samples.
Generalization improves while reliability and locality metrics remain intact.
Biased anchoring in high-dimensional multimodal spaces is reduced through explicit adversarial exposure.
The method supplies a concrete mechanism for adding semantic supervision without enumerating every variant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adversarial generation step could be tested on unimodal language models to check whether the robustness gain transfers.
If the low-rank constraint proves too restrictive on larger models, relaxing the rank bound while keeping the singular-value objective might be worth measuring.
Deployment on production MLLMs would require checking whether the added latent-space operations increase inference latency outside the editing phase.

Load-bearing premise

Generating adversarial yet semantically coherent variants in the joint latent space will expose the fragile semantic regions that limit generality.
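
The premise can be illustrated with a toy latent-space attack: perturb a latent vector so the edited answer becomes less likely, while keeping the perturbation inside a small ball around the original. This is a minimal sketch under strong assumptions, not the paper's LAR procedure: the model is a toy linear head `W`, the L2 radius `eps` is a crude stand-in for a semantic-coherence constraint, and the function name `latent_adversarial_variant` is hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def latent_adversarial_variant(z, W, target, eps=0.5, step=0.1, n_steps=20):
    """Projected gradient ascent on cross-entropy inside an L2 ball around z.

    z: (d,) latent vector; W: (n_classes, d) toy linear head;
    target: index of the edited answer. All three are illustrative stand-ins.
    """
    delta = np.zeros_like(z)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(n_steps):
        probs = softmax(W @ (z + delta))
        grad = W.T @ (probs - onehot)      # gradient of cross-entropy w.r.t. latent
        delta = delta + step * grad        # ascend: make the edited answer less likely
        norm = np.linalg.norm(delta)
        if norm > eps:                     # project back onto the allowed ball
            delta = delta * (eps / norm)
    return z + delta
```

The returned latent is a hard-but-nearby variant: if the edited model still answers correctly on it, the region around the sample is robust; if not, the region is one of the fragile areas the premise refers to. A norm bound alone does not guarantee preserved meaning, which is exactly the gap the referee report raises.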

What would settle it

A controlled test in which ASAM produces no measurable gain in consistency across held-out semantically equivalent multimodal inputs relative to prior intrinsic editing baselines.

Figures

Figures reproduced from arXiv: 2605.23780 by Chaochao Chen, Haoyuan Wang, Jiajie Su, Jianmao Xiao, Xiaohao Liu.

**Figure 2.** t-SNE visualization of representations under varying perturbations.

**Figure 4.** Case study on generalization under input perturbations.

Original abstract

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieves strong reliability and locality, it often exhibits limited generality, failing to propagate edits across semantically equivalent visual and linguistic variations. This issue arises from the lack of explicit semantic supervision, rigid editing scopes, and biased anchoring to individual samples in high-dimensional multimodal spaces. We address robust intrinsic multimodal knowledge editing by explicitly targeting generalization. We formalize robustness through knowledge units that group semantically equivalent multimodal inputs and define generality as consistent predictions within each unit. To expose fragile semantic regions, we introduce Latent Adversarial Robustification (LAR), which generates adversarial yet semantically coherent variants in the joint latent space. We further propose Rank-Constrained Subspace Learning (RCSL), enforcing low-rank alignment of adversarial representations at the edit layer via a singular value-based objective. Extensive analysis demonstrates the effectiveness of ASAM empirically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper formalizes generality via knowledge units and adds LAR plus RCSL for latent adversarial robustness in multimodal editing, but the abstract supplies no mechanism or evidence for the required semantic coherence.

The main takeaway is that this work targets the generality gap in intrinsic multimodal knowledge editing by grouping inputs into knowledge units and using two new pieces: Latent Adversarial Robustification to create variants in joint latent space, and Rank-Constrained Subspace Learning to align them at the edit layer with a singular-value objective. That framing and those two named techniques are what is actually new here compared with prior editing methods that stay sample-specific or lack explicit robustness terms.

The paper does a clear job laying out the practical problem (edits that fail to hold across visual and linguistic paraphrases) and the motivation for moving beyond binary edit success. The formalization of generality as consistent behavior inside each unit is straightforward and useful for thinking about the issue.

The stress-test concern lands: the abstract states that LAR produces adversarial yet semantically coherent variants but gives no loss term, constraint, or validation step to enforce or check that coherence. Without it, the variants could simply change meaning, which would break the link to the claimed robustness. The abstract also asserts extensive empirical analysis without any numbers, baselines, or ablation details, so soundness cannot be judged from the given text. The methods appear independent of prior results, with no obvious circularity.

This is for researchers focused on making knowledge edits in MLLMs reliable under real variation rather than for general LLM editing audiences. A reader working on robustness would find the unit-based view and the subspace idea worth examining if the full paper supplies the missing implementation and results. I would send it to peer review to get those details checked rather than desk reject, since the problem is real and the proposed direction is specific enough to evaluate once the experiments are visible.

Referee Report

2 major / 0 minor

Summary. The paper claims that intrinsic multimodal knowledge editing in MLLMs suffers from limited generality due to lack of semantic supervision and biased sample anchoring; it addresses this by formalizing generality via 'knowledge units' of semantically equivalent inputs, introducing Latent Adversarial Robustification (LAR) to generate adversarial yet semantically coherent variants in the joint latent space, and Rank-Constrained Subspace Learning (RCSL) to enforce low-rank alignment of adversarial representations at the edit layer via a singular-value objective, with empirical effectiveness shown through extensive analysis.

Significance. If the central construction holds, the work would address a recognized limitation in current intrinsic editing methods by explicitly targeting cross-variant consistency rather than single-sample reliability. The formalization of generality through knowledge units and the use of latent-space adversarial generation plus rank-constrained alignment represent a coherent extension of prior editing frameworks; successful validation would strengthen the case for subspace-based robustness techniques in multimodal settings.

major comments (2)

[Abstract] Abstract: the claim that LAR 'generates adversarial yet semantically coherent variants in the joint latent space' is load-bearing for the generality definition, yet the abstract (and by extension the method description) provides no loss term, constraint, or validation procedure that enforces or measures semantic equivalence of the generated variants; without this, the knowledge-unit consistency test cannot be guaranteed to probe the intended property rather than semantic drift.
[Abstract] Abstract / method overview: the RCSL objective is described only at the level of 'a singular value-based objective' for low-rank alignment; no equation is supplied showing how the rank constraint interacts with the edit-layer update or how it interacts with the LAR-generated variants, leaving the central claim that this produces robust generalization without explicit derivation or pseudocode.
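
To make the second comment concrete, the kind of equation being requested might look like the following. This is a hypothetical formulation written only to show the expected level of detail; every symbol here is an assumption and none of it is taken from the paper.

```latex
% Hypothetical notation, not taken from the paper.
% z: joint latent of the edit sample; y^*: edited target; \epsilon: perturbation budget
% f_\theta: model output; h_\theta: edit-layer representation
% r: rank bound; \lambda: trade-off weight; \sigma_i: i-th largest singular value
% \delta_1^\star, ..., \delta_n^\star: maximizers found from n different random starts
\begin{align}
\delta_k^\star &\in \arg\max_{\|\delta\|_2 \le \epsilon}
    \mathcal{L}_{\mathrm{edit}}\bigl(f_\theta(z + \delta),\, y^\ast\bigr),
    \qquad k = 1, \dots, n \\
H_\theta &= \bigl[\, h_\theta(z + \delta_1^\star);\ \dots;\ h_\theta(z + \delta_n^\star) \,\bigr]
    \in \mathbb{R}^{n \times d} \\
\mathcal{L}_{\mathrm{RCSL}}(\theta) &= \sum_{i > r} \sigma_i(H_\theta) \\
\min_\theta \;& \mathcal{L}_{\mathrm{edit}}\bigl(f_\theta(z),\, y^\ast\bigr)
    + \lambda\, \mathcal{L}_{\mathrm{RCSL}}(\theta)
\end{align}
```

A formulation at this level would let a reader see where the adversarial variants enter the alignment term and how the rank bound constrains the edit-layer update, which is what the comment says is missing.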

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The two major comments both concern insufficient detail in the abstract and method overview. We agree these points require clarification and will revise the manuscript to provide the requested explicit formulations, constraints, and interactions.

Referee: [Abstract] Abstract: the claim that LAR 'generates adversarial yet semantically coherent variants in the joint latent space' is load-bearing for the generality definition, yet the abstract (and by extension the method description) provides no loss term, constraint, or validation procedure that enforces or measures semantic equivalence of the generated variants; without this, the knowledge-unit consistency test cannot be guaranteed to probe the intended property rather than semantic drift.

Authors: We acknowledge that the current abstract is too high-level and does not convey the semantic-coherence mechanism. In the revised manuscript we will (1) update the abstract to note that semantic equivalence is enforced via an embedding-similarity constraint within the latent perturbation process, (2) add the explicit loss term and constraint to the LAR subsection, and (3) include a quantitative validation (cosine-similarity thresholds plus human-rated semantic drift scores) that directly supports the knowledge-unit consistency evaluation. These additions will make the link between variant generation and the generality metric explicit.
Revision: yes
Referee: [Abstract] Abstract / method overview: the RCSL objective is described only at the level of 'a singular value-based objective' for low-rank alignment; no equation is supplied showing how the rank constraint interacts with the edit-layer update or how it interacts with the LAR-generated variants, leaving the central claim that this produces robust generalization without explicit derivation or pseudocode.

Authors: We agree the description is insufficiently precise. The revision will (1) replace the high-level phrase with the full RCSL objective equation, (2) derive how the singular-value penalty is applied to the edit-layer weight update in the presence of LAR variants, and (3) add pseudocode (or an algorithmic box) that shows the end-to-end interaction between LAR generation and the RCSL-constrained update. This will supply the missing derivation and clarify the source of robust generalization.
Revision: yes
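
The cosine-similarity validation mentioned in the first response could be as simple as the filter below. This is a hedged sketch of the general technique, assuming NumPy; the function names and the 0.9 threshold are illustrative choices, not values from the paper or the rebuttal.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def filter_coherent_variants(anchor, variants, threshold=0.9):
    """Keep variants whose embedding stays close in direction to the anchor.

    anchor: (d,) embedding of the original input;
    variants: iterable of (d,) embeddings of generated variants;
    threshold: minimum cosine similarity (hypothetical default).
    """
    return [v for v in variants if cosine_similarity(anchor, v) >= threshold]
```

For example, with `anchor = np.array([1.0, 0.0])`, a variant `np.array([1.0, 0.1])` passes while `np.array([0.0, 1.0])` is discarded. Embedding similarity is only a proxy for meaning, which is presumably why the response pairs it with human-rated drift scores.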

Circularity Check

0 steps flagged

No significant circularity; methods presented as independent contributions

The paper introduces LAR to generate adversarial variants and RCSL for low-rank alignment as new mechanisms to target generality in multimodal editing. It formalizes knowledge units and generality as consistent predictions within units, then defines the methods to address fragile regions. No equations or steps reduce by construction to fitted inputs, self-citations, or renamed priors; the abstract and described approach treat the formalization and algorithms as novel with external empirical validation. This matches the default case of a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review provides high-level overview of new methods but no specific free parameters or additional axioms are detailed.

axioms (1)

domain assumption: Multimodal inputs that are semantically equivalent can be grouped into knowledge units where consistent predictions are desired.
This is used to define generality in the abstract.

invented entities (2)

Latent Adversarial Robustification (LAR): no independent evidence
purpose: Generates adversarial yet semantically coherent variants in the joint latent space to expose fragile regions.
Introduced as a new technique in the paper.

Rank-Constrained Subspace Learning (RCSL): no independent evidence
purpose: Enforces low-rank alignment of adversarial representations via a singular-value-based objective.
New method proposed for the alignment.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 8 internal anchors

[1]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2. 5-vl technical report.arXiv preprint arXiv:2502.13923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

De- coding by contrasting knowledge: Enhancing llms’ confidence on edited facts.arXiv preprint arXiv:2405.11613, 2024

Baolong Bi, Shenghua Liu, Lingrui Mei, Yiwei Wang, Pengliang Ji, and Xueqi Cheng. De- coding by contrasting knowledge: Enhancing llms’ confidence on edited facts.arXiv preprint arXiv:2405.11613, 2024

work page arXiv 2024
[3]

Can we edit multimodal large language models?arXiv preprint arXiv:2310.08475, 2023

Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, and Ningyu Zhang. Can we edit multimodal large language models?arXiv preprint arXiv:2310.08475, 2023

work page arXiv 2023
[4]

Gramian multimodal representation learning and alignment, 2025

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, and Danilo Comminiello. Gramian multimodal representation learning and alignment, 2025. URL https://arxiv.org/abs/ 2412.11959

work page arXiv 2025
[5]

Evaluating the ripple effects of knowledge editing in language models, 2023

Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, and Mor Geva. Evaluating the ripple effects of knowledge editing in language models, 2023. URL https://arxiv.org/abs/ 2307.12976

work page arXiv 2023
[6]

Instructblip: Towards general-purpose vision-language models with instruction tuning.Advances in neural information processing systems, 36:49250–49267, 2023

Wenliang Dai, Junnan Li, Dongxu Li, Anthony Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale N Fung, and Steven Hoi. Instructblip: Towards general-purpose vision-language models with instruction tuning.Advances in neural information processing systems, 36:49250–49267, 2023

work page 2023
[7]

Editing factual knowledge in language models

Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing factual knowledge in language models. arXiv preprint arXiv:2104.08164, 2021

work page arXiv 2021
[8]

Everything is editable: Extend knowledge editing to unstructured data in large language models,

Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, and Xueqi Cheng. Everything is editable: Extend knowledge editing to unstructured data in large language models,

work page
[9]

URLhttps://arxiv.org/abs/2405.15349

work page arXiv
[10]

Calibrating factual knowledge in pretrained language models

Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, and Lei Li. Calibrating factual knowledge in pretrained language models. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors,Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5937–5947, Abu Dhabi, United Arab Emirates, December 2022. Association for Computati...

work page doi:10.18653/v1/2022.findings-emnlp.438 2022
[11]

Mmke-bench: A multimodal editing benchmark for diverse visual knowledge.arXiv preprint arXiv:2502.19870, 2025

Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, and Qing Li. Mmke-bench: A multimodal editing benchmark for diverse visual knowledge.arXiv preprint arXiv:2502.19870, 2025

work page arXiv 2025
[12]

Alphaedit: Null-space constrained knowledge editing for language models

Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Shi Jie, Xiang Wang, Xiangnan He, and Tat-Seng Chua. Alphaedit: Null-space constrained knowledge editing for language models. arXiv preprint arXiv:2410.02355, 2024

work page arXiv 2024
[13]

Same question, different words: A latent adversarial framework for prompt robustness, 2025

Tingchen Fu and Fazl Barez. Same question, different words: A latent adversarial framework for prompt robustness, 2025. URLhttps://arxiv.org/abs/2503.01345

work page arXiv 2025
[14]

SimCSE: Simple Contrastive Learning of Sentence Embeddings

Tianyu Gao, Xingcheng Yao, and Danqi Chen. Simcse: Simple contrastive learning of sentence embeddings, 2022. URLhttps://arxiv.org/abs/2104.08821

work page internal anchor Pith review Pith/arXiv arXiv 2022
[15]

Pokemqa: Programmable knowledge editing for multi-hop question answering, 2024

Hengrui Gu, Kaixiong Zhou, Xiaotian Han, Ninghao Liu, Ruobing Wang, and Xin Wang. Pokemqa: Programmable knowledge editing for multi-hop question answering, 2024. URL https://arxiv.org/abs/2312.15194

work page arXiv 2024
[16]

Balancedit: Dynamically balancing the generality-locality trade-off in multi-modal model editing

Dongliang Guo, Mengxuan Hu, Zihan Guan, Thomas Hartvigsen, and Sheng Li. Balancedit: Dynamically balancing the generality-locality trade-off in multi-modal model editing. In International Conference on Machine Learning, 2025. URL https://arxiv.org/abs/2505. 01343. 10

work page 2025
[17]

Aging with grace: Lifelong model editing with discrete key-value adaptors, 2023

Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, and Marzyeh Ghassemi. Aging with grace: Lifelong model editing with discrete key-value adaptors, 2023. URLhttps://arxiv.org/abs/2211.11031

work page arXiv 2023
[18]

Methods for measuring, updating, and visualizing factual beliefs in language models

Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, and Srinivasan Iyer. Methods for measuring, updating, and visualizing factual beliefs in language models. In Andreas Vlachos and Isabelle Augenstein, editors,Proceedings of the 17th Conference of the European Chapter of the Association for Computational Li...

work page doi:10.18653/v1/2023.eacl-main.199 2023
[19]

Li, and Jacob Andreas

Evan Hernandez, Belinda Z. Li, and Jacob Andreas. Inspecting and editing knowledge repre- sentations in language models, 2024. URLhttps://arxiv.org/abs/2304.00740

work page arXiv 2024
[20]

Vlkeb: A large vision-language model knowledge editing benchmark.Advances in Neural Information Processing Systems, 37:9257–9280, 2024

Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. Vlkeb: A large vision-language model knowledge editing benchmark.Advances in Neural Information Processing Systems, 37:9257–9280, 2024

work page 2024
[21]

Transformer-patcher: One mistake worth one neuron.arXiv preprint arXiv:2301.09785, 2023

Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, and Zhang Xiong. Transformer-patcher: One mistake worth one neuron.arXiv preprint arXiv:2301.09785, 2023

work page arXiv 2023
[22]

Anyedit: Edit any knowledge encoded in language models

Houcheng Jiang, Junfeng Fang, Ningyu Zhang, Guojun Ma, Mingyang Wan, Xiang Wang, Xiangnan He, and Tat-seng Chua. Anyedit: Edit any knowledge encoded in language models. arXiv preprint arXiv:2502.05628, 2025

work page arXiv 2025
[23]

Llava-med: Training a large language-and-vision assistant for biomedicine in one day.Advances in Neural Information Processing Systems, 36: 28541–28564, 2023

Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. Llava-med: Training a large language-and-vision assistant for biomedicine in one day.Advances in Neural Information Processing Systems, 36: 28541–28564, 2023

work page 2023
[24]

Mike: A new benchmark for fine-grained multimodal entity knowledge editing.arXiv preprint arXiv:2402.14835, 2024

Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, and Bozhong Tian. Mike: A new benchmark for fine-grained multimodal entity knowledge editing.arXiv preprint arXiv:2402.14835, 2024

work page arXiv 2024
[25]

Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. InInternational conference on machine learning, pages 19730–19742. PMLR, 2023

work page 2023
[26]

Pmet: Precise model editing in a transformer, 2024

Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, and Jie Yu. Pmet: Precise model editing in a transformer, 2024. URLhttps://arxiv.org/abs/2308.08742

work page arXiv 2024
[27]

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, and Li Yuan. Moe-llava: Mixture of experts for large vision-language models. arXiv preprint arXiv:2401.15947, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[28]

Calibrated Multimodal Representation Learning with Missing Modalities

Xiaohao Liu, Xiaobo Xia, Jiaheng Wei, Shuo Yang, Xiu Su, See-Kiong Ng, and Tat-Seng Chua. Calibrated multimodal representation learning with missing modalities.arXiv preprint arXiv:2511.12034, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

Principled multimodal represen- tation learning, 2026

Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, and Tat-Seng Chua. Principled multimodal represen- tation learning, 2026

work page 2026
[30]

Untying the reversal curse via bidirectional language model editing, 2024

Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, and Cong Liu. Untying the reversal curse via bidirectional language model editing, 2024. URL https://arxiv.org/abs/2310. 10322

work page 2024
[31]

Locating and editing factual associations in gpt.Advances in neural information processing systems, 35:17359–17372, 2022

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt.Advances in neural information processing systems, 35:17359–17372, 2022

work page 2022
[32]

Mass-Editing Memory in a Transformer

Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass- editing memory in a transformer.arXiv preprint arXiv:2210.07229, 2022. 11

work page internal anchor Pith review Pith/arXiv arXiv 2022
[33]

Fast model editing at scale.arXiv preprint arXiv:2110.11309, 2021

Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale.arXiv preprint arXiv:2110.11309, 2021

work page arXiv 2021
[34]

Memory-based model editing at scale

Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. InInternational Conference on Machine Learning, pages 15817–15831. PMLR, 2022

work page 2022
[35]

Precise localization of memories: A fine-grained neuron-level knowledge editing technique for llms, 2025

Haowen Pan, Xiaozhi Wang, Yixin Cao, Zenglin Shi, Xun Yang, Juanzi Li, and Meng Wang. Precise localization of memories: A fine-grained neuron-level knowledge editing technique for llms, 2025. URLhttps://arxiv.org/abs/2503.01090

work page arXiv 2025
[36]

Towards unified multimodal editing with enhanced knowledge collaboration.Advances in Neural Information Processing Systems, 37:110290–110314, 2024

Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, and Qianru Sun. Towards unified multimodal editing with enhanced knowledge collaboration.Advances in Neural Information Processing Systems, 37:110290–110314, 2024

work page 2024
[37]

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020

work page internal anchor Pith review Pith/arXiv arXiv 2021
[38]

Out-of-distribution generalization via invariant trajectories for multimodal large language model editing, 2026

Jiajie Su, Haoyuan Wang, Xiaohua Feng, Yunshan Ma, Xiaobo Xia, Yuyuan Li, Xiaolin Zheng, Jianmao Xiao, and Chaochao Chen. Out-of-distribution generalization via invariant trajectories for multimodal large language model editing, 2026. URL https://arxiv.org/abs/2601. 19700

work page 2026
[39]

Mass-editing memory with attention in transformers: A cross-lingual exploration of knowledge

Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, and Marta Villegas. Mass-editing memory with attention in transformers: A cross-lingual exploration of knowledge. InFindings of the Association for Computational Linguistics ACL 2024, page 5831–5847. Association for Computational Linguistics, 2024. doi: 10.18653/v1/2024.findings-acl.347. URL http: //d...

work page doi:10.18653/v1/2024.findings-acl.347 2024
[40]

Massive editing for large language models via meta learning, 2024

Chenmien Tan, Ge Zhang, and Jie Fu. Massive editing for large language models via meta learning, 2024. URLhttps://arxiv.org/abs/2311.04661

work page arXiv 2024
[41]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models, 2023. URLhttps://arxiv.org/abs/2302.13971

work page internal anchor Pith review Pith/arXiv arXiv 2023
[42]

Wise: Rethinking the knowledge memory for lifelong model editing of large language models.Advances in Neural Information Processing Systems, 37: 53764–53797, 2024

Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. Wise: Rethinking the knowledge memory for lifelong model editing of large language models.Advances in Neural Information Processing Systems, 37: 53764–53797, 2024

work page 2024
[43]

Knowledge editing for large language models: A survey.ACM Computing Surveys, 57(3):1–37, 2024

Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, and Jundong Li. Knowledge editing for large language models: A survey.ACM Computing Surveys, 57(3):1–37, 2024

work page 2024
[44]

Melo: Enhancing model editing with neuron- indexed dynamic lora, 2023

Lang Yu, Qin Chen, Jie Zhou, and Liang He. Melo: Enhancing model editing with neuron- indexed dynamic lora, 2023. URLhttps://arxiv.org/abs/2312.11795

work page arXiv 2023
[45]

Visual- oriented fine-grained knowledge editing for multimodal large language models.arXiv preprint arXiv:2411.12790, 2024

Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, and Meng Wang. Visual- oriented fine-grained knowledge editing for multimodal large language models.arXiv preprint arXiv:2411.12790, 2024

work page arXiv 2024
[46]

Mc-mke: A fine-grained multimodal knowledge editing benchmark emphasizing modality consistency.arXiv preprint arXiv:2406.13219, 2024

Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, and Xiaojun Wan. Mc-mke: A fine-grained multimodal knowledge editing benchmark emphasizing modality consistency.arXiv preprint arXiv:2406.13219, 2024

work page arXiv 2024
[47]

Knowledge graph enhanced large language model editing.arXiv preprint arXiv:2402.13593, 2024

Mengqi Zhang, Xiaotian Ye, Qiang Liu, Pengjie Ren, Shu Wu, and Zhumin Chen. Knowledge graph enhanced large language model editing.arXiv preprint arXiv:2402.13593, 2024

work page arXiv 2024
[48]

Instructedit: Instruction-based knowledge editing for large language models.arXiv preprint arXiv:2402.16123, 2024

Ningyu Zhang, Bozhong Tian, Siyuan Cheng, Xiaozhuan Liang, Yi Hu, Kouying Xue, Yanjie Gou, Xi Chen, and Huajun Chen. Instructedit: Instruction-based knowledge editing for large language models.arXiv preprint arXiv:2402.16123, 2024. 12

work page arXiv 2024
[49]

A comprehensive study of knowledge editing for large language models,

Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, and Huajun Chen. A comprehensive study of knowledge editing for large language models,

work page
[50]

URLhttps://arxiv.org/abs/2401.01286

work page arXiv
[51]

Can we edit factual knowledge by in-context learning?arXiv preprint arXiv:2305.12740, 2023

Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, and Baobao Chang. Can we edit factual knowledge by in-context learning?arXiv preprint arXiv:2305.12740, 2023

work page arXiv 2023
[52]

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. Minigpt-4: En- hancing vision-language understanding with advanced large language models.arXiv preprint arXiv:2304.10592, 2023. 13 Appendix Contents A Experiment Setup Details 15 A.1 MLLM Backbones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 A.2 Experiment Data...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[53]

explores the trade-off of generality and locality through influence-scope estimation and localized codebook-based edits, ODEdit [37] proposes a plug-and-play invariant learning based framework to address the semantic shifts coupled with factual changes. However, these works still exhibit limited generality, as they remain constrained by sample-centric upd...

work page

[1] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.

[2] Baolong Bi, Shenghua Liu, Lingrui Mei, Yiwei Wang, Pengliang Ji, and Xueqi Cheng. Decoding by contrasting knowledge: Enhancing llms' confidence on edited facts. arXiv preprint arXiv:2405.11613, 2024.

[3] Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, and Ningyu Zhang. Can we edit multimodal large language models? arXiv preprint arXiv:2310.08475, 2023.

[4] Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, and Danilo Comminiello. Gramian multimodal representation learning and alignment, 2025. URL https://arxiv.org/abs/2412.11959.

[5] Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, and Mor Geva. Evaluating the ripple effects of knowledge editing in language models, 2023. URL https://arxiv.org/abs/2307.12976.

[6] Wenliang Dai, Junnan Li, Dongxu Li, Anthony Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale N Fung, and Steven Hoi. Instructblip: Towards general-purpose vision-language models with instruction tuning. Advances in neural information processing systems, 36:49250–49267, 2023.

[7] Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing factual knowledge in language models. arXiv preprint arXiv:2104.08164, 2021.

[8] Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, and Xueqi Cheng. Everything is editable: Extend knowledge editing to unstructured data in large language models. URL https://arxiv.org/abs/2405.15349.

[9] Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, and Lei Li. Calibrating factual knowledge in pretrained language models. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5937–5947, Abu Dhabi, United Arab Emirates, December 2022. Association for Computati... doi: 10.18653/v1/2022.findings-emnlp.438.

[10] Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, and Qing Li. Mmke-bench: A multimodal editing benchmark for diverse visual knowledge. arXiv preprint arXiv:2502.19870, 2025.

[11] Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Shi Jie, Xiang Wang, Xiangnan He, and Tat-Seng Chua. Alphaedit: Null-space constrained knowledge editing for language models. arXiv preprint arXiv:2410.02355, 2024.

[12] Tingchen Fu and Fazl Barez. Same question, different words: A latent adversarial framework for prompt robustness, 2025. URL https://arxiv.org/abs/2503.01345.

[13] Tianyu Gao, Xingcheng Yao, and Danqi Chen. Simcse: Simple contrastive learning of sentence embeddings, 2022. URL https://arxiv.org/abs/2104.08821.

[14] Hengrui Gu, Kaixiong Zhou, Xiaotian Han, Ninghao Liu, Ruobing Wang, and Xin Wang. Pokemqa: Programmable knowledge editing for multi-hop question answering, 2024. URL https://arxiv.org/abs/2312.15194.

[15] Dongliang Guo, Mengxuan Hu, Zihan Guan, Thomas Hartvigsen, and Sheng Li. Balancedit: Dynamically balancing the generality-locality trade-off in multi-modal model editing. In International Conference on Machine Learning, 2025. URL https://arxiv.org/abs/2505.01343.

[16] Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, and Marzyeh Ghassemi. Aging with grace: Lifelong model editing with discrete key-value adaptors, 2023. URL https://arxiv.org/abs/2211.11031.

[17] Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, and Srinivasan Iyer. Methods for measuring, updating, and visualizing factual beliefs in language models. In Andreas Vlachos and Isabelle Augenstein, editors, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Li... doi: 10.18653/v1/2023.eacl-main.199.

[18] Evan Hernandez, Belinda Z. Li, and Jacob Andreas. Inspecting and editing knowledge representations in language models, 2024. URL https://arxiv.org/abs/2304.00740.

[19] Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. Vlkeb: A large vision-language model knowledge editing benchmark. Advances in Neural Information Processing Systems, 37:9257–9280, 2024.

[20] Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, and Zhang Xiong. Transformer-patcher: One mistake worth one neuron. arXiv preprint arXiv:2301.09785, 2023.

[21] Houcheng Jiang, Junfeng Fang, Ningyu Zhang, Guojun Ma, Mingyang Wan, Xiang Wang, Xiangnan He, and Tat-seng Chua. Anyedit: Edit any knowledge encoded in language models. arXiv preprint arXiv:2502.05628, 2025.

[22] Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems, 36:28541–28564, 2023.

[23] Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, and Bozhong Tian. Mike: A new benchmark for fine-grained multimodal entity knowledge editing. arXiv preprint arXiv:2402.14835, 2024.

[24] Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, pages 19730–19742. PMLR, 2023.

[25] Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, and Jie Yu. Pmet: Precise model editing in a transformer, 2024. URL https://arxiv.org/abs/2308.08742.

[26] Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, and Li Yuan. Moe-llava: Mixture of experts for large vision-language models. arXiv preprint arXiv:2401.15947, 2024.

[27] Xiaohao Liu, Xiaobo Xia, Jiaheng Wei, Shuo Yang, Xiu Su, See-Kiong Ng, and Tat-Seng Chua. Calibrated multimodal representation learning with missing modalities. arXiv preprint arXiv:2511.12034, 2025.

[28] Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, and Tat-Seng Chua. Principled multimodal representation learning, 2026.

[29] Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, and Cong Liu. Untying the reversal curse via bidirectional language model editing, 2024. URL https://arxiv.org/abs/2310.10322.

[30] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt. Advances in neural information processing systems, 35:17359–17372, 2022.

[31] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.

[32] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale. arXiv preprint arXiv:2110.11309, 2021.

[33] Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning, pages 15817–15831. PMLR, 2022.

[34] Haowen Pan, Xiaozhi Wang, Yixin Cao, Zenglin Shi, Xun Yang, Juanzi Li, and Meng Wang. Precise localization of memories: A fine-grained neuron-level knowledge editing technique for llms, 2025. URL https://arxiv.org/abs/2503.01090.

[35] Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, and Qianru Sun. Towards unified multimodal editing with enhanced knowledge collaboration. Advances in Neural Information Processing Systems, 37:110290–110314, 2024.

[36] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020.

[37] Jiajie Su, Haoyuan Wang, Xiaohua Feng, Yunshan Ma, Xiaobo Xia, Yuyuan Li, Xiaolin Zheng, Jianmao Xiao, and Chaochao Chen. Out-of-distribution generalization via invariant trajectories for multimodal large language model editing, 2026. URL https://arxiv.org/abs/2601.19700.

[38] Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, and Marta Villegas. Mass-editing memory with attention in transformers: A cross-lingual exploration of knowledge. In Findings of the Association for Computational Linguistics ACL 2024, pages 5831–5847. Association for Computational Linguistics, 2024. doi: 10.18653/v1/2024.findings-acl.347. URL http://d...

[39] Chenmien Tan, Ge Zhang, and Jie Fu. Massive editing for large language models via meta learning, 2024. URL https://arxiv.org/abs/2311.04661.

[40] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models, 2023. URL https://arxiv.org/abs/2302.13971.

[41] Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. Wise: Rethinking the knowledge memory for lifelong model editing of large language models. Advances in Neural Information Processing Systems, 37:53764–53797, 2024.

[42] Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, and Jundong Li. Knowledge editing for large language models: A survey. ACM Computing Surveys, 57(3):1–37, 2024.

[43] Lang Yu, Qin Chen, Jie Zhou, and Liang He. Melo: Enhancing model editing with neuron-indexed dynamic lora, 2023. URL https://arxiv.org/abs/2312.11795.

[44] Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, and Meng Wang. Visual-oriented fine-grained knowledge editing for multimodal large language models. arXiv preprint arXiv:2411.12790, 2024.

[45] Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, and Xiaojun Wan. Mc-mke: A fine-grained multimodal knowledge editing benchmark emphasizing modality consistency. arXiv preprint arXiv:2406.13219, 2024.

[46] Mengqi Zhang, Xiaotian Ye, Qiang Liu, Pengjie Ren, Shu Wu, and Zhumin Chen. Knowledge graph enhanced large language model editing. arXiv preprint arXiv:2402.13593, 2024.

[47] Ningyu Zhang, Bozhong Tian, Siyuan Cheng, Xiaozhuan Liang, Yi Hu, Kouying Xue, Yanjie Gou, Xi Chen, and Huajun Chen. Instructedit: Instruction-based knowledge editing for large language models. arXiv preprint arXiv:2402.16123, 2024.

[48] Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, and Huajun Chen. A comprehensive study of knowledge editing for large language models. URL https://arxiv.org/abs/2401.01286.

[49] Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, and Baobao Chang. Can we edit factual knowledge by in-context learning? arXiv preprint arXiv:2305.12740, 2023.

[50] Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.

Appendix Contents

A Experiment Setup Details
A.1 MLLM Backbones
A.2 Experiment Data...

explores the trade-off of generality and locality through influence-scope estimation and localized codebook-based edits, ODEdit [37] proposes a plug-and-play invariant learning based framework to address the semantic shifts coupled with factual changes. However, these works still exhibit limited generality, as they remain constrained by sample-centric upd...