Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

Jong-Hyeok Lee; Ki-Young Shin; Kunil Lee; Young-Joo Suh

arXiv: 2605.13919 · v1 · pith:7I7SCPGGnew · submitted 2026-05-13 · 💻 cs.CL · cs.LG

Kunil Lee, Ki-Young Shin, Jong-Hyeok Lee, Young-Joo Suh

Pith reviewed 2026-05-15 05:45 UTC · model grok-4.3

keywords: multilingual knowledge editing · vector merging · shared covariance · TSVM · large language models · knowledge interference · MzsRE benchmark · batch editing

The pith

Vector summation with shared covariance emerges as the most reliable strategy for merging knowledge edits across languages in large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates practical ways to combine vector updates when editing factual knowledge in multilingual large language models so that changes in one language do not degrade performance in others. It compares six variants of vector merging on a large batch-editing task that covers twelve languages and two backbone models. The central result is that summation which incorporates shared covariance consistently outperforms other options, while plain summation without covariance fails to control interference. Task singular vector merging helps in limited cases but does not reliably solve the cross-language problem. The study also shows that performance depends strongly on the choice of weight scaling factor and rank compression ratio.

Core claim

Vector summation with shared covariance is the most reliable overall strategy for multilingual knowledge editing, whereas simple summation without shared covariance performs poorly. TSVM improves performance in some settings, but its ability to mitigate multilingual interference is limited. Performance is sensitive to both weight scale and rank ratio, with larger-than-default scaling and relatively low rank often yielding better results.

What carries the argument

Vector merging methods that combine edited parameter updates, especially summation that uses shared covariance to align the statistical structure of edits across languages.
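
To make the contrast concrete, here is a minimal NumPy sketch of the two merging rules, assuming a MEMIT-style closed-form update Δ = R Kᵀ (C + K Kᵀ)⁻¹. This summary does not spell out the paper's formulas, so the function names, the joint-solve reading of "shared covariance", and the synthetic keys and residuals are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def closed_form_update(R, K, C):
    """Ridge-style update Delta = R K^T (C + K K^T)^{-1} (MEMIT-style, assumed)."""
    A = C + K @ K.T                              # (d_in, d_in), symmetric
    return np.linalg.solve(A, (R @ K.T).T).T     # solves Delta @ A = R @ K.T

def merge_plain_sum(Rs, Ks, C):
    """Solve each language separately, then add the per-language updates."""
    return sum(closed_form_update(R, K, C) for R, K in zip(Rs, Ks))

def merge_sum_shared_cov(Rs, Ks, C):
    """Assumed reading of 'shared covariance': one joint solve over all languages."""
    numer = sum(R @ K.T for R, K in zip(Rs, Ks))  # (d_out, d_in)
    gram = sum(K @ K.T for K in Ks)               # (d_in, d_in)
    return np.linalg.solve(C + gram, numer.T).T

def mean_fit_error(delta, Rs, Ks):
    """Average Frobenius error between delta @ K and the target residual R."""
    return float(np.mean([np.linalg.norm(delta @ K - R) for R, K in zip(Rs, Ks)]))

# Synthetic setup (hypothetical sizes): 3 languages editing the same 5 facts,
# so their keys and target residuals are near-duplicates of each other.
rng = np.random.default_rng(0)
d_in, d_out, n_facts, n_langs = 16, 8, 5, 3
K_base = rng.normal(size=(d_in, n_facts))
R_base = rng.normal(size=(d_out, n_facts))
Ks = [K_base + 0.05 * rng.normal(size=K_base.shape) for _ in range(n_langs)]
Rs = [R_base + 0.05 * rng.normal(size=R_base.shape) for _ in range(n_langs)]
C = 0.1 * np.eye(d_in)  # stand-in for the covariance of pre-existing keys

err_sum = mean_fit_error(merge_plain_sum(Rs, Ks, C), Rs, Ks)
err_cov = mean_fit_error(merge_sum_shared_cov(Rs, Ks, C), Rs, Ks)
print(f"plain sum error: {err_sum:.3f}  shared-covariance error: {err_cov:.3f}")
```

In this toy setting, plain summation applies roughly the same correction once per language, so the merged update overshoots, while the joint solve accounts for the overlap between keys. Whether this matches the paper's exact formulation is not confirmed here.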

If this is right

Shared-covariance summation delivers more stable editing results across languages than alternatives.
Plain vector summation without covariance allows edits to interfere strongly and should be avoided.
TSVM can raise performance in selected cases but does not remove the need for covariance-aware merging.
Raising the weight scale above the default value and keeping rank compression relatively low tends to improve outcomes.
Practical multilingual editing pipelines should therefore test covariance structure and scaling parameters first.
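
The roles of the weight scale and the rank ratio can be illustrated with a small sketch of a TSVM-style merge. It loosely follows the Task Singular Vectors recipe (truncate each update by SVD, concatenate the singular vectors, orthogonalize them, recombine), but the function names, the way `rank_ratio` maps to a number of kept components, and the random stand-in updates are assumptions, not the paper's code.

```python
import numpy as np

def _nearest_orthonormal(M):
    """Closest matrix with orthonormal columns (orthogonal Procrustes / polar factor)."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt

def tsvm_merge(deltas, rank_ratio=0.25, alpha=1.0):
    """Merge per-language update matrices; alpha scales the merged update."""
    d_out, d_in = deltas[0].shape
    k = max(1, int(rank_ratio * min(d_out, d_in)))  # components kept per language
    Us, Ss, Vts = [], [], []
    for D in deltas:
        U, S, Vt = np.linalg.svd(D, full_matrices=False)
        Us.append(U[:, :k])
        Ss.append(S[:k])
        Vts.append(Vt[:k, :])
    U_cat = np.concatenate(Us, axis=1)     # (d_out, T*k)
    S_cat = np.concatenate(Ss)             # (T*k,)
    Vt_cat = np.concatenate(Vts, axis=0)   # (T*k, d_in)
    # Decorrelate singular vectors that came from different languages.
    U_orth = _nearest_orthonormal(U_cat)
    Vt_orth = _nearest_orthonormal(Vt_cat.T).T
    return alpha * (U_orth * S_cat) @ Vt_orth

# Random stand-ins for three language-specific updates (hypothetical shapes).
rng = np.random.default_rng(0)
deltas = [rng.normal(size=(12, 16)) for _ in range(3)]
merged = tsvm_merge(deltas, rank_ratio=0.25, alpha=1.2)
print(merged.shape)  # (12, 16)
```

A lower `rank_ratio` keeps fewer directions per language, which leaves more room for the orthogonalization step to separate them; `alpha` then rescales the merged update before it is added to the base weights.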

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same merging logic could be tested on sequential rather than batch edits to see whether interference grows over time.
If covariance patterns prove language-independent, the method might extend to other multi-domain editing tasks such as style or domain adaptation.
Low-rank approximations combined with covariance could reduce memory cost when editing very large models.
Developers building production systems may need to re-tune scaling and rank for each new language pair rather than using fixed defaults.

Load-bearing premise

The MzsRE benchmark together with the twelve selected languages and two backbone models captures enough of real multilingual interference for the observed performance ordering to hold more generally.

What would settle it

An experiment on a different multilingual editing benchmark or with a broader set of languages in which simple summation without covariance matches or exceeds the shared-covariance version would falsify the reliability ranking.

Figures

Figures reproduced from arXiv: 2605.13919 by Jong-Hyeok Lee, Ki-Young Shin, Kunil Lee, Young-Joo Suh.

**Figure 1.** Effect of scaling factor α on TSVM, TSVM-Cov, and Sum-Cov. Accuracies are averaged across all languages.

Excerpt from the surrounding text: … the monolingual upper bound in all four experimental configurations. Therefore, the main challenge in MKE is not merely how to merge language-specific updates, but how to construct updates that are mutually compatible across languages before or during merging. 6.2 Effect of TSVM (RQ2): TSVM was motivated…

**Figure 2.** Effect of rank ratio r on TSVM, TSVM-Cov. Accuracies are averaged across all languages.

Excerpt from the surrounding text: 6.3 Effect of Weight Scaling Factor (RQ3): The effect of weight scaling is one of the most practically important findings of this study. In most settings, the best performance is obtained not at the default scale of 1.0, but at a slightly larger value. This result shows that the magnitude of the closed-form update is not…

Original abstract

Multilingual knowledge editing (MKE) remains challenging because language-specific edits interfere with one another, even when locate-then-edit methods work well in monolingual settings. This paper focuses on three issues: the effectiveness of vector merging methods for MKE, the extent to which Task Singular Vectors for Merging (TSVM) can reduce multilingual interference, and the influence of the weight scaling factor and rank compression ratio on performance. We evaluate six merging variants with two popular backbone large language models, two base knowledge editing methods, and 12 languages on the MzsRE benchmark under a large-scale batch-editing setting. Our results show that vector summation with shared covariance is the most reliable overall strategy, whereas simple summation without shared covariance performs poorly. TSVM improves performance in some settings, but its ability to mitigate multilingual interference is limited. We also find that performance is sensitive to both weight scale and rank ratio, with larger-than-default scaling and relatively low rank often yielding better results. These findings clarify the practical strengths and limits of current vector merging methods for MKE and provide guidance for future multilingual knowledge editing research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper runs controlled comparisons of merging methods for multilingual knowledge editing and finds shared-covariance summation most reliable on MzsRE, but the ranking is tied to that benchmark and lacks cross-checks.

The main thing to know is that vector summation with shared covariance comes out ahead for reducing interference when editing knowledge across 12 languages in batch mode, while plain summation without it performs poorly and TSVM only helps in limited cases. The authors test six merging variants on two backbone models using locate-then-edit methods and report that performance shifts with weight scaling and rank compression, often favoring larger scaling and lower ranks than defaults. This supplies some practical guidance that was missing for the multilingual setting.

What the work does well is laying out a clear experimental protocol for large-scale batch editing on MzsRE and showing consistent patterns across the variants. The comparisons are direct and the sensitivity analysis on the two free parameters is useful for anyone trying to apply these techniques. The claims rest on observable benchmark differences rather than circular fitting, which keeps the evidence straightforward.

The soft spots are mostly about scope. All rankings come from MzsRE alone, so if other multilingual editing suites have different fact distributions or more typologically distant languages the ordering could shift, and the paper does not test that. The abstract and summary give no error bars or significance tests, which makes it harder to judge how stable the differences really are. The assumption that these 12 languages plus the chosen backbones capture typical interference is reasonable for a first pass but remains unverified.

This paper is for researchers already working on knowledge editing who need quick empirical pointers on merging choices rather than new theory. It fills a narrow gap without overreaching. I would send it to peer review because the setup is reproducible and the topic is timely; referees can ask for extra benchmarks or stats without starting from scratch.

Referee Report

2 major / 1 minor

Summary. The paper empirically evaluates six vector merging variants for multilingual knowledge editing (MKE) in LLMs to address language interference. Using two backbone models, two base editing methods, and 12 languages on the MzsRE benchmark in batch-editing settings, it concludes that vector summation with shared covariance is the most reliable overall strategy, simple summation without shared covariance performs poorly, TSVM offers limited mitigation of interference, and performance is sensitive to the weight scaling factor and rank compression ratio.

Significance. If the empirical ranking holds beyond the tested conditions, the work supplies actionable guidance on merging methods for MKE and highlights the value of shared covariance along with hyperparameter sensitivity. The direct benchmark comparisons constitute a useful empirical contribution, though the single-benchmark scope limits broader impact.

major comments (2)

[Results] The headline claim that vector summation with shared covariance is the most reliable overall strategy rests entirely on MzsRE runs with 12 languages and two backbones (abstract and results sections). No cross-benchmark validation is reported, so it remains untested whether the observed ordering reverses under different fact distributions, more typologically distant languages, or non-translation-based edits.
[Experimental Setup] No error bars, statistical significance tests, or full experimental protocol details accompany the performance tables or figures, leaving the strength of support for the method ranking only partially verifiable (abstract and experimental results).

minor comments (1)

[Abstract] The abstract states that performance is sensitive to weight scale and rank ratio but supplies no quantitative deltas or example values to illustrate the effect sizes.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which help clarify the scope and verifiability of our empirical findings. We address each major comment below and outline the revisions we will make.

Referee: [Results] The headline claim that vector summation with shared covariance is the most reliable overall strategy rests entirely on MzsRE runs with 12 languages and two backbones (abstract and results sections). No cross-benchmark validation is reported, so it remains untested whether the observed ordering reverses under different fact distributions, more typologically distant languages, or non-translation-based edits.

Authors: We agree that the evaluation is confined to the MzsRE benchmark. MzsRE was chosen because it supports large-scale batch editing across 12 languages and provides a direct test of multilingual interference, which aligns with the paper's focus. We acknowledge that the ranking of merging methods has not been validated on other benchmarks, different language typologies, or non-translation edits, and that the ordering could potentially reverse under those conditions. In the revised manuscript we will add an explicit limitations paragraph in the discussion section stating that our conclusions are benchmark-specific and that broader validation remains future work. We cannot add new cross-benchmark experiments at this stage.
Revision: partial

Referee: [Experimental Setup] No error bars, statistical significance tests, or full experimental protocol details accompany the performance tables or figures, leaving the strength of support for the method ranking only partially verifiable (abstract and experimental results).

Authors: We appreciate this observation. In the revised version we will (i) add error bars (standard deviation across three random seeds) to all tables and figures, (ii) include paired t-test p-values for the key comparisons between merging variants, and (iii) expand the experimental protocol section and appendix with complete hyperparameter lists, random seeds, hardware details, and exact implementation steps so that the ranking can be fully reproduced and statistically assessed.
Revision: yes
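
The kind of reporting described in this response (standard deviation across three seeds and a paired t-test between merging variants) could look roughly like the standard-library sketch below. The accuracy values are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
import math
import statistics

def mean_std(scores):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1

# Placeholder accuracies for three seeds (NOT taken from the paper).
toy_sum_cov = [0.62, 0.64, 0.63]
toy_plain_sum = [0.41, 0.45, 0.40]

t_stat, dof = paired_t_statistic(toy_sum_cov, toy_plain_sum)
m, s = mean_std(toy_sum_cov)
print(f"Sum-Cov: {m:.3f} ± {s:.3f}; paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

Turning the t statistic into a p-value requires the Student t distribution, which is not in the standard library; `scipy.stats.ttest_rel(a, b)` returns both values directly. With only three seeds the test has two degrees of freedom, so even large differences should be read cautiously.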

standing simulated objections (not resolved)

Absence of cross-benchmark validation on datasets other than MzsRE or on non-translation-based edits

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark comparisons

The paper conducts an empirical evaluation of six vector merging variants for multilingual knowledge editing, using direct performance measurements on the MzsRE benchmark across 12 languages and two backbone models. No equations, derivations, or parameter-fitting steps are present that could reduce any claim to its own inputs by construction. Conclusions about the reliability of vector summation with shared covariance rest solely on observed benchmark scores rather than any self-referential logic, fitted inputs renamed as predictions, or load-bearing self-citations. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

No theoretical axioms or invented entities are introduced; the claims rest on empirical observations from standard benchmarks and models.

free parameters (2)

weight scaling factor
Tuned and shown to affect performance; larger-than-default values often better.
rank compression ratio
Varied across experiments; relatively low ranks frequently improve results.
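
Because both free parameters are tuned rather than fixed, a small grid sweep is a natural first step when applying these methods. The sketch below assumes a user-supplied `evaluate(alpha, rank_ratio)` callback that edits the model and returns an accuracy; `toy_evaluate` is a hypothetical stand-in whose peak location is arbitrary and not a result from the paper.

```python
from itertools import product

def sweep(evaluate, alphas, rank_ratios):
    """Score every (alpha, rank_ratio) pair and return the best pair plus all scores."""
    scores = {(a, r): evaluate(a, r) for a, r in product(alphas, rank_ratios)}
    best = max(scores, key=scores.get)
    return best, scores

def toy_evaluate(alpha, rank_ratio):
    # Hypothetical stand-in for "merge, apply the edit, measure accuracy".
    return 1.0 - (alpha - 1.25) ** 2 - (rank_ratio - 0.1) ** 2

best, scores = sweep(
    toy_evaluate,
    alphas=[0.75, 1.0, 1.25, 1.5],
    rank_ratios=[0.05, 0.1, 0.25, 0.5],
)
print(best)  # (1.25, 0.1)
```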

