pith. machine review for the scientific record.

arxiv: 2604.19678 · v1 · submitted 2026-04-21 · 💻 cs.CL

Recognition: unknown

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 02:46 UTC · model grok-4.3

classification 💻 cs.CL
keywords function vectors · language-agnostic representations · machine translation · multilingual LLMs · in-context learning · decoder-only models · token ranking

The pith

Translation function vectors extracted from one English-to-target direction transfer to improve token ranking in other target languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether function vectors, which capture task behavior from in-context examples, remain effective when the target language changes. Across three decoder-only models, it shows that vectors extracted from a single translation direction improve the rank of correct tokens in many unseen languages. This matters because it points to a shared internal mechanism for the translation function itself rather than language-specific machinery. Readers interested in efficient multilingual systems would care: one extraction could serve many languages instead of repeating the process for each pair.

Core claim

Across three decoder-only multilingual LLMs, translation FVs extracted from a single English→Target direction transfer to other target languages, consistently improving the rank of correct translation tokens across multiple unseen languages. Ablation results show that removing the FV degrades translation across languages with limited impact on unrelated tasks. Base-model FVs transfer to instruction-tuned variants and partially generalize from word-level to sentence-level translation.
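
The rank-based evaluation behind this claim can be sketched in a few lines. The toy logits below are illustrative, not the paper's data; only the metric itself (∆rank of the correct translation token, with and without the FV intervention) follows the paper's description.

```python
import numpy as np

def token_rank(logits, token_id):
    """0-based rank of token_id when logits are sorted descending."""
    order = np.argsort(-logits)
    return int(np.where(order == token_id)[0][0])

# Toy next-token logits over a 4-token vocabulary; the correct
# translation is token 2.
clean_logits = np.array([2.0, 0.5, 1.0, -1.0])   # without FV
fv_logits    = np.array([2.0, 0.5, 3.0, -1.0])   # with FV injected

correct = 2
delta_rank = token_rank(clean_logits, correct) - token_rank(fv_logits, correct)
print(delta_rank)  # → 1: the FV moved the correct token one position up
```

A positive ∆rank means the intervention moved the correct translation closer to the top of the next-token distribution, matching the convention used in the paper's figures.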

What carries the argument

Function vectors extracted from model activations during in-context learning, tested for their ability to carry translation behavior independently of the specific target language.
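
As a hedged sketch of what such an extraction might look like, following the general recipe of the function-vector literature (average the outputs of a few task-causal attention heads over many in-context prompts, sum them into one vector, and add it to the residual stream at an edit layer). All shapes, counts, and names here are illustrative, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16          # hidden size (toy)
n_prompts = 32        # in-context translation prompts used for extraction
top_heads = 3         # attention heads selected by the FV procedure

# head_outputs[p, h] = output of selected head h at the final token
# of prompt p (here random stand-ins for real activations).
head_outputs = rng.normal(size=(n_prompts, top_heads, d_model))

# Function vector: mean over prompts, summed over the selected heads.
fv = head_outputs.mean(axis=0).sum(axis=0)      # shape (d_model,)

def apply_fv(hidden_state, fv, scale=1.0):
    """Add the function vector to the residual stream at the edit layer."""
    return hidden_state + scale * fv

h = rng.normal(size=d_model)                    # residual stream, edit layer
h_steered = apply_fv(h, fv)
```

The language-agnosticity question is then whether this one `fv`, extracted from a single English→Target direction, still helps when the steered model is decoding into a different target language.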

If this is right

  • A single FV extraction can support translation into multiple target languages without new extractions.
  • Removing the FV specifically harms translation performance while leaving unrelated tasks mostly intact.
  • Vectors from base models remain useful after instruction tuning.
  • The effect holds from isolated word translations to full sentences in limited tests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This pattern could lower the cost of building multilingual translation systems by reusing one vector across languages.
  • Similar language-agnostic behavior might appear in other task vectors such as summarization or reasoning.
  • Future work could test whether the same vectors work when source and target languages both differ from the extraction pair.

Load-bearing premise

Observed gains in token ranking come from the language-agnostic properties of the extracted vectors rather than from the in-context examples or other setup details.

What would settle it

A controlled test showing that random vectors or in-context examples alone produce equal or larger rank improvements than the extracted FVs when applied to new target languages.
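
That control could be sketched as follows, under the assumption that "random vectors" means norm-matched random directions applied at the same layer with prompt and model held fixed. A toy identity unembedding stands in for the model head; the toy FV is constructed to point at the correct token, so everything here is illustrative rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                   # toy vocab size == hidden size
W = np.eye(d)                           # toy unembedding: logits == state
h = rng.normal(size=d)                  # residual stream at the edit layer
correct = 3                             # id of the correct translation token

fv = np.zeros(d)
fv[correct] = 1.0                       # toy FV: points at the correct token
rand = rng.normal(size=d)
rand /= np.linalg.norm(rand)            # norm-matched random control

def rank(state, token_id):
    """0-based rank of token_id in the induced next-token distribution."""
    logits = W @ state
    return int(np.where(np.argsort(-logits) == token_id)[0][0])

base = rank(h, correct)
dr_fv = base - rank(h + 10.0 * fv, correct)      # FV intervention
dr_rand = base - rank(h + 10.0 * rand, correct)  # random-vector control
# The language-agnostic reading survives only if dr_fv reliably beats dr_rand
# across target languages, not just in one direction.
```

In the real experiment the interesting comparison is the distribution of these two ∆rank values over many source words and target languages, not a single draw.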

Figures

Figures reproduced from arXiv: 2604.19678 by Fajri Koto, Gerard I. Gállego, Javier Ferrando, Nurkhan Laiyk.

Figure 1
Figure 1. Example of FV effects on the next-token distribution. For a single source word, we show the top tokens whose probability (∆p) increases the most after FV, highlighting that the largest gains often correspond to plausible translations across multiple languages.
Figure 2
Figure 2. Left: average mean ∆rank across edit layers for different target languages for Llama-3.2-3B. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution. Right: per-language absolute mean rank for Llama-3.2-3B under the clean setting and after FV intervention.
Figure 3
Figure 3. Overlap of FV head sets across translation directions. For each English→X direction (X ∈ {es, fr, de}), we compute a translation function vector using the top-10 attention heads selected by the FV procedure. The dashed bracket highlights heads that appear in the top-10 set for all three directions (shared FV heads), while the colored blocks indicate direction-specific heads unique to a given pair.
Figure 4
Figure 4. Token-level effect of FV steering. For source words, we compare the next-token distribution with and without FV injection and list tokens ranked by their probability increase under the intervention (+FV).
Figure 5
Figure 5. Mean ∆rank across edit layers for different target languages under FV ablation. Curves correspond to target languages. More negative values indicate that ablating the FV moves the correct translation token lower in the next-token ranking relative to the clean setting.
Figure 6
Figure 6. Average mean ∆rank across edit layers for different target languages for Llama-3.2-3B-Instruct, using an FV computed from Llama-3.2-3B.
Figure 7
Figure 7. Sentence-level translation performance for Llama-3.2-3B. Left: XCOMET scores.
Figure 8
Figure 8. Example of FV effects on the next-token distribution for Tiny-Aya. For a single source word, we show the top tokens whose probability increases the most after FV, highlighting that the largest gains often correspond to plausible translations across multiple languages.
Figure 9
Figure 9. Example of FV effects on the next-token distribution for Gemma-2-2B. For a single source word, we show the top tokens whose probability increases the most after FV, highlighting that the largest gains often correspond to plausible translations across multiple languages.
Figure 10
Figure 10. Mean ∆rank across edit layers for different target languages for Llama-3.2-3B. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution.
Figure 11
Figure 11. Mean ∆rank across edit layers for different target languages for Gemma-2-2B. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution.
Figure 12
Figure 12. Mean ∆rank across edit layers for different target languages for Tiny-Aya. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution.
Figure 13
Figure 13. Mean ∆rank across edit layers for different target languages for Gemma-2-2B-Instruct, using an FV computed from Gemma-2-2B. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution.
Figure 14
Figure 14. Mean ∆rank across edit layers for different target languages for Tiny-Aya-global, using an FV computed from Tiny-Aya. Curves correspond to target languages, and higher values mean that the FV moves the correct translation token closer to the top of the next-token distribution.
Figure 15
Figure 15. Sentence-level translation performance for Tiny-Aya. Left: XCOMET scores.
read the original abstract

Function vectors (FVs) are vector representations of tasks extracted from model activations during in-context learning. While prior work has shown that multilingual model representations can be language-agnostic, it remains unclear whether the same holds for function vectors. We study whether FVs exhibit language-agnosticity, using machine translation as a case study. Across three decoder-only multilingual LLMs, we find that translation FVs extracted from a single English→Target direction transfer to other target languages, consistently improving the rank of correct translation tokens across multiple unseen languages. Ablation results show that removing the FV degrades translation across languages with limited impact on unrelated tasks. We further show that base-model FVs transfer to instruction-tuned variants and partially generalize from word-level to sentence-level translation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates the language-agnosticity of function vectors (FVs) extracted from decoder-only multilingual LLMs during in-context learning, using machine translation as a case study. It claims that FVs derived from a single English-to-target direction improve the rank of correct translation tokens in multiple unseen target languages across three models. Ablations show that removing the FV degrades translation while leaving unrelated tasks largely intact; additional results indicate transfer to instruction-tuned variants and partial generalization from word-level to sentence-level translation.

Significance. If the central claim holds after addressing controls, the work would strengthen evidence that FVs can encode transferable, language-independent task functions in LLMs, with potential implications for cross-lingual model editing and efficient multilingual in-context learning. The consistency of results across three models and the use of ablation experiments to isolate the FV contribution are notable empirical strengths.

major comments (2)
  1. [Methods / Experimental Setup] The experimental setup does not include a control condition that holds the in-context examples, prompt format, and model fixed while substituting the extracted FV with an orthogonal or random vector. This is load-bearing for the language-agnostic interpretation, as improvements in correct-token rank on unseen targets could arise from the shared examples or model biases rather than properties of the FV itself (see abstract and skeptic note on weakest assumption).
  2. [Results / Ablations] Ablation results are reported as degrading translation performance, but without details on the exact ablation method (e.g., zeroing vs. replacing with noise), statistical significance tests, or per-language effect sizes, it is difficult to confirm that the FV effect is isolated from other factors.
minor comments (2)
  1. [Methods] Clarify the precise definition and extraction procedure for FVs in the methods section, including any hyperparameters or layer choices, to aid reproducibility.
  2. [Abstract] The abstract mentions 'consistent improvements' but does not specify the magnitude or variance; adding quantitative summaries (e.g., average rank improvement) would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strength of evidence for language-agnosticity in function vectors. We address each major point below and commit to revisions that strengthen the experimental controls and reporting without altering the core claims.

read point-by-point responses
  1. Referee: The experimental setup does not include a control condition that holds the in-context examples, prompt format, and model fixed while substituting the extracted FV with an orthogonal or random vector. This is load-bearing for the language-agnostic interpretation, as improvements in correct-token rank on unseen targets could arise from the shared examples or model biases rather than properties of the FV itself (see abstract and skeptic note on weakest assumption).

    Authors: We agree that a direct control substituting the extracted FV with a random or orthogonal vector (while fixing examples, prompt, and model) would more rigorously isolate the FV's contribution from prompt or model biases. Our current ablations demonstrate performance degradation upon FV removal, and the cross-lingual transfer to unseen targets provides supporting evidence that the effect is not solely from the English-to-single-target examples. Nevertheless, to address the concern directly, we will add the requested control experiments in the revised manuscript, including comparisons to random vectors sampled from the activation distribution and to orthogonal vectors in the same subspace. revision: yes

  2. Referee: Ablation results are reported as degrading translation performance, but without details on the exact ablation method (e.g., zeroing vs. replacing with noise), statistical significance tests, or per-language effect sizes, it is difficult to confirm that the FV effect is isolated from other factors.

    Authors: We appreciate this observation on reporting clarity. The ablation in the current manuscript consists of zeroing the FV activations at the relevant layers during inference. We will revise the Methods section to specify this procedure explicitly, add statistical significance tests (e.g., paired comparisons across multiple seeds or languages), and report per-language effect sizes for the change in correct-token rank. These details will be included in the revised results and supplementary material. revision: yes
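
One way to realize the ablation described above, sketched under the assumption that "zeroing the FV" amounts to removing its direction from the residual stream at the edit layer (directional ablation); the vectors here are random stand-ins, not real activations:

```python
import numpy as np

def ablate_fv(hidden_state, fv):
    """Project the FV direction out of the hidden state."""
    u = fv / np.linalg.norm(fv)
    return hidden_state - (hidden_state @ u) * u

rng = np.random.default_rng(2)
fv = rng.normal(size=16)      # stand-in function vector
h = rng.normal(size=16)       # stand-in residual stream at the edit layer
h_abl = ablate_fv(h, fv)

# After ablation the state carries no component along the FV direction.
print(abs(h_abl @ fv) < 1e-9)  # → True
```

Whichever variant the authors use (zeroing vs. projection vs. noise replacement), the referee's point stands: the exact operation needs to be stated, since the three can degrade translation by different amounts.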

Circularity Check

0 steps flagged

No circularity: empirical transfer results independent of inputs

full rationale

The paper reports experimental findings on function vector extraction and transfer across languages in decoder-only LLMs, using ablation studies and token-rank improvements as evidence. No derivations, equations, or first-principles predictions are presented that reduce to fitted parameters, self-definitions, or self-citation chains. Claims rest on observable outcomes from controlled transfer and ablation setups rather than any renaming, ansatz smuggling, or uniqueness imported from prior author work. The central language-agnosticity conclusion is tested via cross-lingual generalization, which does not collapse to the extraction procedure by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that function vectors isolate task functions independently of language; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption Function vectors extracted from activations during in-context learning capture task-specific functions in a language-independent way.
    This premise underpins the transfer experiments and is the core hypothesis being tested.

pith-pipeline@v0.9.0 · 5439 in / 1303 out tokens · 66063 ms · 2026-05-10T02:46:43.437118+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

66 extracted references · 22 canonical work pages · 6 internal anchors
