arxiv: 2605.23069 · v1 · pith:TP7MGPQ5new · submitted 2026-05-21 · 💻 cs.CL

DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge

Pith reviewed 2026-05-25 05:23 UTC · model grok-4.3

classification 💻 cs.CL
keywords activation steering · cultural awareness · multilingual LLMs · language vectors · FLORES data · inference-time adaptation · SemEval task

The pith

Activation steering with language vectors from parallel data produces modest, layer-dependent gains on cultural reasoning in multilingual models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether adding language-specific vectors to the residual stream of multilingual LLMs can adapt them for cultural awareness without any training. Vectors are extracted from parallel sentences in the FLORES dataset and injected at a chosen transformer layer during inference. The method is evaluated on the SemEval-2026 cultural awareness task, where it reaches 86.96 percent accuracy in the multiple-choice track. Post-hoc checks show that any gains are small, strongly affected by layer choice, and inconsistent across language-region pairs; some configurations lower performance, and the effect also depends on prompt wording.

Core claim

Language vectors extracted from parallel FLORES data are added to the residual stream of multilingual LLMs at a selected layer to steer the model toward cultural knowledge at inference time with no parameter updates. The method is applied to SemEval-2026 Task 7, yielding 86.96 percent accuracy on the official MCQ track. Analyses of both MCQ and SAQ settings show that the resulting improvements on cultural reasoning are modest and heterogeneous: they are highly sensitive to layer selection, vary across language-region pairs (with some settings causing degradation), and depend on whether the prompt is generic or culturally conditioned.
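
As a sketch of how this operation could be written down (the notation is an assumption for illustration, not the paper's own): let $h_\ell$ be the residual-stream activation at layer $\ell$, $\beta$ the steering strength that appears in the figure captions, and $v_\ell^{(L)}$ the vector for language $L$. The difference-of-means form is inferred from the "DiffMean" label in the Figure 6 caption; the reference set $x^{(\mathrm{ref})}$ against which the target-language sentences are contrasted is hypothetical.

```latex
v_\ell^{(L)} \;=\; \frac{1}{N}\sum_{i=1}^{N} h_\ell\!\bigl(x_i^{(L)}\bigr)
\;-\; \frac{1}{N}\sum_{i=1}^{N} h_\ell\!\bigl(x_i^{(\mathrm{ref})}\bigr),
\qquad
\tilde{h}_\ell \;=\; h_\ell + \beta\, v_\ell^{(L)}
```

Under this reading, the two quantities left open are the layer index $\ell$ and the scale $\beta$; no model weights change.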

What carries the argument

Addition of language-specific steering vectors extracted from parallel FLORES data to the residual stream at a chosen transformer layer.
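
A minimal NumPy sketch of that mechanism, under stated assumptions: plain arrays stand in for layer activations, the vector is a difference of means (suggested by the "DiffMean" label in the Figure 6 caption), and the function names `diffmean_vector` and `steer` are hypothetical. It is not the authors' implementation, which would hook a real transformer layer.

```python
import numpy as np

def diffmean_vector(target_acts, reference_acts):
    """Difference of mean activations: mean(target) - mean(reference).

    Both inputs have shape (num_sentences, hidden_dim). Which sentences
    serve as the reference set is an assumption of this sketch.
    """
    target_acts = np.asarray(target_acts, dtype=float)
    reference_acts = np.asarray(reference_acts, dtype=float)
    return target_acts.mean(axis=0) - reference_acts.mean(axis=0)

def steer(hidden, vector, beta=1.0):
    """Add beta * vector to every token position of hidden (seq_len, hidden_dim)."""
    return np.asarray(hidden, dtype=float) + beta * vector

# Toy data: target activations are the reference activations plus a fixed offset.
rng = np.random.default_rng(0)
reference = rng.normal(size=(8, 4))
offset = np.array([1.0, 0.0, -2.0, 0.5])
target = reference + offset

v = diffmean_vector(target, reference)       # recovers the offset
steered = steer(np.zeros((3, 4)), v, beta=1.0)
```

Because the addition is broadcast over token positions, the same vector shifts every position in the sequence; whether the paper steers all positions or only some is not stated in the material above.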

If this is right

  • The method requires no parameter updates to adapt models for new cultural contexts.
  • Steering effectiveness depends critically on which transformer layer receives the vector addition.
  • Gains differ substantially by language and region and can turn negative in some configurations.
  • Steering interacts with prompt formulation, requiring joint tuning of vectors and prompts.
  • The approach was applied to both short-answer and multiple-choice formats in the shared task.
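
The point about gains differing by language and region, and sometimes turning negative, can be checked with a per-locale comparison. The sketch below is illustrative only: the locale names and accuracy values are placeholders invented for the example, not results from the paper, and `steering_deltas` and `summarize` are hypothetical helper names.

```python
def steering_deltas(baseline, steered):
    """Steered-minus-baseline accuracy for every locale present in both dicts."""
    return {loc: steered[loc] - baseline[loc] for loc in baseline if loc in steered}

def summarize(deltas, tol=1e-9):
    """Split locales into improved and degraded, and report the mean delta."""
    improved = sorted(loc for loc, d in deltas.items() if d > tol)
    degraded = sorted(loc for loc, d in deltas.items() if d < -tol)
    mean_delta = sum(deltas.values()) / len(deltas)
    return improved, degraded, mean_delta

# Placeholder numbers for illustration only (not taken from the paper).
baseline = {"locale_a": 0.80, "locale_b": 0.70, "locale_c": 0.90}
steered = {"locale_a": 0.82, "locale_b": 0.65, "locale_c": 0.90}

deltas = steering_deltas(baseline, steered)
improved, degraded, mean_delta = summarize(deltas)
```

Reporting the improved and degraded sets alongside the mean keeps a small positive average from hiding locales where steering hurts.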

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed layer sensitivity implies that cultural information is represented at different depths across languages inside the model.
  • The inconsistency across language pairs suggests that parallel data alone may not capture all region-specific cultural signals equally.
  • Joint optimization of steering vectors and prompt design could be tested by searching over both simultaneously on held-out cultural questions.
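
The joint search suggested in the last point could look like the grid search below. This is a sketch under assumptions: `joint_search` is a hypothetical helper, and `toy_evaluate` is an invented stand-in for running the steered model on held-out cultural questions. The two peak layers (14 for the cultural prompt, 25 for the generic prompt) echo the Figure 20 caption for Qwen3-8B; the scoring formula itself is made up.

```python
from itertools import product

def joint_search(layers, prompts, evaluate):
    """Return (best_score, best_layer, best_prompt) over the full layer x prompt grid."""
    best = None
    for layer, prompt in product(layers, prompts):
        score = evaluate(layer, prompt)
        if best is None or score > best[0]:
            best = (score, layer, prompt)
    return best

def toy_evaluate(layer, prompt):
    """Invented scoring function: each prompt peaks at a different layer."""
    peak = {"generic": 25, "cultural": 14}[prompt]
    bonus = 0.01 if prompt == "cultural" else 0.0
    return 1.0 - abs(layer - peak) / 100.0 + bonus

best_score, best_layer, best_prompt = joint_search(
    range(36), ["generic", "cultural"], toy_evaluate
)
```

Searching the pairs jointly matters when the best layer shifts with the prompt, since tuning the layer under one prompt and then swapping prompts can land on a poor combination.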

Load-bearing premise

Vectors taken from parallel FLORES sentences carry transferable cultural knowledge that can be injected via residual-stream addition to raise performance on cultural reasoning tasks.

What would settle it

Select the layer that maximizes performance on a validation split, then measure accuracy on the full task set. If overall accuracy shows no gain, or a net loss, compared with the unsteered model across multiple language pairs, the steering approach would be falsified.
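
A sketch of that protocol, assuming per-layer validation accuracies and per-language-pair test accuracies are already available as dictionaries. The helper names `select_layer` and `net_effect` are hypothetical, and all numbers are placeholders for illustration, not results from the paper.

```python
def select_layer(val_acc_by_layer):
    """Pick the layer with the highest validation accuracy (ties go to the lowest layer)."""
    return max(sorted(val_acc_by_layer), key=lambda layer: val_acc_by_layer[layer])

def net_effect(test_baseline, test_steered_by_layer, layer):
    """Mean steered-minus-baseline test accuracy across language pairs at `layer`."""
    pairs = sorted(test_baseline)
    diffs = [test_steered_by_layer[layer][p] - test_baseline[p] for p in pairs]
    return sum(diffs) / len(diffs)

# Placeholder numbers for illustration only.
val_acc = {10: 0.71, 14: 0.74, 20: 0.74}
test_baseline = {"pair_a": 0.70, "pair_b": 0.80}
test_steered = {14: {"pair_a": 0.73, "pair_b": 0.78}}

chosen = select_layer(val_acc)                       # layer fixed before touching test data
effect = net_effect(test_baseline, test_steered, chosen)
falsified = effect <= 0.0                            # no gain or net loss
```

Fixing the layer on validation data before looking at test accuracy is what separates this check from a post-hoc sweep, where the best layer is chosen on the same data it is scored on.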

Figures

Figures reproduced from arXiv: 2605.23069 by Cristina España-Bonet, Daniil Gurgurov, Josef van Genabith, Simon Ostermann, Yasser Hamidullah, Yusser Al Ghussin.

Figure 1. Motivation: if culture overlaps with language …
Figure 4. Top and bottom per-language MCQ accuracy …
Figure 2. Post-hoc cross-prompt layer sweeps for Qwen2.5-72B-Instruct with β = 1 on MCQ (top) and SAQ (bottom). The official submission uses the cultural prompt.
Figure 3. Post-hoc overall MCQ layer sweeps for Qwen2.5-72B-Instruct under different steering strengths (β ∈ {1, 3, 5}).
    … analyses on both MCQ and SAQ evaluation data across multiple models (Qwen2.5-72B/7B, Aya Expanse 8B/32B, Qwen3 8B/32B) using the same evaluation metrics provided by the SemEval-2026 organizers for each track. We observe: (i) strong layer sensitivity: steering gains concentrate in a subset of lay…
Figure 5. Per-locale steering effect for Qwen2.5-72B: …
Figure 6. DiffMean vector convergence vs. FLORES sample size: …
Figure 10. Post-hoc overall MCQ layer sweeps for Qwen2.5-7B-Instruct under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.
Figure 11. Post-hoc cross-prompt MCQ layer sweep for …
Figure 12. Top and bottom per-language MCQ accuracy …
Figure 16. Post-hoc overall MCQ layer sweeps for Aya Expanse 32B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while improving or meeting performance in mid to late layers. In comparison, β = 1 remains stable across layers although β = 5 yields the best overall Acc in our experiments.
Figure 14. Post-hoc cross-prompt MCQ layer sweep for …
Figure 15. Top and bottom per-language MCQ accuracy …
Figure 22. Post-hoc overall MCQ layer sweeps for Qwen3-32B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable although β = 5 yields the best overall Acc in our experiments.
Figure 20. Post-hoc cross-prompt MCQ layer sweep for Qwen3-8B with β = 1. The official submission uses the cultural prompt. Prompt choice affects both baseline accuracy and the optimal steering layer (here, Layer 14 for the cultural prompt and Layer 25 for the generic prompt).
Figure 21. Top and bottom per-language MCQ accuracy …
Figure 28. Post-hoc overall SAQ layer sweeps for Qwen2.5-7B-Instruct under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.
Figure 29. Post-hoc cross-prompt SAQ layer sweep for …
Figure 30. Top and bottom per-language SAQ accuracy …
Figure 31. Post-hoc overall SAQ layer sweeps for Aya …
Figure 35. Post-hoc cross-prompt SAQ layer sweep for …
Figure 36. Top and bottom per-language SAQ accuracy …
Figure 37. Post-hoc overall SAQ layer sweeps for Qwen3-8B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.
Figure 41. Post-hoc cross-prompt SAQ layer sweep for Qwen3-32B with β = 5. The official submission uses the cultural prompt. Prompt choice affects both baseline accuracy and the optimal steering layer (here, Layer 20 for the cultural prompt and Layer 35 for the generic prompt).
Figure 42. Top and bottom per-language SAQ accuracy …

Original abstract

Large language models (LLMs) are increasingly used across diverse linguistic and cultural contexts, yet their cultural knowledge remains uneven across regions and languages. We present the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness, where we apply activation steering to multilingual LLMs using language vectors extracted from parallel FLORES data. Our method performs inference-time adaptation by adding language-specific steering vectors to the residual stream at a selected transformer layer, without any parameter updates. We participated in both the short-answer (SAQ) and multiple-choice (MCQ) tracks; however, only our MCQ submission received an official score. In the official MCQ track, we achieved 86.96% accuracy, ranking 7th out of 17 teams. To better understand system behavior, we conduct post-hoc analyses on the shared-task MCQ and SAQ settings. These analyses show that activation steering yields modest and heterogeneous improvements on cultural reasoning: gains are strongly layer-sensitive, vary substantially across language-region pairs, with some configurations even degrading performance, and interact with prompt formulation, comparing generic and culturally conditioned prompts. Our findings suggest that prompt design and activation steering should be jointly optimized for culturally aware multilingual inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper describes the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness. It extracts language vectors from parallel FLORES data and applies activation steering by adding these vectors to the residual stream of multilingual LLMs at a selected layer during inference, without parameter updates. The system achieved an official 86.96% accuracy on the MCQ track (ranking 7th of 17), while post-hoc analyses on both MCQ and SAQ settings report modest, heterogeneous, layer-sensitive gains that vary by language-region pair, sometimes degrade performance, and interact with prompt formulation (generic vs. culturally conditioned). The authors conclude that prompt design and steering should be jointly optimized.

Significance. If the post-hoc results are reproducible, the work supplies concrete empirical observations on the variable and limited effectiveness of activation steering for cultural reasoning in multilingual LLMs. It highlights layer sensitivity, language-region variation, and prompt interactions, which are useful for practitioners working on culturally aware inference. The official shared-task score and use of publicly available parallel data are strengths that support direct evaluation without circularity.

major comments (1)
  1. [Abstract and post-hoc analyses] Abstract and post-hoc analyses section: the central claim that activation steering yields modest heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.
minor comments (2)
  1. [Abstract] The manuscript reports participation in both SAQ and MCQ tracks but provides an official score for only one; clarifying the status of the second track would improve completeness.
  2. [Method description] Notation for the steering operation (residual-stream addition) and the precise definition of the language vector should be stated explicitly with an equation or pseudocode for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and the recommendation of minor revision. The positive assessment of the empirical observations on activation steering is appreciated. We address the concern about missing implementation details below.

Point-by-point responses
  1. Referee: [Abstract and post-hoc analyses] Abstract and post-hoc analyses section: the central claim that activation steering yields modest heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.

    Authors: We agree with the referee that the key implementation details for the post-hoc analyses were not sufficiently detailed in the submitted manuscript. This omission affects the ability to fully assess the analyses. We will revise the post-hoc analyses section to include explicit descriptions of the layer selection criteria, the exact procedure for computing the language vectors from the FLORES data, and the scaling-factor values used in the experiments. These additions will be made in the revised manuscript to strengthen the support for our claims.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper reports an empirical application of activation steering on multilingual LLMs using language vectors extracted from public parallel FLORES data, followed by direct evaluation on the shared SemEval-2026 Task 7 MCQ and SAQ tracks. No derivation, first-principles claim, or prediction is presented that reduces by construction to fitted parameters or self-citations; results are post-hoc observations of layer-sensitive, heterogeneous effects with explicit hedging on modest gains and degradations. The central method (residual-stream addition at inference time) is independent of the evaluation data and does not rely on load-bearing self-citations or ansatzes imported from prior author work.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The paper is an empirical system description for a shared task; the abstract does not introduce mathematical derivations, so the ledger is minimal. Layer selection and any implicit scaling of the steering vector function as free parameters chosen on the basis of experimentation not detailed here.

free parameters (2)
  • selected transformer layer
    The layer at which the steering vector is added is described as 'selected' without a stated selection procedure or justification in the abstract.
  • steering vector scaling factor
    The magnitude with which the language vector is added to the residual stream is not specified and must be treated as a tunable hyperparameter.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages

  1. [1]

    Nedjma Ousidhoum and Junho Myung and Carla Perez-Almendros and Jiho Jin and Amr Keleg and Meriem Beloucif and Yi Zhou and Rodrigo Agerri and Vladimir Araujo and Naomi Baes and James Barry and Joanne Boisson and Nancy F. Chen and Christine de Kock and Aleksandra Edwards and Joseba Fernandez de Landa and Mohamed Fazli Imam and Huda Hakami and Shu-Kai Hsieh ...

  2. [2]

    Advances in Neural Information Processing Systems , volume=

    Blend: A benchmark for llms on everyday knowledge in diverse cultures and languages , author=. Advances in Neural Information Processing Systems , volume=

  3. [3]

    Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026). 2026

  4. [4]

    Qwen3 Technical Report , url =

    Qwen Team , journal =. Qwen3 Technical Report , url =

  5. [5]

    Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier , url =

    John Dang and Shivalika Singh and Daniel D'souza and Arash Ahmadian and Alejandro Salamanca and Madeline Smith and Aidan Peppin and Sungjin Hong and Manoj Govindassamy and Terrence Zhao and Sandra Kublik and Meor Amer and Viraat Aryabumi and Jon Ander Campos and Yi-Chern Tan and Tom Kocmi and Florian Strub and Nathan Grinsztajn and Yannis Flet-Berliac and...

  6. [6]

    Cultural bias and cultural alignment of large language models , volume =

    Tao, Yan and Viberg, Olga and Baker, Ryan S and Kizilcec, Ren. Cultural bias and cultural alignment of large language models , volume =. PNAS nexus , number =

  7. [7]

    NLLB Team and Marta R. Costa-jussà and James Cross and Onur Çelebi and Maha Elbayad and Kenneth Heafield and Kevin Heffernan and Elahe Kalbassi and Janice Lam and Daniel Licht and Jean Maillard and Anna Sun and Skyler Wang and Guillaume Wenzek and Al Youngblood and Bapi Akula and Loic Barrault and Gabriel Mejia Gonzalez and Prangthip Hansanti and John Hof...

  8. [8]

    Computational Linguistics , volume=

    Survey of cultural awareness in language models: Text and beyond , author=. Computational Linguistics , volume=. 2025 , publisher=

  9. [9]

    Junho Myung and Nayeon Lee and Yi Zhou and Jiho Jin and Rifki Afina Putri and Dimosthenis Antypas and Hsuvas Borkakoty and Eunsu Kim and Carla P. BLEnD:. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 , editor =

  10. [10]

    Qwen2.5 Technical Report , url =

    Qwen and : and An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu an...

  11. [11]

    Isolating culture neurons in multilingual large language models , author=. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics , pages=

  12. [12]

    and Nguyen, Thien Huu , booktitle =

    Nguyen, Thuat and Nguyen, Chien Van and Lai, Viet Dac and Man, Hieu and Ngo, Nghia Trung and Dernoncourt, Franck and Rossi, Ryan A. and Nguyen, Thien Huu , booktitle =

  13. [13]

    doi:10.18653/v1/2023.emnlp-main.981 , editor =

    Mukherjee, Anjishnu and Raj, Chahat and Zhu, Ziwei and Anastasopoulos, Antonios , booktitle =. doi:10.18653/v1/2023.emnlp-main.981 , editor =

  14. [14]

    CultureLLM: Incorporating Cultural Differences into Large Language Models , url =

    Cheng Li and Mengzhuo Chen and Jindong Wang and Sunayana Sitaram and Xing Xie , bibsource =. CultureLLM: Incorporating Cultural Differences into Large Language Models , url =. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 , editor =

  15. [15]

    Toxicity in chatgpt: Analyzing persona-assigned language models , url =

    Deshpande, Ameet and Murahari, Vishvak and Rajpurohit, Tanmay and Kalyan, Ashwin and Narasimhan, Karthik , booktitle =. Toxicity in chatgpt: Analyzing persona-assigned language models , url =. doi:10.18653/v1/2023.findings-emnlp.88 , editor =

  16. [16]

    Understanding intermediate layers using linear classifier probes , url =

    Guillaume Alain and Yoshua Bengio , journal =. Understanding intermediate layers using linear classifier probes , url =

  17. [17]

    Steering llama 2 via contrastive activation addition , year =

    Rimsky, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander , booktitle =. Steering llama 2 via contrastive activation addition , year =

  18. [18]

    Byun and Zifan Wang and Alex Mallen and Steven Basart and Sanmi Koyejo and Dawn Song and Matt Fredrikson and J

    Andy Zou and Long Phan and Sarah Chen and James Campbell and Phillip Guo and Richard Ren and Alexander Pan and Xuwang Yin and Mantas Mazeika and Ann-Kathrin Dombrowski and Shashwat Goel and Nathaniel Li and Michael J. Byun and Zifan Wang and Alex Mallen and Steven Basart and Sanmi Koyejo and Dawn Song and Matt Fredrikson and J. Zico Kolter and Dan Hendryc...

  19. [19]

    Fluent but Culturally Distant: Can Regional Training Teach Cultural Understanding? , url =

    Agarwal, Dhruv and Shukla, Anya and Sitaram, Sunayana and Vashistha, Aditya , journal =. Fluent but Culturally Distant: Can Regional Training Teach Cultural Understanding? , url =

  20. [20]

    ArXiv preprint , title =

    Rystr. ArXiv preprint , title =

  21. [21]

    Disentangling Language and Culture for Evaluating Multilingual Large Language Models , url =

    Ying, Jiahao and Tang, Wei and Zhao, Yiran and Cao, Yixin and Rong, Yu and Zhang, Wenxuan , booktitle =. Disentangling Language and Culture for Evaluating Multilingual Large Language Models , url =. doi:10.18653/v1/2025.acl-long.1082 , editor =

  22. [22]

    The Linear Representation Hypothesis and the Geometry of Large Language Models , url =

    Kiho Park and Yo Joong Choe and Victor Veitch , bibsource =. The Linear Representation Hypothesis and the Geometry of Large Language Models , url =. Forty-first International Conference on Machine Learning,

  23. [23]

    Style Vectors for Steering Generative Large Language Model , url =

    Kai Konen and Sophie Jentzsch and Diaoulé Diallo and Peer Schütt and Oliver Bensch and Roxanne El Baff and Dominik Opitz and Tobias Hecking , journal =. Style Vectors for Steering Generative Large Language Model , url =

  24. [24]

    Steering Llama 2 via Contrastive Activation Addition , url =

    Rimsky, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander , booktitle =. Steering Llama 2 via Contrastive Activation Addition , url =. doi:10.18653/v1/2024.acl-long.828 , editor =

  25. [25]

    Tyler A. Chang and Catherine Arnett and Abdelrahman Eldesokey and Abdelrahman Sadallah and Abeer Kashar and Abolade Daud and Abosede Grace Olanihun and Adamu Labaran Mohammed and Adeyemi Praise and Adhikarinayum Meerajita Sharma and Aditi Gupta and Afitab Iyigun and Afonso Simplício and Ahmed Essouaied and Aicha Chorana and Akhil Eppa and Akintunde Oladip...

  26. [26]

    CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming , url =

    Yu Ying Chiu and Liwei Jiang and Bill Yuchen Lin and Chan Young Park and Shuyue Stella Li and Sahithya Ravi and Mehar Bhatia and Maria Antoniak and Yulia Tsvetkov and Vered Shwartz and Yejin Choi , journal =. CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming , url =

  27. [27]

    doi:10.18653/v1/2023.findings-acl.631 , editor =

    Palta, Shramay and Rudinger, Rachel , booktitle =. doi:10.18653/v1/2023.findings-acl.631 , editor =

  28. [28]

    BertaQA: How Much Do Language Models Know About Local Culture? , url =

    Julen Etxaniz and Gorka Azkune and Aitor Soroa and Oier Lopez de Lacalle and Mikel Artetxe , bibsource =. BertaQA: How Much Do Language Models Know About Local Culture? , url =. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 ...

  29. [29]

    Abhinav Rao and Akhila Yerukola and Vishwa Shah and Katharina Reinecke and Maarten Sap , journal =

  30. [30]

    David Romero and Chenyang Lyu and Haryo Akbarianto Wibowo and Teresa Lynn and Injy Hamed and Aditya Nanda Kishore and Aishik Mandal and Alina Dragonetti and Artem Abzaliev and Atnafu Lambebo Tonja and others , journal =

  31. [31]

    Language-agnostic

    Feng, Fangxiaoyu and Yang, Yinfei and Cer, Daniel and Arivazhagan, Naveen and Wang, Wei , booktitle =. Language-agnostic. doi:10.18653/v1/2022.acl-long.62 , editor =

  32. [32]

    Weinberger and Yoav Artzi , bibsource =

    Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi , bibsource =. BERTScore: Evaluating Text Generation with. 8th International Conference on Learning Representations,

  33. [33]

    Steering Large Language Model Activations in Sparse Spaces , url =

    Reza Bayat and Ali Rahimi-Kalahroudi and Mohammad Pezeshki and Sarath Chandar and Pascal Vincent , journal =. Steering Large Language Model Activations in Sparse Spaces , url =

  34. [34]

    Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation , url =

    Daniil Gurgurov and Katharina Trinley and Yusser Al Ghussin and Tanja Baeumel and Josef van Genabith and Simon Ostermann , journal =. Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation , url =

  35. [35]

    Proceedings of the 22nd Annual Conference of the European Association for Machine Translation , editor =

    Tiedemann, J. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation , editor =

  36. [36]

    doi:10.3115/1073083.1073135 , editor =

    Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , booktitle =. doi:10.3115/1073083.1073135 , editor =

  37. [37]

    WEIRD FAccTs: How western, educated, industrialized, rich, and democratic is FAccT? , year =

    Septiandri, Ali Akbar and Constantinides, Marios and Tahaei, Mohammad and Quercia, Daniele , booktitle =. WEIRD FAccTs: How western, educated, industrialized, rich, and democratic is FAccT? , year =

  38. [39]

    Canny and Sarah E

    Hellina Hailu Nigatu and John F. Canny and Sarah E. Chasins , bibsource =. Low-Resourced Languages and Online Knowledge Repositories:. Proceedings of the. doi:10.1145/3613904.3642605 , editor =

  39. [40]

    On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons , url =

    Kojima, Takeshi and Okimura, Itsuki and Iwasawa, Yusuke and Yanaka, Hitomi and Matsuo, Yutaka , booktitle =. On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons , url =

  40. [41]

    The Tatoeba Translation Challenge

    Tiedemann, J. The Tatoeba Translation Challenge. Proceedings of the Fifth Conference on Machine Translation , editor =

  41. [42]

    Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders , url =

    Boyi Deng and Yu Wan and Yidan Zhang and Baosong Yang and Fuli Feng , journal =. Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders , url =

  42. [43]

    Self-conditioning Pre-Trained Language Models , url =

    Xavier Suau Cuadros and Luca Zappella and Nicholas Apostoloff , bibsource =. Self-conditioning Pre-Trained Language Models , url =. International Conference on Machine Learning,

  43. [44]

    Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation , url =

    Tan, Shaomu and Wu, Di and Monz, Christof , journal =. Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation , url =

  44. [45]

    Unveiling a core linguistic region in large language models , url =

    Zhao, Jun and Zhang, Zhihao and Ma, Yide and Zhang, Qi and Gui, Tao and Gao, Luhui and Huang, Xuanjing , journal =. Unveiling a core linguistic region in large language models , url =

  45. [46]

    Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? , url =

    Zhong, Chengzhi and Cheng, Fei and Liu, Qianying and Jiang, Junfeng and Wan, Zhen and Chu, Chenhui and Murawaki, Yugo and Kurohashi, Sadao , journal =. Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? , url =

  46. [47]

    Language-specific neurons: The key to multilingual capabilities in large language models , url =

    Tang, Tianyi and Luo, Wenyang and Huang, Haoyang and Zhang, Dongdong and Wang, Xiaolei and Zhao, Xin and Wei, Furu and Wen, Ji-Rong , journal =. Language-specific neurons: The key to multilingual capabilities in large language models , url =

  47. [48]

    Do llamas work in english? on the latent language of multilingual transformers , year =

    Wendler, Chris and Veselovsky, Veniamin and Monea, Giovanni and West, Robert , booktitle =. Do llamas work in english? on the latent language of multilingual transformers , year =

  48. [49]

    The llama 3 herd of models , url =

    Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and Al-Dahle, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and others , journal =. The llama 3 herd of models , url =

  49. [50]

    Toy models of superposition , url =

    Elhage, Nelson and Hume, Tristan and Olsson, Catherine and Schiefer, Nicholas and Henighan, Tom and Kravec, Shauna and Hatfield-Dodds, Zac and Lasenby, Robert and Drain, Dawn and Chen, Carol and others , journal =. Toy models of superposition , url =

  50. [51]

    Sparse Autoencoders Find Highly Interpretable Features in Language Models , url =

    Robert Huben and Hoagy Cunningham and Logan Riggs and Aidan Ewart and Lee Sharkey , bibsource =. Sparse Autoencoders Find Highly Interpretable Features in Language Models , url =. The Twelfth International Conference on Learning Representations,

  51. [52]

    Mistral Nemo , url =

  52. [53]

    Bloom: A 176b-parameter open-access multilingual language model , year =

    Le Scao, Teven and Fan, Angela and Akiki, Christopher and Pavlick, Ellie and Ili. Bloom: A 176b-parameter open-access multilingual language model , year =

  53. [54]

    Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages , url =

    Andrylie, Lyzander Marciano and Rahmanisa, Inaya and Ihsani, Mahardika Krisna and Wicaksono, Alfan Farizki and Wibowo, Haryo Akbarianto and Aji, Alham Fikri , journal =. Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages , url =

  54. [55]

    Interpreting GPT: The Logit Lens , url =

    Nostalgebraist , journal =. Interpreting GPT: The Logit Lens , url =

  55. [56]

    Do Multilingual LLMs Think In English? , url =

    Lisa Schut and Yarin Gal and Sebastian Farquhar , journal =. Do Multilingual LLMs Think In English? , url =

  56. [57]

    Llama 2: Open foundation and fine-tuned chat models , url =

    Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others , journal =. Llama 2: Open foundation and fine-tuned chat models , url =

  57. [58]

    Yiran Zhao and Wenxuan Zhang and Guizhen Chen and Kenji Kawaguchi and Lidong Bing. How do Large Language Models Handle Multilingualism? Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024.

  58. [59]

    Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani. On the Cross-lingual Transferability of Monolingual Representations. doi:10.18653/v1/2020.acl-main.421.

  59. [60]

    Gurgurov, Daniil and Trinley, Katharina and Al Ghussin, Yusser and Baeumel, Tanja and Genabith, Josef Van and Ostermann, Simon. Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation.

  60. [61]

    Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel and Schwenk, Holger and Stoyanov, Veselin. doi:10.18653/v1/D18-1269.

  61. [62]

    Mondal, Soumen Kumar and Sen, Sayambhu and Singhania, Abhishek and Jyothi, Preethi. Language-Specific Neurons Do Not Facilitate Cross-Lingual Transfer. doi:10.18653/v1/2025.insights-1.6.

  62. [63]

    Shannon, Claude Elwood. A mathematical theory of communication.

  63. [64]

    Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin Johnson and Xiaobing Liu and Łukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto Kazawa and Keith Stevens and George Kurian...

  64. [65]

    Adelani, David and Liu, Hannah and Shen, Xiaoyu and Vassilyev, Nikita and Alabi, Jesujoba and Mao, Yanke and Gao, Haonan and Lee, En-Shiun.

  65. [66]

    Chandan Singh and Jeevana Priya Inala and Michel Galley and Rich Caruana and Jianfeng Gao. Rethinking Interpretability in the Era of Large Language Models.

  66. [67]

    Tan, Shaomu and Wu, Di and Monz, Christof. Neuron Specialization: Leveraging Intrinsic Task Modularity for Multilingual Machine Translation. doi:10.18653/v1/2024.emnlp-main.374.

  67. [68]

    Wikimedia Foundation.

  68. [69]

    Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzm... Proceedings of the Twelfth Language Resources and Evaluation Conference.

  69. [70]

    Dario Amodei and Chris Olah and Jacob Steinhardt and Paul Christiano and John Schulman and Dan Mané. Concrete Problems in AI Safety.

  70. [71]

    Team, Gemma and Riviere, Morgane and Pathak, Shreya and Sessa, Pier Giuseppe and Hardin, Cassidy and Bhupatiraju, Surya and Hussenot, L... ArXiv preprint.

  71. [72]

    Costa-Juss... ArXiv preprint.

  72. [73]

    Mistral AI. Mistral-Nemo-Base-2407.

  73. [74]

    Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas. Bag of Tricks for Efficient Text Classification.

  74. [75]

    Bandarkar, Lucas and Liang, Davis and Muller, Benjamin and Artetxe, Mikel and Shukla, Satya Narayan and Husa, Donald and Goyal, Naman and Krishnan, Abhinandan and Zettlemoyer, Luke and Khabsa, Madian. The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. doi:10.18653/v1/2024.acl-long.44.

  75. [76]

    Javaheripi, Mojan and Bubeck, S... Phi-2: The surprising power of small language models. Microsoft Research Blog.

  76. [77]

    Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.

  77. [78]

    Chou, Cheng-Ting and Liu, George and Sun, Jessica and Blondin, Cole and Zhu, Kevin and Sharma, Vasu and O'Brien, Sean. Causal Language Control in Multilingual Transformers via Sparse Feature Steering.

  78. [79]

    Panickssery, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander Matt. Steering llama 2 via contrastive activation addition.

  79. [80]

    Marks, Samuel and Tegmark, Max. The geometry of truth: Emergent linear structure in large language model representations of true/false datasets.

  80. [81]

    Wu, Zhengxuan and Arora, Aryaman and Geiger, Atticus and Wang, Zheng and Huang, Jing and Jurafsky, Dan and Manning, Christopher D and Potts, Christopher. Axbench: Steering llms? even simple baselines outperform sparse autoencoders.
