DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge

Cristina España-Bonet; Daniil Gurgurov; Josef van Genabith; Simon Ostermann; Yasser Hamidullah; Yusser Al Ghussin

arxiv: 2605.23069 · v1 · pith:TP7MGPQ5new · submitted 2026-05-21 · 💻 cs.CL


Yusser Al Ghussin, Daniil Gurgurov, Yasser Hamidullah, Josef van Genabith, Cristina España-Bonet, Simon Ostermann

Pith reviewed 2026-05-25 05:23 UTC · model grok-4.3

classification 💻 cs.CL

keywords: activation steering · cultural awareness · multilingual LLMs · language vectors · FLORES data · inference-time adaptation · SemEval task


The pith

Activation steering with language vectors from parallel data produces modest, layer-dependent gains on cultural reasoning in multilingual models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether adding language-specific vectors to the residual stream of multilingual LLMs can adapt them for cultural awareness without any training. Vectors are extracted from parallel sentences in the FLORES dataset and injected at a chosen transformer layer during inference. The method is evaluated on the SemEval-2026 cultural awareness task, where it reaches 86.96 percent accuracy in the multiple-choice track. Post-hoc checks reveal that any gains are small, strongly affected by layer choice, and inconsistent across language-region pairs; steering sometimes lowers performance, and its effect also depends on prompt wording.

Core claim

Language vectors extracted from parallel FLORES data are added to the residual stream of multilingual LLMs at a selected layer to steer the model toward cultural knowledge at inference time with no parameter updates. The method is applied to SemEval-2026 Task 7, yielding 86.96 percent accuracy on the official MCQ track. Analyses of both MCQ and SAQ settings show that the resulting improvements on cultural reasoning are modest and heterogeneous: they are highly sensitive to layer selection, vary across language-region pairs (with some settings causing degradation), and depend on whether a generic or a culturally conditioned prompt is used.

What carries the argument

Addition of language-specific steering vectors extracted from parallel FLORES data to the residual stream at a chosen transformer layer.
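The mechanism can be illustrated with a minimal sketch. The Figure 6 caption mentions a DiffMean vector, so the sketch assumes a difference-of-means construction: the mean activation over target-language sentences minus the mean over a reference set. The choice of reference set, the function names, and the toy activation values are all assumptions made for illustration; the page does not give the paper's exact procedure.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diffmean(target_acts, reference_acts):
    """Difference of means: target-language mean minus reference mean (assumed contrast)."""
    t, r = mean_vector(target_acts), mean_vector(reference_acts)
    return [a - b for a, b in zip(t, r)]


def steer(hidden_states, vector, beta=1.0):
    """Add beta * vector to the residual-stream state at every token position."""
    return [[h + beta * v for h, v in zip(token, vector)] for token in hidden_states]


# Toy activations at one layer (hypothetical values, hidden size 3).
target_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
reference_acts = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
language_vector = diffmean(target_acts, reference_acts)

# Two token positions of the residual stream at the chosen layer.
hidden = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
steered = steer(hidden, language_vector, beta=1.0)
```

In a real model the same addition would typically be applied to the output of the chosen transformer block during the forward pass, with β playing the role of the steering strength varied in the figures.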

If this is right

The method requires no parameter updates to adapt models for new cultural contexts.
Steering effectiveness depends critically on which transformer layer receives the vector addition.
Gains differ substantially by language and region and can turn negative in some configurations.
Steering interacts with prompt formulation, requiring joint tuning of vectors and prompts.
The approach was applied to both short-answer and multiple-choice formats in the shared task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The observed layer sensitivity implies that cultural information is represented at different depths across languages inside the model.
The inconsistency across language pairs suggests that parallel data alone may not capture all region-specific cultural signals equally.
Joint optimization of steering vectors and prompt design could be tested by searching over both simultaneously on held-out cultural questions.
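The joint search suggested in the last point could look roughly like the sketch below. It is an exhaustive grid over prompt, layer, and steering strength. The scoring function is a toy stand-in for a real held-out evaluation: its peak layers (14 for the cultural prompt, 25 for the generic one) echo the Figure 20 caption, but the accuracy values, the layer range, and all names are hypothetical.

```python
from itertools import product


def joint_search(score, prompts, layers, betas):
    """Return the (prompt, layer, beta) triple with the highest held-out score."""
    best_config, best_score = None, float("-inf")
    for config in product(prompts, layers, betas):
        s = score(*config)
        if s > best_score:
            best_config, best_score = config, s
    return best_config, best_score


def toy_score(prompt, layer, beta):
    """Toy stand-in for held-out accuracy; every number here is hypothetical."""
    base = 0.82 if prompt == "cultural" else 0.80
    peak = 14 if prompt == "cultural" else 25
    return base - 0.001 * abs(layer - peak) - 0.005 * (beta - 1)


best_config, best_score = joint_search(
    toy_score,
    prompts=["generic", "cultural"],
    layers=range(36),  # assumed layer count, for illustration only
    betas=[1, 3, 5],
)
```

Searching the full grid, rather than tuning the layer under one prompt and reusing it under another, is what makes the search joint: the best layer is allowed to differ per prompt.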

Load-bearing premise

Vectors taken from parallel FLORES sentences carry transferable cultural knowledge that can be injected via residual-stream addition to raise performance on cultural reasoning tasks.

What would settle it

Select the layer that maximizes performance on a validation split, then measure accuracy on the full task set. If overall accuracy shows no gain, or a net loss, relative to the unsteered model across multiple language pairs, the steering approach would be falsified.
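A minimal sketch of that check follows, under stated assumptions: accuracies are held in plain dictionaries, "overall" is taken to be the unweighted mean over language-region pairs, and every number is a hypothetical placeholder rather than a result from the paper.

```python
def select_layer(val_acc_by_layer):
    """Pick the layer with the highest validation accuracy."""
    return max(val_acc_by_layer, key=val_acc_by_layer.get)


def steering_deltas(test_steered, test_baseline):
    """Per-pair accuracy change (steered minus unsteered) on the test split."""
    return {pair: test_steered[pair] - test_baseline[pair] for pair in test_baseline}


def steering_helps(deltas):
    """True only if the unweighted mean change across pairs is positive."""
    return sum(deltas.values()) / len(deltas) > 0


# Hypothetical validation accuracies per candidate layer.
val_acc_by_layer = {10: 0.70, 20: 0.74, 30: 0.71}
chosen_layer = select_layer(val_acc_by_layer)

# Hypothetical test accuracies at the chosen layer versus the unsteered model.
test_baseline = {"pair_a": 0.80, "pair_b": 0.60}
test_steered = {"pair_a": 0.83, "pair_b": 0.58}
deltas = steering_deltas(test_steered, test_baseline)
verdict = steering_helps(deltas)
```

Keeping the per-pair deltas alongside the overall verdict matters here, because a positive average can hide pairs where steering lowers accuracy, as with `pair_b` in the toy data.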

Figures

Figures reproduced from arXiv: 2605.23069 by Cristina España-Bonet, Daniil Gurgurov, Josef van Genabith, Simon Ostermann, Yasser Hamidullah, Yusser Al Ghussin.

**Figure 4.** Top and bottom per-language MCQ accuracy

**Figure 2.** Post-hoc cross-prompt layer sweeps for Qwen2.5-72B-Instruct with β = 1 on MCQ (top) and SAQ (bottom). The official submission uses the cultural prompt.

**Figure 3.** Post-hoc overall MCQ layer sweeps for Qwen2.5-72B-Instruct under different steering strengths (β ∈ {1, 3, 5}).

… analyses on both MCQ and SAQ evaluation data across multiple models (Qwen2.5-72B/7B, Aya Expanse 8B/32B, Qwen3 8B/32B) using the same evaluation metrics provided by the SemEval-2026 organizers for each track. We observe: (i) strong layer sensitivity: steering gains concentrate in a subset of lay…

**Figure 5.** Per-locale steering effect for Qwen2.5-72B:

**Figure 6.** DiffMean vector convergence vs. FLORES sample size:

**Figure 10.** Post-hoc overall MCQ layer sweeps for Qwen2.5-7B-Instruct under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.

**Figure 11.** Post-hoc cross-prompt MCQ layer sweep for

**Figure 12.** Top and bottom per-language MCQ accuracy

**Figure 16.** Post-hoc overall MCQ layer sweeps for Aya Expanse 32B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while improving or meeting performance in mid to late layers. In comparison, β = 1 remains stable across layers, although β = 5 yields the best overall Acc in our experiments.

**Figure 14.** Post-hoc cross-prompt MCQ layer sweep for

**Figure 15.** Top and bottom per-language MCQ accuracy

**Figure 22.** Post-hoc overall MCQ layer sweeps for Qwen3-32B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable, although β = 5 yields the best overall Acc in our experiments.

**Figure 20.** Post-hoc cross-prompt MCQ layer sweep for Qwen3-8B with β = 1. The official submission uses the cultural prompt. Prompt choice affects both baseline accuracy and the optimal steering layer (here, Layer 14 for the cultural prompt and Layer 25 for the generic prompt).

**Figure 21.** Top and bottom per-language MCQ accuracy

**Figure 28.** Post-hoc overall SAQ layer sweeps for Qwen2.5-7B-Instruct under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.

**Figure 29.** Post-hoc cross-prompt SAQ layer sweep for

**Figure 30.** Top and bottom per-language SAQ accuracy

**Figure 31.** Post-hoc overall SAQ layer sweeps for Aya

**Figure 35.** Post-hoc cross-prompt SAQ layer sweep for

**Figure 36.** Top and bottom per-language SAQ accuracy

**Figure 37.** Post-hoc overall SAQ layer sweeps for Qwen3-8B under different steering strengths (β ∈ {1, 3, 5}). Large steering strengths can substantially degrade performance in early layers, while β = 1 remains stable and yields the best overall trade-off in our experiments.

**Figure 41.** Post-hoc cross-prompt SAQ layer sweep for Qwen3-32B with β = 5. The official submission uses the cultural prompt. Prompt choice affects both baseline accuracy and the optimal steering layer (here, Layer 20 for the cultural prompt and Layer 35 for the generic prompt).

**Figure 42.** Top and bottom per-language SAQ accuracy

Original abstract

Large language models (LLMs) are increasingly used across diverse linguistic and cultural contexts, yet their cultural knowledge remains uneven across regions and languages. We present the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness, where we apply activation steering to multilingual LLMs using language vectors extracted from parallel FLORES data. Our method performs inference-time adaptation by adding language-specific steering vectors to the residual stream at a selected transformer layer, without any parameter updates. We participated in both the short-answer (SAQ) and multiple-choice (MCQ) tracks; however, only our MCQ submission received an official score. In the official MCQ track, we achieved 86.96% accuracy, ranking 7th out of 17 teams. To better understand system behavior, we conduct post-hoc analyses on the shared-task MCQ and SAQ settings. These analyses show that activation steering yields modest and heterogeneous improvements on cultural reasoning: gains are strongly layer-sensitive, vary substantially across language-region pairs, with some configurations even degrading performance, and interact with prompt formulation, comparing generic and culturally conditioned prompts. Our findings suggest that prompt design and activation steering should be jointly optimized for culturally aware multilingual inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This shared-task paper applies activation steering to cultural MCQs and reports modest, layer-sensitive gains that sometimes degrade performance.


The main point is that activation steering with FLORES-derived language vectors yields only modest and uneven improvements on cultural reasoning tasks, with results that vary sharply by layer, language pair, and prompt style, and can hurt accuracy in some setups. They entered SemEval-2026 Task 7, posted an official 86.96% on the MCQ track for 7th place out of 17, and used post-hoc checks on both tracks to map out those inconsistencies. The method itself is inference-time residual addition at a chosen layer with no parameter updates. What the paper does well is stay honest about the mixed outcomes and note that prompt design and steering need joint tuning. That reporting is useful because it shows the practical limits of the approach rather than claiming a general fix. The work is not technically new; it ports an existing steering technique to cultural awareness using standard parallel data, without new derivations or frameworks. The findings stay tied to this one task, and the abstract leaves out specifics on layer choice and scaling, though the post-hoc section at least documents the heterogeneity. This is mainly for people running shared-task systems or testing activation methods on multilingual models. A reader looking for realistic engineering notes on cultural adaptation will find the analyses worth scanning. I would send it to peer review because the empirical observations are clear and the paper does not overstate what the numbers show.

Referee Report

1 major / 2 minor

Summary. The paper describes the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness. It extracts language vectors from parallel FLORES data and applies activation steering by adding these vectors to the residual stream of multilingual LLMs at a selected layer during inference, without parameter updates. The system achieved an official 86.96% accuracy on the MCQ track (ranking 7th of 17), while post-hoc analyses on both MCQ and SAQ settings report modest, heterogeneous, layer-sensitive gains that vary by language-region pair, sometimes degrade performance, and interact with prompt formulation (generic vs. culturally conditioned). The authors conclude that prompt design and steering should be jointly optimized.

Significance. If the post-hoc results are reproducible, the work supplies concrete empirical observations on the variable and limited effectiveness of activation steering for cultural reasoning in multilingual LLMs. It highlights layer sensitivity, language-region variation, and prompt interactions, which are useful for practitioners working on culturally aware inference. The official shared-task score and use of publicly available parallel data are strengths that support direct evaluation without circularity.

major comments (1)

[Abstract and post-hoc analyses] The central claim that activation steering yields modest heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.

minor comments (2)

[Abstract] The manuscript reports participation in both SAQ and MCQ tracks but provides an official score for only one; clarifying the status of the second track would improve completeness.
[Method description] Notation for the steering operation (residual-stream addition) and the precise definition of the language vector should be stated explicitly with an equation or pseudocode for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and the recommendation of minor revision. The positive assessment of the empirical observations on activation steering is appreciated. We address the concern about missing implementation details below.


Referee: [Abstract and post-hoc analyses] The central claim that activation steering yields modest heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.

Authors: We agree with the referee that the key implementation details for the post-hoc analyses were not sufficiently detailed in the submitted manuscript. This omission affects the ability to fully assess the analyses. We will revise the post-hoc analyses section to include explicit descriptions of the layer selection criteria, the exact procedure for computing the language vectors from the FLORES data, and the scaling-factor values used in the experiments. These additions will be made in the revised manuscript to strengthen the support for our claims.

revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper reports an empirical application of activation steering on multilingual LLMs using language vectors extracted from public parallel FLORES data, followed by direct evaluation on the shared SemEval-2026 Task 7 MCQ and SAQ tracks. No derivation, first-principles claim, or prediction is presented that reduces by construction to fitted parameters or self-citations; results are post-hoc observations of layer-sensitive, heterogeneous effects with explicit hedging on modest gains and degradations. The central method (residual-stream addition at inference time) is independent of the evaluation data and does not rely on load-bearing self-citations or ansatzes imported from prior author work.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

The paper is an empirical system description for a shared task; the abstract does not introduce mathematical derivations, so the ledger is minimal. Layer selection and any implicit scaling of the steering vector function as free parameters chosen on the basis of experimentation not detailed here.

free parameters (2)

selected transformer layer: The layer at which the steering vector is added is described as 'selected' without a stated selection procedure or justification in the abstract.
steering vector scaling factor: The magnitude with which the language vector is added to the residual stream is not specified and must be treated as a tunable hyperparameter.



Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages

[1]

Nedjma Ousidhoum and Junho Myung and Carla Perez-Almendros and Jiho Jin and Amr Keleg and Meriem Beloucif and Yi Zhou and Rodrigo Agerri and Vladimir Araujo and Naomi Baes and James Barry and Joanne Boisson and Nancy F. Chen and Christine de Kock and Aleksandra Edwards and Joseba Fernandez de Landa and Mohamed Fazli Imam and Huda Hakami and Shu-Kai Hsieh ...

work page 2026
[2]

Advances in Neural Information Processing Systems , volume=

Blend: A benchmark for llms on everyday knowledge in diverse cultures and languages , author=. Advances in Neural Information Processing Systems , volume=

work page
[3]

Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026). 2026

work page 2026
[4]

Qwen3 Technical Report , url =

Qwen Team , journal =. Qwen3 Technical Report , url =

work page
[5]

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier , url =

John Dang and Shivalika Singh and Daniel D'souza and Arash Ahmadian and Alejandro Salamanca and Madeline Smith and Aidan Peppin and Sungjin Hong and Manoj Govindassamy and Terrence Zhao and Sandra Kublik and Meor Amer and Viraat Aryabumi and Jon Ander Campos and Yi-Chern Tan and Tom Kocmi and Florian Strub and Nathan Grinsztajn and Yannis Flet-Berliac and...

work page
[6]

Cultural bias and cultural alignment of large language models , volume =

Tao, Yan and Viberg, Olga and Baker, Ryan S and Kizilcec, Ren. Cultural bias and cultural alignment of large language models , volume =. PNAS nexus , number =

work page
[7]

NLLB Team and Marta R. Costa-jussà and James Cross and Onur Çelebi and Maha Elbayad and Kenneth Heafield and Kevin Heffernan and Elahe Kalbassi and Janice Lam and Daniel Licht and Jean Maillard and Anna Sun and Skyler Wang and Guillaume Wenzek and Al Youngblood and Bapi Akula and Loic Barrault and Gabriel Mejia Gonzalez and Prangthip Hansanti and John Hof...

work page
[8]

Computational Linguistics , volume=

Survey of cultural awareness in language models: Text and beyond , author=. Computational Linguistics , volume=. 2025 , publisher=

work page 2025
[9]

Junho Myung and Nayeon Lee and Yi Zhou and Jiho Jin and Rifki Afina Putri and Dimosthenis Antypas and Hsuvas Borkakoty and Eunsu Kim and Carla P. BLEnD:. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 , editor =

work page 2024
[10]

Qwen2.5 Technical Report , url =

Qwen and : and An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu an...

work page
[11]

Isolating culture neurons in multilingual large language models , author=. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics , pages=

work page
[12]

and Nguyen, Thien Huu , booktitle =

Nguyen, Thuat and Nguyen, Chien Van and Lai, Viet Dac and Man, Hieu and Ngo, Nghia Trung and Dernoncourt, Franck and Rossi, Ryan A. and Nguyen, Thien Huu , booktitle =

work page
[13]

doi:10.18653/v1/2023.emnlp-main.981 , editor =

Mukherjee, Anjishnu and Raj, Chahat and Zhu, Ziwei and Anastasopoulos, Antonios , booktitle =. doi:10.18653/v1/2023.emnlp-main.981 , editor =

work page doi:10.18653/v1/2023.emnlp-main.981 2023
[14]

CultureLLM: Incorporating Cultural Differences into Large Language Models , url =

Cheng Li and Mengzhuo Chen and Jindong Wang and Sunayana Sitaram and Xing Xie , bibsource =. CultureLLM: Incorporating Cultural Differences into Large Language Models , url =. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 , editor =

work page 2024
[15]

Toxicity in chatgpt: Analyzing persona-assigned language models , url =

Deshpande, Ameet and Murahari, Vishvak and Rajpurohit, Tanmay and Kalyan, Ashwin and Narasimhan, Karthik , booktitle =. Toxicity in chatgpt: Analyzing persona-assigned language models , url =. doi:10.18653/v1/2023.findings-emnlp.88 , editor =

work page doi:10.18653/v1/2023.findings-emnlp.88 2023
[16]

Understanding intermediate layers using linear classifier probes , url =

Guillaume Alain and Yoshua Bengio , journal =. Understanding intermediate layers using linear classifier probes , url =

work page
[17]

Steering llama 2 via contrastive activation addition , year =

Rimsky, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander , booktitle =. Steering llama 2 via contrastive activation addition , year =

work page
[18]

Byun and Zifan Wang and Alex Mallen and Steven Basart and Sanmi Koyejo and Dawn Song and Matt Fredrikson and J

Andy Zou and Long Phan and Sarah Chen and James Campbell and Phillip Guo and Richard Ren and Alexander Pan and Xuwang Yin and Mantas Mazeika and Ann-Kathrin Dombrowski and Shashwat Goel and Nathaniel Li and Michael J. Byun and Zifan Wang and Alex Mallen and Steven Basart and Sanmi Koyejo and Dawn Song and Matt Fredrikson and J. Zico Kolter and Dan Hendryc...

work page
[19]

Fluent but Culturally Distant: Can Regional Training Teach Cultural Understanding? , url =

Agarwal, Dhruv and Shukla, Anya and Sitaram, Sunayana and Vashistha, Aditya , journal =. Fluent but Culturally Distant: Can Regional Training Teach Cultural Understanding? , url =

work page
[20]

ArXiv preprint , title =

Rystr. ArXiv preprint , title =

work page
[21]

Disentangling Language and Culture for Evaluating Multilingual Large Language Models , url =

Ying, Jiahao and Tang, Wei and Zhao, Yiran and Cao, Yixin and Rong, Yu and Zhang, Wenxuan , booktitle =. Disentangling Language and Culture for Evaluating Multilingual Large Language Models , url =. doi:10.18653/v1/2025.acl-long.1082 , editor =

work page doi:10.18653/v1/2025.acl-long.1082 2025
[22]

The Linear Representation Hypothesis and the Geometry of Large Language Models , url =

Kiho Park and Yo Joong Choe and Victor Veitch , bibsource =. The Linear Representation Hypothesis and the Geometry of Large Language Models , url =. Forty-first International Conference on Machine Learning,

work page
[23]

Style Vectors for Steering Generative Large Language Model , url =

Kai Konen and Sophie Jentzsch and Diaoulé Diallo and Peer Schütt and Oliver Bensch and Roxanne El Baff and Dominik Opitz and Tobias Hecking , journal =. Style Vectors for Steering Generative Large Language Model , url =

work page
[24]

Steering Llama 2 via Contrastive Activation Addition , url =

Rimsky, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander , booktitle =. Steering Llama 2 via Contrastive Activation Addition , url =. doi:10.18653/v1/2024.acl-long.828 , editor =

work page doi:10.18653/v1/2024.acl-long.828 2024
[25]

Tyler A. Chang and Catherine Arnett and Abdelrahman Eldesokey and Abdelrahman Sadallah and Abeer Kashar and Abolade Daud and Abosede Grace Olanihun and Adamu Labaran Mohammed and Adeyemi Praise and Adhikarinayum Meerajita Sharma and Aditi Gupta and Afitab Iyigun and Afonso Simplício and Ahmed Essouaied and Aicha Chorana and Akhil Eppa and Akintunde Oladip...

work page
[26]

CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming , url =

Yu Ying Chiu and Liwei Jiang and Bill Yuchen Lin and Chan Young Park and Shuyue Stella Li and Sahithya Ravi and Mehar Bhatia and Maria Antoniak and Yulia Tsvetkov and Vered Shwartz and Yejin Choi , journal =. CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming , url =

work page
[27]

doi:10.18653/v1/2023.findings-acl.631 , editor =

Palta, Shramay and Rudinger, Rachel , booktitle =. doi:10.18653/v1/2023.findings-acl.631 , editor =

work page doi:10.18653/v1/2023.findings-acl.631 2023
[28]

BertaQA: How Much Do Language Models Know About Local Culture? , url =

Julen Etxaniz and Gorka Azkune and Aitor Soroa and Oier Lopez de Lacalle and Mikel Artetxe , bibsource =. BertaQA: How Much Do Language Models Know About Local Culture? , url =. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 ...

work page 2024
[29]

Abhinav Rao and Akhila Yerukola and Vishwa Shah and Katharina Reinecke and Maarten Sap , journal =

work page
[30]

David Romero and Chenyang Lyu and Haryo Akbarianto Wibowo and Teresa Lynn and Injy Hamed and Aditya Nanda Kishore and Aishik Mandal and Alina Dragonetti and Artem Abzaliev and Atnafu Lambebo Tonja and others , journal =

work page
[31]

Language-agnostic

Feng, Fangxiaoyu and Yang, Yinfei and Cer, Daniel and Arivazhagan, Naveen and Wang, Wei , booktitle =. Language-agnostic. doi:10.18653/v1/2022.acl-long.62 , editor =

work page doi:10.18653/v1/2022.acl-long.62 2022
[32]

Weinberger and Yoav Artzi , bibsource =

Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi , bibsource =. BERTScore: Evaluating Text Generation with. 8th International Conference on Learning Representations,

work page
[33]

Steering Large Language Model Activations in Sparse Spaces , url =

Reza Bayat and Ali Rahimi-Kalahroudi and Mohammad Pezeshki and Sarath Chandar and Pascal Vincent , journal =. Steering Large Language Model Activations in Sparse Spaces , url =

work page
[34]

Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation , url =

Daniil Gurgurov and Katharina Trinley and Yusser Al Ghussin and Tanja Baeumel and Josef van Genabith and Simon Ostermann , journal =. Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation , url =

work page
[35]

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation , editor =

Tiedemann, J. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation , editor =

work page
[36]

doi:10.3115/1073083.1073135 , editor =

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , booktitle =. doi:10.3115/1073083.1073135 , editor =

work page doi:10.3115/1073083.1073135
[37]

WEIRD FAccTs: How western, educated, industrialized, rich, and democratic is FAccT? , year =

Septiandri, Ali Akbar and Constantinides, Marios and Tahaei, Mohammad and Quercia, Daniele , booktitle =. WEIRD FAccTs: How western, educated, industrialized, rich, and democratic is FAccT? , year =

work page
[39]

Canny and Sarah E

Hellina Hailu Nigatu and John F. Canny and Sarah E. Chasins , bibsource =. Low-Resourced Languages and Online Knowledge Repositories:. Proceedings of the. doi:10.1145/3613904.3642605 , editor =

work page doi:10.1145/3613904.3642605
[40]

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons , url =

Kojima, Takeshi and Okimura, Itsuki and Iwasawa, Yusuke and Yanaka, Hitomi and Matsuo, Yutaka , booktitle =. On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons , url =

work page
[41]

The Tatoeba Translation Challenge

Tiedemann, J. The Tatoeba Translation Challenge. Proceedings of the Fifth Conference on Machine Translation , editor =

work page
[42]

Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders , url =

Boyi Deng and Yu Wan and Yidan Zhang and Baosong Yang and Fuli Feng , journal =. Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders , url =

work page
[43]

Self-conditioning Pre-Trained Language Models , url =

Xavier Suau Cuadros and Luca Zappella and Nicholas Apostoloff , bibsource =. Self-conditioning Pre-Trained Language Models , url =. International Conference on Machine Learning,

work page
[44]

Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation , url =

Tan, Shaomu and Wu, Di and Monz, Christof , journal =. Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation , url =

work page
[45]

Unveiling a core linguistic region in large language models , url =

Zhao, Jun and Zhang, Zhihao and Ma, Yide and Zhang, Qi and Gui, Tao and Gao, Luhui and Huang, Xuanjing , journal =. Unveiling a core linguistic region in large language models , url =

work page
[46]

Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? , url =

Zhong, Chengzhi and Cheng, Fei and Liu, Qianying and Jiang, Junfeng and Wan, Zhen and Chu, Chenhui and Murawaki, Yugo and Kurohashi, Sadao , journal =. Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? , url =

work page
[47]

Language-specific neurons: The key to multilingual capabilities in large language models , url =

Tang, Tianyi and Luo, Wenyang and Huang, Haoyang and Zhang, Dongdong and Wang, Xiaolei and Zhao, Xin and Wei, Furu and Wen, Ji-Rong , journal =. Language-specific neurons: The key to multilingual capabilities in large language models , url =

work page
[48]

Do llamas work in english? on the latent language of multilingual transformers , year =

Wendler, Chris and Veselovsky, Veniamin and Monea, Giovanni and West, Robert , booktitle =. Do llamas work in english? on the latent language of multilingual transformers , year =

work page
[49] Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and Al-Dahle, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and others. The Llama 3 Herd of Models.

[50] Elhage, Nelson and Hume, Tristan and Olsson, Catherine and Schiefer, Nicholas and Henighan, Tom and Kravec, Shauna and Hatfield-Dodds, Zac and Lasenby, Robert and Drain, Dawn and Chen, Carol and others. Toy models of superposition.

[51] Robert Huben and Hoagy Cunningham and Logan Riggs and Aidan Ewart and Lee Sharkey. Sparse Autoencoders Find Highly Interpretable Features in Language Models. The Twelfth International Conference on Learning Representations.

[52] Mistral Nemo.

[53] Le Scao, Teven and Fan, Angela and Akiki, Christopher and Pavlick, Ellie and Ili. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.

[54] Andrylie, Lyzander Marciano and Rahmanisa, Inaya and Ihsani, Mahardika Krisna and Wicaksono, Alfan Farizki and Wibowo, Haryo Akbarianto and Aji, Alham Fikri. Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages.

[55] Nostalgebraist. Interpreting GPT: The Logit Lens.

[56] Lisa Schut and Yarin Gal and Sebastian Farquhar. Do Multilingual LLMs Think In English?

[57] Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others. Llama 2: Open foundation and fine-tuned chat models.

[58] Yiran Zhao and Wenxuan Zhang and Guizhen Chen and Kenji Kawaguchi and Lidong Bing. How do Large Language Models Handle Multilingualism? Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024.

[59] Artetxe, Mikel and Ruder, Sebastian and Yogatama, Dani. On the Cross-lingual Transferability of Monolingual Representations. doi:10.18653/v1/2020.acl-main.421, 2020.

[60] Gurgurov, Daniil and Trinley, Katharina and Al Ghussin, Yusser and Baeumel, Tanja and Genabith, Josef Van and Ostermann, Simon. Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation.

[61] Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel and Schwenk, Holger and Stoyanov, Veselin. doi:10.18653/v1/D18-1269.

[62] Mondal, Soumen Kumar and Sen, Sayambhu and Singhania, Abhishek and Jyothi, Preethi. Language-Specific Neurons Do Not Facilitate Cross-Lingual Transfer. doi:10.18653/v1/2025.insights-1.6, 2025.

[63] Shannon, Claude Elwood. A mathematical theory of communication.

[64] Yonghui Wu and Mike Schuster and Zhifeng Chen and Quoc V. Le and Mohammad Norouzi and Wolfgang Macherey and Maxim Krikun and Yuan Cao and Qin Gao and Klaus Macherey and Jeff Klingner and Apurva Shah and Melvin Johnson and Xiaobing Liu and Łukasz Kaiser and Stephan Gouws and Yoshikiyo Kato and Taku Kudo and Hideto Kazawa and Keith Stevens and George Kurian...

[65] Adelani, David and Liu, Hannah and Shen, Xiaoyu and Vassilyev, Nikita and Alabi, Jesujoba and Mao, Yanke and Gao, Haonan and Lee, En-Shiun.

[66] Chandan Singh and Jeevana Priya Inala and Michel Galley and Rich Caruana and Jianfeng Gao. Rethinking Interpretability in the Era of Large Language Models.

[67] Tan, Shaomu and Wu, Di and Monz, Christof. Neuron Specialization: Leveraging Intrinsic Task Modularity for Multilingual Machine Translation. doi:10.18653/v1/2024.emnlp-main.374, 2024.

[68] Wikimedia Foundation.

[69] Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzm. Proceedings of the Twelfth Language Resources and Evaluation Conference.

[70] Dario Amodei and Chris Olah and Jacob Steinhardt and Paul Christiano and John Schulman and Dan Mané. Concrete Problems in AI Safety.

[71] Team, Gemma and Riviere, Morgane and Pathak, Shreya and Sessa, Pier Giuseppe and Hardin, Cassidy and Bhupatiraju, Surya and Hussenot, L. ArXiv preprint.

[72] Costa-Juss. ArXiv preprint.

[73] Mistral AI. Mistral-Nemo-Base-2407.

[74] Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas. Bag of Tricks for Efficient Text Classification.

[75] Bandarkar, Lucas and Liang, Davis and Muller, Benjamin and Artetxe, Mikel and Shukla, Satya Narayan and Husa, Donald and Goyal, Naman and Krishnan, Abhinandan and Zettlemoyer, Luke and Khabsa, Madian. The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. doi:10.18653/v1/2024.acl-long.44, 2024.

[76] Javaheripi, Mojan and Bubeck, S. Phi-2: The surprising power of small language models. Microsoft Research Blog.

[77] Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.

[78] Chou, Cheng-Ting and Liu, George and Sun, Jessica and Blondin, Cole and Zhu, Kevin and Sharma, Vasu and O'Brien, Sean. Causal Language Control in Multilingual Transformers via Sparse Feature Steering.

[79] Panickssery, Nina and Gabrieli, Nick and Schulz, Julian and Tong, Meg and Hubinger, Evan and Turner, Alexander Matt. Steering Llama 2 via Contrastive Activation Addition.

[80] Marks, Samuel and Tegmark, Max. The geometry of truth: Emergent linear structure in large language model representations of true/false datasets.

[81] Wu, Zhengxuan and Arora, Aryaman and Geiger, Atticus and Wang, Zheng and Huang, Jing and Jurafsky, Dan and Manning, Christopher D and Potts, Christopher. AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders.

