DFKI-MLT at SemEval-2026 Task 7: Steering Multilingual Models Towards Cultural Knowledge
Pith reviewed 2026-05-25 05:23 UTC · model grok-4.3
The pith
Activation steering with language vectors from parallel data produces modest, layer-dependent gains on cultural reasoning in multilingual models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Language vectors extracted from parallel FLORES data are added to the residual stream of multilingual LLMs at a selected layer, steering the model toward cultural knowledge at inference time with no parameter updates. The method is applied to SemEval-2026 Task 7, yielding 86.96 percent accuracy on the official MCQ track. Analyses of both MCQ and SAQ settings show that the resulting improvements on cultural reasoning are modest and heterogeneous: they are highly sensitive to layer selection, vary across language-region pairs (with some settings causing degradation), and interact with prompt formulation (generic versus culturally conditioned prompts).
What carries the argument
Addition of language-specific steering vectors extracted from parallel FLORES data to the residual stream at a chosen transformer layer.
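The mechanism can be illustrated with a minimal sketch in plain Python. The paper summary does not say how the language vectors are computed or scaled, so everything here is an assumption: the difference-of-means recipe in `language_vector`, the scaling factor `alpha`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def language_vector(target_acts: List[Vector], source_acts: List[Vector]) -> Vector:
    """Mean target-language activation minus mean source-language activation.

    ASSUMPTION: difference-of-means over parallel sentences at one layer is a
    common steering recipe; the paper's exact computation is not stated.
    """
    t = mean_vector(target_acts)
    s = mean_vector(source_acts)
    return [a - b for a, b in zip(t, s)]


def steer(hidden: List[Vector], vector: Vector, alpha: float) -> List[Vector]:
    """Add alpha * vector to the hidden state at every token position."""
    return [[h + alpha * v for h, v in zip(row, vector)] for row in hidden]


# Toy activations (invented numbers, two sentences, hidden size 2).
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 1.0], [3.0, 3.0]]
v = language_vector(target, source)
steered = steer([[0.0, 0.0], [1.0, 1.0]], v, alpha=2.0)
print(v, steered)
```

In a real model the `steer` step would be applied to the residual stream at the selected layer during the forward pass; the model weights themselves stay untouched.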
If this is right
- The method requires no parameter updates to adapt models for new cultural contexts.
- Steering effectiveness depends critically on which transformer layer receives the vector addition.
- Gains differ substantially by language and region and can turn negative in some configurations.
- Steering interacts with prompt formulation, requiring joint tuning of vectors and prompts.
- The approach was applied to both short-answer and multiple-choice formats in the shared task.
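Because gains can turn negative for some language-region pairs, a per-pair comparison against the unsteered model is more informative than a single average. The sketch below shows one way to compute it; the pair labels and accuracy values are invented placeholders, not results from the paper, and the function names are hypothetical.

```python
from typing import Dict, List


def accuracy_deltas(baseline: Dict[str, float], steered: Dict[str, float]) -> Dict[str, float]:
    """Steered minus baseline accuracy for every pair present in both runs."""
    return {pair: steered[pair] - baseline[pair] for pair in baseline if pair in steered}


def degraded_pairs(deltas: Dict[str, float], tolerance: float = 0.0) -> List[str]:
    """Pairs whose accuracy dropped by more than `tolerance`, sorted by name."""
    return sorted(pair for pair, delta in deltas.items() if delta < -tolerance)


# Placeholder language-region labels and accuracies (not from the paper).
baseline = {"xx-AA": 0.5, "yy-BB": 0.75}
steered_run = {"xx-AA": 0.625, "yy-BB": 0.5}

deltas = accuracy_deltas(baseline, steered_run)
print(deltas)
print(degraded_pairs(deltas))
```

Reporting the degraded pairs alongside the mean gain keeps harmful configurations from being hidden by improvements elsewhere.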
Where Pith is reading between the lines
- The observed layer sensitivity implies that cultural information is represented at different depths across languages inside the model.
- The inconsistency across language pairs suggests that parallel data alone may not capture all region-specific cultural signals equally.
- Joint optimization of steering vectors and prompt design could be tested by searching over both simultaneously on held-out cultural questions.
Load-bearing premise
Vectors taken from parallel FLORES sentences carry transferable cultural knowledge that can be injected via residual-stream addition to raise performance on cultural reasoning tasks.
What would settle it
Select the layer that maximizes performance on a validation split, then measure accuracy on the full task set. If overall accuracy shows no gain or a net loss relative to the unsteered model across multiple language pairs, the steering approach would be falsified.
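A minimal sketch of this test protocol follows. It assumes validation accuracy is available per candidate layer and test accuracy per language pair; the tie-breaking rule, the function names, and all numbers are illustrative assumptions rather than details from the paper.

```python
from typing import Dict


def select_layer(val_acc_by_layer: Dict[int, float]) -> int:
    """Layer with the highest validation accuracy; ties go to the lower layer."""
    return min(val_acc_by_layer, key=lambda layer: (-val_acc_by_layer[layer], layer))


def mean_gain(baseline: Dict[str, float], steered: Dict[str, float]) -> float:
    """Average of (steered - baseline) over pairs evaluated in both runs."""
    shared = [pair for pair in baseline if pair in steered]
    if not shared:
        raise ValueError("no language pairs in common")
    return sum(steered[p] - baseline[p] for p in shared) / len(shared)


# Invented validation accuracies for three candidate layers.
val_acc = {8: 0.5, 16: 0.75, 24: 0.75}
best_layer = select_layer(val_acc)

# Invented test accuracies, with steering applied at `best_layer` only.
test_baseline = {"xx-AA": 0.5, "yy-BB": 0.5}
test_steered = {"xx-AA": 0.75, "yy-BB": 0.5}
gain = mean_gain(test_baseline, test_steered)

# A non-positive gain would count against the steering approach.
print(best_layer, gain, gain > 0)
```

The key point is that the layer is fixed on validation data before any test accuracy is inspected, so the test comparison is not inflated by layer selection.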
Original abstract
Large language models (LLMs) are increasingly used across diverse linguistic and cultural contexts, yet their cultural knowledge remains uneven across regions and languages. We present the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness, where we apply activation steering to multilingual LLMs using language vectors extracted from parallel FLORES data. Our method performs inference-time adaptation by adding language-specific steering vectors to the residual stream at a selected transformer layer, without any parameter updates. We participated in both the short-answer (SAQ) and multiple-choice (MCQ) tracks; however, only our MCQ submission received an official score. In the official MCQ track, we achieved 86.96% accuracy, ranking 7th out of 17 teams. To better understand system behavior, we conduct post-hoc analyses on the shared-task MCQ and SAQ settings. These analyses show that activation steering yields modest and heterogeneous improvements on cultural reasoning: gains are strongly layer-sensitive, vary substantially across language-region pairs, with some configurations even degrading performance, and interact with prompt formulation, comparing generic and culturally conditioned prompts. Our findings suggest that prompt design and activation steering should be jointly optimized for culturally aware multilingual inference.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the DFKI-MLT system for SemEval-2026 Task 7 on cultural awareness. It extracts language vectors from parallel FLORES data and applies activation steering by adding these vectors to the residual stream of multilingual LLMs at a selected layer during inference, without parameter updates. The system achieved an official 86.96% accuracy on the MCQ track (ranking 7th of 17), while post-hoc analyses on both MCQ and SAQ settings report modest, heterogeneous, layer-sensitive gains that vary by language-region pair, sometimes degrade performance, and interact with prompt formulation (generic vs. culturally conditioned). The authors conclude that prompt design and steering should be jointly optimized.
Significance. If the post-hoc results are reproducible, the work supplies concrete empirical observations on the variable and limited effectiveness of activation steering for cultural reasoning in multilingual LLMs. It highlights layer sensitivity, language-region variation, and prompt interactions, which are useful for practitioners working on culturally aware inference. The official shared-task score and use of publicly available parallel data are strengths that support direct evaluation without circularity.
major comments (1)
- [Abstract and post-hoc analyses] The central claim that activation steering yields modest, heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.
minor comments (2)
- [Abstract] The manuscript reports participation in both SAQ and MCQ tracks but provides an official score for only one; clarifying the status of the second track would improve completeness.
- [Method description] Notation for the steering operation (residual-stream addition) and the precise definition of the language vector should be stated explicitly with an equation or pseudocode for reproducibility.
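For illustration only, the notation requested in the last comment could take a form like the following. This is an assumed formulation, not the authors' definition: the difference-of-means construction, the pooled activation $\mathbf{h}^{(\ell)}(x)$ of sentence $x$ at layer $\ell$, the $N$ parallel FLORES sentence pairs $(x_i^{\mathrm{tgt}}, x_i^{\mathrm{src}})$, and the scaling factor $\alpha$ are all hypothetical symbols.

```latex
\mathbf{v}^{(\ell)} = \frac{1}{N} \sum_{i=1}^{N}
  \Big( \mathbf{h}^{(\ell)}\big(x_i^{\mathrm{tgt}}\big)
      - \mathbf{h}^{(\ell)}\big(x_i^{\mathrm{src}}\big) \Big),
\qquad
\tilde{\mathbf{h}}^{(\ell)}_{t} = \mathbf{h}^{(\ell)}_{t} + \alpha\, \mathbf{v}^{(\ell)}
```

Here $\mathbf{h}^{(\ell)}_{t}$ is the residual-stream state at token position $t$ and $\tilde{\mathbf{h}}^{(\ell)}_{t}$ is its steered counterpart. The layer $\ell$ and the factor $\alpha$ correspond to the two free parameters the page lists for the method.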
Simulated Author's Rebuttal
We thank the referee for the thorough review and the recommendation of minor revision. The positive assessment of the empirical observations on activation steering is appreciated. We address the concern about missing implementation details below.
Point-by-point responses
- Referee: [Abstract and post-hoc analyses] The central claim that activation steering yields modest, heterogeneous improvements rests on analyses whose key implementation details (layer selection criteria, exact vector computation from FLORES, and scaling-factor values) are not provided. This directly affects assessment of the load-bearing assumption that the extracted vectors encode transferable cultural knowledge.
  Authors: We agree with the referee that the key implementation details of the post-hoc analyses were not described in sufficient detail in the submitted manuscript, which limits the reader's ability to assess those analyses. We will revise the post-hoc analyses section to describe explicitly the layer selection criteria, the exact procedure for computing the language vectors from the FLORES data, and the scaling-factor values used in the experiments. These additions will strengthen the support for our claims.
  Revision: yes
Circularity Check
No significant circularity
The paper reports an empirical application of activation steering on multilingual LLMs using language vectors extracted from public parallel FLORES data, followed by direct evaluation on the shared SemEval-2026 Task 7 MCQ and SAQ tracks. No derivation, first-principles claim, or prediction is presented that reduces by construction to fitted parameters or self-citations; results are post-hoc observations of layer-sensitive, heterogeneous effects with explicit hedging on modest gains and degradations. The central method (residual-stream addition at inference time) is independent of the evaluation data and does not rely on load-bearing self-citations or ansatzes imported from prior author work.
Axiom & Free-Parameter Ledger
free parameters (2)
- selected transformer layer
- steering vector scaling factor