Towards Measuring the Representation of Subjective Global Opinions in Language Models
Pith reviewed 2026-05-16 07:37 UTC · model grok-4.3
The pith
Large language models produce answers that match opinions from the United States and certain European and South American countries more closely than opinions from other nations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the GlobalOpinionQA dataset and a country-conditioned similarity metric, the authors show that default LLM responses align more closely with the opinion distributions of the USA and some European and South American countries than with the distributions recorded in other surveyed nations.
What carries the argument
A similarity metric that measures how closely an LLM's survey-style answers match the response distribution collected from each country's human respondents.
If this is right
- Default model outputs will over-represent the views of a subset of countries on contested global issues.
- Explicit country prompts can increase similarity to the target population but may introduce stereotype-like content.
- Machine translation of questions alone does not guarantee that answers will track the opinions of speakers of the target language.
Where Pith is reading between the lines
- Auditing pipelines built on this dataset could be applied to other models to track changes after training updates or fine-tuning.
- The same framework could be extended to track how opinion alignment shifts when models are trained on more geographically balanced data.
- Developers might need separate evaluation tracks for factual questions versus value-laden questions when testing global fairness.
Load-bearing premise
The chosen survey answers serve as a fair and unbiased record of what each country's population actually thinks.
What would settle it
A replication in which the same model produces answers whose similarity scores are statistically indistinguishable across all countries in the dataset.
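One way to operationalize "statistically indistinguishable across all countries" is a permutation test on the spread of per-country mean similarity scores. The sketch below is a hypothetical check along those lines, not a procedure taken from the paper; the data layout and thresholds are assumptions.

```python
# Hypothetical check for the replication criterion above: permute country labels
# over per-question similarity scores and ask whether the observed spread of
# country means could plausibly arise by chance. Not taken from the paper.
import numpy as np

def spread_of_country_means(scores, labels):
    """Standard deviation of the mean similarity score per country.

    scores: 1-D numpy array of per-question similarity values
    labels: 1-D numpy array of the country each score belongs to
    """
    countries = np.unique(labels)
    return np.std([scores[labels == c].mean() for c in countries])

def permutation_pvalue(scores, labels, n_perm=10_000, seed=0):
    """P-value for the null that similarity does not depend on country."""
    rng = np.random.default_rng(seed)
    observed = spread_of_country_means(scores, labels)
    null = [spread_of_country_means(scores, rng.permutation(labels)) for _ in range(n_perm)]
    return float(np.mean(np.asarray(null) >= observed))
```

A large p-value under this kind of test would be consistent with the replication outcome described above; a small one would indicate that some countries remain systematically closer to the model's answers.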
read the original abstract
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
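Because the abstract points readers to the released dataset, here is a minimal sketch of pulling it with the Hugging Face datasets library. The available splits and column names are not specified in the abstract, so they are printed at runtime and should be checked against the dataset card.

```python
# Minimal sketch of loading the released data; splits and column names are
# assumptions to verify against the dataset card at the URL in the abstract.
from datasets import load_dataset

ds = load_dataset("Anthropic/llm_global_opinions")  # repository named in the abstract
print(ds)  # inspect available splits and columns

first_split = list(ds.keys())[0]
example = next(iter(ds[first_split]))
print(example)  # one survey question with its associated per-country response data
```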
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GlobalOpinionQA, a dataset compiled from cross-national surveys capturing opinions on global issues across countries, along with a similarity metric to compare LLM-generated responses against human responses conditioned on country. Experiments on a Constitutional AI-trained LLM (helpful, honest, and harmless) show that default outputs align more closely with opinions from the USA and certain European and South American populations; country-specific prompting shifts alignments toward the prompted group but risks cultural stereotypes; and translating questions to a target language does not reliably increase similarity to speakers of that language. The dataset is released publicly with an interactive visualization.
Significance. If the metric and findings are robust, the work supplies a practical, reproducible framework for quantifying cultural and geographic biases in LLM opinion representation, with direct relevance to fairness, alignment, and global equity concerns in NLP. The public release of GlobalOpinionQA and the visualization tool strengthens the contribution by enabling independent verification and extension.
major comments (3)
- [Section 2] Dataset construction: The paper treats responses from the source cross-national surveys as representative ground truth for each country's population-level opinions without discussing or validating against known sampling issues (non-probability sampling, urban/educated oversampling, or small N in non-Western countries). Because the similarity metric and all three experiments rest on this assumption, the central claim that LLMs over-align with USA/European/South American opinions could instead track survey-participant demographics rather than true national opinions.
- [Section 3] Similarity metric definition: The exact formula for the similarity metric, the statistical tests used to compare distributions, and any controls for question difficulty or response length are not specified. This under-specification directly affects the reliability and interpretability of the quantitative results reported for the default, prompting, and translation experiments.
- [Section 5.2] Prompting experiment: The claim that country-specific prompting can produce harmful cultural stereotypes is stated but lacks concrete examples, frequency counts, or a systematic analysis of stereotype content, making it difficult to evaluate the practical severity of this side effect.
minor comments (2)
- [Abstract] The abstract refers to 'some European and South American countries' without naming them; listing the specific countries with highest alignment would improve precision.
- [Section 2] The paper provides dataset and visualization links but omits basic summary statistics (number of questions, countries covered, response distributions per question) that would help readers assess coverage and balance.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below with point-by-point responses and indicate planned revisions to improve the manuscript's clarity and rigor.
read point-by-point responses
-
Referee: [Section 2] Dataset construction: The paper treats responses from the source cross-national surveys as representative ground truth for each country's population-level opinions without discussing or validating against known sampling issues (non-probability sampling, urban/educated oversampling, or small N in non-Western countries). Because the similarity metric and all three experiments rest on this assumption, the central claim that LLMs over-align with USA/European/South American opinions could instead track survey-participant demographics rather than true national opinions.
Authors: We acknowledge that the cross-national surveys (e.g., Pew, World Values Survey) underlying GlobalOpinionQA have well-documented sampling limitations, including non-probability methods, urban/educated oversampling, and smaller samples in some non-Western countries. Our framework treats these responses as the best available empirical proxy for reported national opinions rather than claiming they represent 'true' population opinions; the similarity metric is therefore relative to the observed survey distributions. We agree a more explicit discussion is warranted and will revise Section 2 to describe the surveys' sampling characteristics and add a dedicated limitations subsection that caveats all findings accordingly. revision: yes
-
Referee: [Section 3] Similarity metric definition: The exact formula for the similarity metric, the statistical tests used to compare distributions, and any controls for question difficulty or response length are not specified. This under-specification directly affects the reliability and interpretability of the quantitative results reported for the default, prompting, and translation experiments.
Authors: We apologize for the under-specification. The similarity metric averages a distributional divergence (Jensen-Shannon) between the LLM's response distribution and the human response distribution per question, conditioned on country. Statistical comparisons use permutation tests. No explicit controls for question difficulty or response length appear in the main results, though supplementary checks were performed. In revision we will state the precise formula in Section 3, detail the statistical procedures, and add an appendix with robustness analyses that control for response length and question characteristics. revision: yes
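To make the metric described above concrete, the following is a minimal sketch of a country-conditioned similarity score assembled from the ingredients the authors name: a per-question Jensen-Shannon comparison between the model's and a country's answer distributions, averaged over questions. The function names, data layout, and use of SciPy are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' released code): a country-conditioned
# similarity score built from per-question Jensen-Shannon divergence.
import numpy as np
from scipy.spatial.distance import jensenshannon

def question_similarity(model_dist, human_dist):
    """1 minus the Jensen-Shannon distance between two answer-option distributions.

    Both inputs are probability vectors over the same ordered answer options.
    With base=2 the JS distance lies in [0, 1], so the result is a similarity in [0, 1].
    """
    model_dist = np.asarray(model_dist, dtype=float)
    human_dist = np.asarray(human_dist, dtype=float)
    return 1.0 - jensenshannon(model_dist, human_dist, base=2)

def country_similarity(model_dists, human_dists_by_country, country):
    """Average per-question similarity between model answers and one country's answers.

    model_dists: {question_id: probability vector from the LLM}
    human_dists_by_country: {country: {question_id: probability vector from survey respondents}}
    """
    human_dists = human_dists_by_country[country]
    shared = [q for q in model_dists if q in human_dists]
    return float(np.mean([question_similarity(model_dists[q], human_dists[q]) for q in shared]))
```

On this scale a higher score means the model's answer distributions sit closer to that country's survey respondents, and ranking countries by the score is what underlies the default-prompt finding.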
-
Referee: [Section 5.2] Prompting experiment: The claim that country-specific prompting can produce harmful cultural stereotypes is stated but lacks concrete examples, frequency counts, or a systematic analysis of stereotype content, making it difficult to evaluate the practical severity of this side effect.
Authors: We agree that the stereotype observation requires more concrete support. Our experiments identified instances in which country-specific prompts elicited stereotypical content (e.g., associating particular nationalities with fixed economic or social traits). We will expand Section 5.2 with representative examples, report the approximate frequency of such outputs across the tested prompts based on manual inspection, and include a short qualitative categorization of the stereotype types observed. revision: yes
Circularity Check
No circularity; framework uses external survey data as benchmark
full rationale
The paper constructs GlobalOpinionQA from independent cross-national surveys and defines a similarity metric directly against those human responses conditioned on country. No parameters are fitted to the target similarity values, no self-citation chain justifies the core measurement, and the derivation does not reduce to self-definition or renaming. The central claims rest on external benchmarks rather than internal fits or author-specific uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Responses collected in cross-national surveys accurately reflect the distribution of opinions within each country's population.
Forward citations
Cited by 20 Pith papers
-
LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs
LLM attackers persuade frontier LLMs to generate prohibited essays on consensus topics through multi-turn natural-language pressure, with success rates up to 100% in some model-topic pairs.
-
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.
-
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
StereoTales shows that LLMs produce harmful, culturally adapted stereotypes in open-ended multilingual stories, with patterns consistent across providers and aligned human-LLM harm judgments.
-
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.
-
XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity
XL-SafetyBench is a new cross-cultural benchmark showing frontier LLMs decouple jailbreak robustness from cultural sensitivity while local models trade off attack success against neutral-safe rates in a near-linear pa...
-
Large Language Models Exhibit Normative Conformity
Large language models exhibit normative conformity in addition to informational conformity, and subtle social context can direct which group they conform to.
-
C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment
C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to syn...
-
Overtrained, Not Misaligned
Emergent misalignment arises from overtraining after primary task convergence and is preventable by early stopping, which retains 93% of task performance on average.
-
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
DISCA uses disagreement among WVS-grounded persona panels to apply loss-averse logit corrections that reduce cultural misalignment by 10-24% on MultiTP for models 3.8B and larger, without weight changes.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
-
Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions
LLMs exhibit pseudo-deliberation, with consistent value-action misalignment in generated dialogues despite reasoning, as measured by the new VALDI framework across 4941 scenarios.
-
The Collapse of Heterogeneity in Silicon Philosophers
Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.
-
Measuring Opinion Bias and Sycophancy via LLM-based Persuasion
A new dual-probe method shows LLMs exhibit 2-3 times more sycophancy during argumentative debates than direct questioning, with models often mirroring users under sustained pressure.
-
Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
-
Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations
LLMs display Western-centric cultural representations that align poorly with native priorities in non-Western countries and share highly correlated error patterns.
-
Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
LLMs generate narratives containing persistent stereotypes, erasure, and one-dimensional portrayals of Global Majority national identities, with minoritized groups overrepresented in subordinated roles by more than fi...
-
When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation
Mid-sized LLMs outperform larger models on fairness in multi-document news summarization, with entity sentiment bias proving hardest to mitigate across prompt and judge-based interventions.
-
Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI
NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice an...
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.
-
SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
SemEval-2026 Task 7 presents a benchmark and two evaluation tracks for assessing LLMs on everyday knowledge in diverse languages and cultures without allowing training on the test data.