pith. machine review for the scientific record.

arxiv: 2306.16388 · v2 · submitted 2023-06-28 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links · Lean Theorem

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 07:37 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords language models · global opinions · opinion bias · survey responses · cross-national data · response similarity

The pith

Large language models produce answers that match opinions from the United States and certain European and South American countries more closely than opinions from other nations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a dataset of questions and answers drawn from cross-national surveys to capture how people in different countries actually respond to the same issues. It then defines a metric that scores how similar a model's generated answers are to the human answers collected from each country. Experiments on one model trained with Constitutional AI to be helpful, honest, and harmless show that its default outputs line up best with responses from the United States and selected European and South American populations. When the model is explicitly told to answer from a particular country's viewpoint, the similarity shifts toward that population but can also reproduce cultural stereotypes. Translating the survey questions into another language does not automatically make the model's answers match the views of people who speak that language.

Core claim

Using the GlobalOpinionQA dataset and a country-conditioned similarity metric, the authors show that default LLM responses align more closely with the opinion distributions of the USA and some European and South American countries than with the distributions recorded in other surveyed nations.

What carries the argument

A similarity metric that measures how closely an LLM's survey-style answers match the response distribution collected from each country's human respondents.
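The page never states the metric's formula, but the simulated rebuttal below names Jensen-Shannon divergence, so a plausible instantiation is one minus the JS divergence between the model's answer distribution and a country's human answer distribution, averaged over questions. A minimal sketch under that assumption, with toy distributions (the function names and numbers are illustrative, not the paper's):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, so it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity(model_dists, country_dists):
    """Average 1 - JSD over questions; higher means the model's answers
    look more like this country's survey respondents."""
    scores = [1 - js_divergence(m, c) for m, c in zip(model_dists, country_dists)]
    return sum(scores) / len(scores)

# Toy example: two 3-option survey questions, model vs. one country's humans.
model = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
usa   = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]
score = similarity(model, usa)
```

Ranking countries by this score per model is what lets the paper say "more similar to the USA than to other surveyed nations."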

If this is right

  • Default model outputs will over-represent the views of a subset of countries on contested global issues.
  • Explicit country prompts can increase similarity to the target population but may introduce stereotype-like content.
  • Machine translation of questions alone does not guarantee that answers will track the opinions of speakers of the target language.
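The three bullets correspond to the paper's three probe conditions: default prompting, cross-national prompting, and linguistic translation. A sketch of how such prompts could be assembled; the templates and the `translate` hook are hypothetical stand-ins, not the paper's exact wording:

```python
# Sketch of the three probe conditions; templates are illustrative.
def build_prompt(question, options, condition="default", country=None, translate=None):
    opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(options))
    if condition == "default":
        return f"{question}\n{opts}"
    if condition == "cross-national":
        return (f"How would someone from {country} respond to this question?\n"
                f"{question}\n{opts}")
    if condition == "translated":
        # translate() stands in for an MT system; hypothetical hook.
        return f"{translate(question)}\n{opts}"
    raise ValueError(f"unknown condition: {condition}")

prompt = build_prompt("Is more immigration good for your country?",
                      ["Good", "Bad", "Neither"],
                      condition="cross-national", country="Germany")
```

Comparing the similarity scores produced under each condition, per country, is what separates the three findings above.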

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Auditing pipelines built on this dataset could be applied to other models to track changes after training updates or fine-tuning.
  • The same framework could be extended to track how opinion alignment shifts when models are trained on more geographically balanced data.
  • Developers might need separate evaluation tracks for factual questions versus value-laden questions when testing global fairness.

Load-bearing premise

The chosen survey answers serve as a fair and unbiased record of what each country's population actually thinks.

What would settle it

A replication in which the same model produces answers whose similarity scores are statistically indistinguishable across all countries in the dataset.

Original abstract

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces GlobalOpinionQA, a dataset compiled from cross-national surveys capturing opinions on global issues across countries, along with a similarity metric to compare LLM-generated responses against human responses conditioned on country. Experiments on a Constitutional AI-trained LLM (helpful, honest, and harmless) show that default outputs align more closely with opinions from the USA and certain European and South American populations; country-specific prompting shifts alignments toward the prompted group but risks cultural stereotypes; and translating questions to a target language does not reliably increase similarity to speakers of that language. The dataset is released publicly with an interactive visualization.

Significance. If the metric and findings are robust, the work supplies a practical, reproducible framework for quantifying cultural and geographic biases in LLM opinion representation, with direct relevance to fairness, alignment, and global equity concerns in NLP. The public release of GlobalOpinionQA and the visualization tool strengthens the contribution by enabling independent verification and extension.

major comments (3)
  1. [Section 2] Dataset construction: The paper treats responses from the source cross-national surveys as representative ground truth for each country's population-level opinions without discussing or validating against known sampling issues (non-probability sampling, urban/educated oversampling, or small N in non-Western countries). Because the similarity metric and all three experiments rest on this assumption, the central claim that LLMs over-align with USA/European/South American opinions could instead track survey-participant demographics rather than true national opinions.
  2. [Section 3] Similarity metric definition: The exact formula for the similarity metric, the statistical tests used to compare distributions, and any controls for question difficulty or response length are not specified. This under-specification directly affects the reliability and interpretability of the quantitative results reported for the default, prompting, and translation experiments.
  3. [Section 5.2] Prompting experiment: The claim that country-specific prompting can produce harmful cultural stereotypes is stated but lacks concrete examples, frequency counts, or a systematic analysis of stereotype content, making it difficult to evaluate the practical severity of this side effect.
minor comments (2)
  1. [Abstract] The abstract refers to 'some European and South American countries' without naming them; listing the specific countries with highest alignment would improve precision.
  2. [Section 2] The paper provides dataset and visualization links but omits basic summary statistics (number of questions, countries covered, response distributions per question) that would help readers assess coverage and balance.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below with point-by-point responses and indicate planned revisions to improve the manuscript's clarity and rigor.

Point-by-point responses
  1. Referee: [Section 2] Dataset construction: The paper treats responses from the source cross-national surveys as representative ground truth for each country's population-level opinions without discussing or validating against known sampling issues (non-probability sampling, urban/educated oversampling, or small N in non-Western countries). Because the similarity metric and all three experiments rest on this assumption, the central claim that LLMs over-align with USA/European/South American opinions could instead track survey-participant demographics rather than true national opinions.

    Authors: We acknowledge that the cross-national surveys (e.g., Pew, World Values Survey) underlying GlobalOpinionQA have well-documented sampling limitations, including non-probability methods, urban/educated oversampling, and smaller samples in some non-Western countries. Our framework treats these responses as the best available empirical proxy for reported national opinions rather than claiming they represent 'true' population opinions; the similarity metric is therefore relative to the observed survey distributions. We agree a more explicit discussion is warranted and will revise Section 2 to describe the surveys' sampling characteristics and add a dedicated limitations subsection that caveats all findings accordingly. revision: yes

  2. Referee: [Section 3] Similarity metric definition: The exact formula for the similarity metric, the statistical tests used to compare distributions, and any controls for question difficulty or response length are not specified. This under-specification directly affects the reliability and interpretability of the quantitative results reported for the default, prompting, and translation experiments.

    Authors: We apologize for the under-specification. The similarity metric averages a distributional divergence (Jensen-Shannon) between the LLM's response distribution and the human response distribution per question, conditioned on country. Statistical comparisons use permutation tests. No explicit controls for question difficulty or response length appear in the main results, though supplementary checks were performed. In revision we will state the precise formula in Section 3, detail the statistical procedures, and add an appendix with robustness analyses that control for response length and question characteristics. revision: yes

  3. Referee: [Section 5.2] Prompting experiment: The claim that country-specific prompting can produce harmful cultural stereotypes is stated but lacks concrete examples, frequency counts, or a systematic analysis of stereotype content, making it difficult to evaluate the practical severity of this side effect.

    Authors: We agree that the stereotype observation requires more concrete support. Our experiments identified instances in which country-specific prompts elicited stereotypical content (e.g., associating particular nationalities with fixed economic or social traits). We will expand Section 5.2 with representative examples, report the approximate frequency of such outputs across the tested prompts based on manual inspection, and include a short qualitative categorization of the stereotype types observed. revision: yes
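The rebuttal's second response says statistical comparisons use permutation tests but gives no procedure. A minimal sketch of a two-sided permutation test on the difference in mean per-question similarity between two countries; the score lists here are hypothetical, not the paper's data:

```python
import random

def permutation_test(scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of mean
    per-question similarity scores between two countries."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b))
    pooled = list(scores_a) + list(scores_b)
    k, hits = len(scores_a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Hypothetical per-question similarity scores for two countries.
country_a = [0.81, 0.78, 0.84, 0.80, 0.77, 0.83]
country_b = [0.55, 0.60, 0.52, 0.58, 0.61, 0.57]
p_value = permutation_test(country_a, country_b)
```

A small p-value here would license the claim that one country's similarity to the model genuinely exceeds another's, rather than being noise across questions.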

Circularity Check

0 steps flagged

No circularity; framework uses external survey data as benchmark

Full rationale

The paper constructs GlobalOpinionQA from independent cross-national surveys and defines a similarity metric directly against those human responses conditioned on country. No parameters are fitted to the target similarity values, no self-citation chain justifies the core measurement, and the derivation does not reduce to self-definition or renaming. The central claims rest on external benchmarks rather than internal fits or author-specific uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework treats existing cross-national survey responses as ground-truth distributions of opinion per country; this is a domain assumption rather than a derived quantity.

axioms (1)
  • domain assumption Responses collected in cross-national surveys accurately reflect the distribution of opinions within each country's population.
    The similarity metric conditions directly on these survey answers as the reference distribution for each country.

pith-pipeline@v0.9.0 · 5608 in / 1251 out tokens · 24078 ms · 2026-05-16T07:37:44.317773+00:00 · methodology

discussion (0)


Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs

    cs.CL 2026-05 conditional novelty 7.0

    LLM attackers persuade frontier LLMs to generate prohibited essays on consensus topics through multi-turn natural-language pressure, with success rates up to 100% in some model-topic pairs.

  2. StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

    cs.CY 2026-05 accept novelty 7.0

    StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.

  3. StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

    cs.CY 2026-05 unverdicted novelty 7.0

    StereoTales shows that LLMs produce harmful, culturally adapted stereotypes in open-ended multilingual stories, with patterns consistent across providers and aligned human-LLM harm judgments.

  4. Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

    cs.AI 2026-05 unverdicted novelty 7.0

    Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.

  5. XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

    cs.CL 2026-05 unverdicted novelty 7.0

    XL-SafetyBench is a new cross-cultural benchmark showing frontier LLMs decouple jailbreak robustness from cultural sensitivity while local models trade off attack success against neutral-safe rates in a near-linear pa...

  6. Large Language Models Exhibit Normative Conformity

    cs.AI 2026-04 unverdicted novelty 7.0

    Large language models exhibit normative conformity in addition to informational conformity, and subtle social context can direct which group they conform to.

  7. C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment

    cs.CL 2026-04 unverdicted novelty 7.0

    C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to syn...

  8. Overtrained, Not Misaligned

    cs.LG 2026-05 unverdicted novelty 6.0

    Emergent misalignment arises from overtraining after primary task convergence and is preventable by early stopping, which retains 93% of task performance on average.

  9. Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

    cs.CL 2026-05 unverdicted novelty 6.0

    DISCA uses disagreement among WVS-grounded persona panels to apply loss-averse logit corrections that reduce cultural misalignment by 10-24% on MultiTP for models 3.8B and larger, without weight changes.

  10. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 6.0

    Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.

  11. Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions

    cs.CL 2026-05 unverdicted novelty 6.0

    LLMs exhibit pseudo-deliberation, with consistent value-action misalignment in generated dialogues despite reasoning, as measured by the new VALDI framework across 4941 scenarios.

  12. The Collapse of Heterogeneity in Silicon Philosophers

    cs.CY 2026-04 unverdicted novelty 6.0

    Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.

  13. Measuring Opinion Bias and Sycophancy via LLM-based Persuasion

    cs.CL 2026-04 unverdicted novelty 6.0

    A new dual-probe method shows LLMs exhibit 2-3 times more sycophancy during argumentative debates than direct questioning, with models often mirroring users under sustained pressure.

  14. Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives

    cs.CL 2026-04 unverdicted novelty 6.0

    A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.

  15. Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations

    cs.CL 2026-04 unverdicted novelty 6.0

    LLMs display Western-centric cultural representations that align poorly with native priorities in non-Western countries and share highly correlated error patterns.

  16. Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities

    cs.CL 2026-04 unverdicted novelty 5.0

    LLMs generate narratives containing persistent stereotypes, erasure, and one-dimensional portrayals of Global Majority national identities, with minoritized groups overrepresented in subordinated roles by more than fi...

  17. When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

    cs.CL 2026-04 unverdicted novelty 5.0

    Mid-sized LLMs outperform larger models on fairness in multi-document news summarization, with entity sentiment bias proving hardest to mitigate across prompt and judge-based interventions.

  18. Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI

    cs.AI 2026-03 conditional novelty 5.0

    NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice an...

  19. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 4.0

    Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.

  20. SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

    cs.CL 2026-05 unverdicted novelty 4.0

    SemEval-2026 Task 7 presents a benchmark and two evaluation tracks for assessing LLMs on everyday knowledge in diverse languages and cultures without allowing training on the test data.

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · cited by 18 Pith papers · 7 internal anchors

  1. [1]

    Persistent anti-muslim bias in large language models

    Abubakar Abid, Maheen Farooqi, and James Zou. Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’21, page 298–306, New York, NY , USA, 2021. Association for Comput- ing Machinery. ISBN 9781450384735. doi: 10.1145/3461702.3462624. URL https: //doi.org/10.1145/3461702.3462624

  2. [2]

    Subjective natural language problems: Motivations, applications, characterizations, and implications

    Cecilia Ovesdotter Alm. Subjective natural language problems: Motivations, applications, characterizations, and implications. In Proceedings of the 49th Annual Meeting of the Associa- tion for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT ’11, page 107–112, USA, 2011. Association for Computational Linguistics. ISBN 9...

  3. [3]

    Probing pre-trained language models for cross-cultural differences in values

    Arnav Arora, Lucie-aimée Kaffee, and Isabelle Augenstein. Probing pre-trained language models for cross-cultural differences in values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 114–130, Dubrovnik, Croatia, May

  4. [4]

    URL https://aclanthology.org/2023

    Association for Computational Linguistics. URL https://aclanthology.org/2023. c3nlp-1.12

  5. [5]

    A general language assistant as a laboratory for alignment, 2021

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, and Jared Kaplan. A general language assistant as a labora...

  6. [6]

    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, T. J. Henighan, Nicholas Joseph, Saurav Kadavath, John Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, D...

  7. [7]

    Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan

    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

  8. [8]

    Solon Barocas and Andrew D. Selbst. Big data’s disparate impact. California Law Review, 104: 671, 2016

  9. [9]

    and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret , title =

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 610–623, 11 New York, NY , USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi:...

  10. [10]

    Berinsky

    Adam J. Berinsky. Measuring public opinion with surveys. Annual Review of Political Science, 20(1):309–329, 2017. doi: 10.1146/annurev-polisci-101513-113724. URL https://doi. org/10.1146/annurev-polisci-101513-113724

  11. [11]

    Language (technol- ogy) is power: A critical survey of “bias” in NLP

    Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. Language (technol- ogy) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages 5454–5476, Online, July

  12. [12]

    doi: 10.18653/v1/2020.acl-main.485

    Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.485. URL https://aclanthology.org/2020.acl-main.485

  13. [13]

    Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, S. Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie S. Chen, Kathleen A. Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano...

  14. [14]

    Creel, Ananya Kumar, Dan Jurafsky, and Percy S Liang

    Rishi Bommasani, Kathleen A. Creel, Ananya Kumar, Dan Jurafsky, and Percy S Liang. Picking on the same person: Does algorithmic monoculture lead to outcome homogenization? In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 3663–3678. Curran Associates, Inc., 20...

  15. [15]

    Shikha Bordia and Samuel R. Bowman. Identifying and reducing gender bias in word-level language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop , pages 7–15, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/ v1...

  16. [16]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari- wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agar- wal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Ma- teusz Litwin, S...

  17. [17]

    Identity and interaction: a sociocultural linguistic approach

    Mary Bucholtz and Kira Hall. Identity and interaction: a sociocultural linguistic approach. Discourse Studies, 7(4-5):585–614, 2005. doi: 10.1177/1461445605054407. URL https: //doi.org/10.1177/1461445605054407. 12

  18. [18]

    The whiteness of ai

    Stephen Cave and Kanta Dihal. The whiteness of ai. Philosophy & Technology, 33:1–19, 12

  19. [19]

    doi: 10.1007/s13347-020-00415-6

  20. [20]

    Marked personas: Using natural language prompts to measure stereotypes in language models, 2023

    Myra Cheng, Esin Durmus, and Dan Jurafsky. Marked personas: Using natural language prompts to measure stereotypes in language models, 2023

  21. [21]

    Deep reinforcement learning from human preferences

    Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, edi- tors, Advances in Neural Information Processing Systems , volume 30. Curran Associates, Inc., 2017. URL https://proce...

  22. [22]

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Marta R Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, et al. No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672, 2022

  23. [23]

    ISBN 9781713829546

    Aida Mostafazadeh Davani, Mark Díaz, and Vinodkumar Prabhakaran. Dealing with dis- agreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics , 10:92–110, 2022. doi: 10.1162/tacl\_a\_00449. URL https://aclanthology.org/2022.tacl-1.6

  24. [24]

    Western civilization, our tradition, Nov 2020

    Fad-Admin. Western civilization, our tradition, Nov 2020. URL https://isi.org/ intercollegiate-review/western-civilization-our-tradition/

  25. [25]

    Measuring diversity of artificial intelligence conferences

    Ana Freire, Lorenzo Porcaro, and Emilia Gómez. Measuring diversity of artificial intelligence conferences. In Deepti Lamba and William H. Hsu, editors, Proceedings of 2nd Workshop on Diversity in Artificial Intelligence (AIDBEI), volume 142 of Proceedings of Machine Learning Research, pages 39–50. PMLR, 09 Feb 2021. URL https://proceedings.mlr.press/ v142...

  26. [26]

    Artificial intelligence, values, and alignment

    Iason Gabriel. Artificial intelligence, values, and alignment. Minds and Machines , 30(3): 411–437, sep 2020. doi: 10.1007/s11023-020-09539-2. URL https://doi.org/10.1007% 2Fs11023-020-09539-2

  27. [27]

    The challenge of value alignment: from fairer algorithms to AI safety

    Iason Gabriel and Vafa Ghazavi. The challenge of value alignment: from fairer algorithms to AI safety. CoRR, abs/2101.06060, 2021. URL https://arxiv.org/abs/2101.06060

  28. [28]

    Predictability and surprise in large generative models

    Deep Ganguli, Danny Hernandez, Liane Lovitt, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova Dassarma, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Scott Johnston, Andy Jones, Nicholas Joseph, Jackson Kernian, Shauna Kravec, Ben Mann, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Tom B...

  29. [29]

    Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022

  30. [30]

    Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamil˙e Lukoši¯ut˙e, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noem...

  31. [31]

    Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. Real- ToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, Online, November

  32. [32]

    doi: 10.18653/v1/2020.findings-emnlp.301

    Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.301. URL https://aclanthology.org/2020.findings-emnlp.301. 13

  33. [33]

    Improving alignment of dialogue agents via targeted human judgements, 2022

    Amelia Glaese, Nat McAleese, Maja Tr˛ ebacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Mari- beth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soˇna Mokrá,...

  34. [34]

    Gordon, Kaitlyn Zhou, Kayur Patel, Tatsunori Hashimoto, and Michael S

    Mitchell L. Gordon, Kaitlyn Zhou, Kayur Patel, Tatsunori Hashimoto, and Michael S. Bernstein. The disagreement deconvolution: Bringing machine learning performance metrics in line with re- ality. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI ’21, New York, NY , USA, 2021. Association for Computing Machinery. ISBN 978...

  35. [35]

    Gordon, Michelle S

    Mitchell L. Gordon, Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S. Bernstein. Jury learning: Integrating dissenting voices into machine learning models. In CHI Conference on Human Factors in Computing Systems. ACM, apr 2022. doi: 10.1145/3491102.3502004. URL https://doi.org/10.1145%2F3491102.3502004

  36. [36]

    Kivlichan, Rachel Rosen, and Lucy Vasserman

    Nitesh Goyal, Ian D. Kivlichan, Rachel Rosen, and Lucy Vasserman. Is your toxicity my toxicity? exploring the impact of rater identity on toxicity annotation. Proceedings of the ACM on Human-Computer Interaction, 6:1–28, 2022

37. [37] Christian Haerpfer, Ronald Inglehart, Alejandro Moreno, Christian Welzel, Kseniya Kizilova, Jaime Diez-Medrano, Milena Lagos, Pippa Norris, Eduard Ponarin, and Bianca Puranen. World values survey: Round seven – country-pooled datafile version 5.0.0, 2022

38. [38] Jochen Hartmann, Jasper Schwenzow, and Maximilian Witte. The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation, 2023

39. [39] Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3309–3326, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.234. URL https://aclanthology.org/2022.acl-long.234

41. [41] Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. Aligning AI with shared human values. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=dNy_RKzJacY

42. [42] Joseph Henrich, Steven J. Heine, and Ara Norenzayan. The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3):61–83, June 2010. ISSN 1469-1825. URL http://journals.cambridge.org/abstract_S0140525X0999152X

43. [43] Dirk Hovy and Diyi Yang. The importance of modeling social factors of language: Theory and practice. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 588–602, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-ma...

44. [44] Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. Social biases in NLP models as barriers for persons with disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5491–5501, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v...

45. [45] Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. Co-writing with opinionated language models affects users' views. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI '23, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394215. doi: 10.1145/3544548.3581196. URL https:...

46. [46] Hang Jiang, Doug Beeferman, Brandon Roy, and Deb Roy. CommunityLM: Probing partisan worldviews from language models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 6818–6826, Gyeongju, Republic of Korea, October 2022. International Committee on Computational Linguistics. URL https://aclanthology.org/2022.coling-1.593

47. [47] Rebecca L Johnson, Giada Pistilli, Natalia Menédez-González, Leslye Denisse Dias Duran, Enrico Panai, Julija Kalpokiene, and Donald Jay Bertulfo. The ghost in the machine has an American accent: Value conflict in GPT-3, 2022

48. [48] Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-mai...

49. [49] Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, ... Language models (mostly) know what they know, 2022

50. [50] Pratyusha Kalluri. Don't ask if artificial intelligence is good or fair, ask how it shifts power. Nature, 583:169, 2020

51. [51] Saketh Reddy Karra, Son The Nguyen, and Theja Tulabandhula. Estimating the personality of white-box language models. arXiv e-prints, arXiv:2204.12000, April 2022. doi: 10.48550/arXiv.2204.12000

52. [52] Atoosa Kasirzadeh and Iason Gabriel. In conversation with artificial intelligence: Aligning language models with human values. Philosophy & Technology, 36(2):1–24, 2023

53. [53] Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen McKeown, and Tatsunori Hashimoto. When do pre-training biases propagate to downstream tasks? A case study in text summarization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3206–3219, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics. URL https://aclanthology.org/2023.eacl-main.234

55. [55] Alon Lavie. Evaluating the output of machine translation systems. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Tutorials, 2010

56. [56] Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, and Ruslan Salakhutdinov. Towards understanding and mitigating social biases in language models. In International Conference on Machine Learning, pages 6565–6576. PMLR, 2021

57. [57] Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu...

58. [58] Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8086–8098, Dublin, Ireland, May 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.acl-long.556

59. [59] Li Lucy and David Bamman. Gender and representation bias in GPT-3 generated stories. In Proceedings of the Third Workshop on Narrative Understanding, pages 48–55, Virtual, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.nuse-1.5. URL https://aclanthology.org/2021.nuse-1.5

61. [61] S. McConnell-Ginet. Words Matter: Meaning and Power. Cambridge University Press, 2020. ISBN 9781108427210. URL https://books.google.com/books?id=gKVTzQEACAAJ

62. [62] Moin Nadeem, Anna Bethke, and Siva Reddy. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5356–5371, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.416. URL https://aclanthology.org/2021.acl-long.416

64. [64] Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao Huang, and Shomir Wilson. Nationality bias in text generation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 116–122, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics. URL https://aclanthol...

65. [65] Joon Sung Park, Lindsay Popowski, Carrie Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Social simulacra: Creating populated prototypes for social computing systems. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, UIST '22, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9...

66. [66] Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, and Alex Hanna. Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns, 2(11):100336, 2021. ISSN 2666-3899. doi: 10.1016/j.patter.2021.100336. URL https://www.sciencedirect.com/science/article/pii/S2666389921001847

67. [67] Ethan Perez, Saffron Huang, H. Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models. CoRR, abs/2202.03286, 2022. URL https://arxiv.org/abs/2202.03286

68. [68] Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kern...

69. [69] Vinodkumar Prabhakaran, Margaret Mitchell, Timnit Gebru, and Iason Gabriel. A human rights-based approach to responsible AI, 2022

70. [70] Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, ...

71. [71] Deborah Raji, Emily Denton, Emily M. Bender, Alex Hanna, and Amandalynne Paullada. AI and the everything in the whole wide world benchmark. In J. Vanschoren and S. Yeung, editors, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1. Curran, 2021. URL https://datasets-benchmarks-proceedings.neurips.cc/pa...

72. [72] Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, William Isaac, and Lisa Anne Hendricks. Characteristics of harmful text: Towards rigorous benchmarking of language models, 2022

73. [73] Sebastian Ruder. Why You Should Do NLP Beyond English. http://ruder.io/nlp-beyond-english, 2020

74. [74] Nithya Sambasivan, Erin Arnesen, Ben Hutchinson, Tulsee Doshi, and Vinodkumar Prabhakaran. Re-imagining algorithmic fairness in India and beyond. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, pages 315–328, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450383097. doi: 10.1145/3...

75. [75] Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect?, 2023

76. [76] Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. Social bias frames: Reasoning about social and power implications of language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5477–5490, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/202... URL https://aclanthology.org/2020.acl-main.486

78. [78] Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5884–5906, Seattle...

79. [79] Andrew D. Selbst, Danah Boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* '19, pages 59–68, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450361255. doi: 10.1145/3287560.328...

80. [80] Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412, Hong Kong, China, November...
