pith. machine review for the scientific record. sign in

arxiv: 2604.24429 · v1 · submitted 2026-04-27 · 💻 cs.CL

Recognition: unknown

A Multi-Dimensional Audit of Politically Aligned Large Language Models

Authors on Pith no claims yet

Pith reviewed 2026-05-08 03:34 UTC · model grok-4.3

classification 💻 cs.CL
keywords LLM alignmentpolitical biasfairness in AItruthfulnessfine-tuningrole-playing promptsmodel auditing
0
0 comments X

The pith

Politically aligned LLMs trade fairness for effectiveness and truthfulness as model size grows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a quantitative audit framework with four dimensions to assess how well LLMs can be steered toward specific political positions without unwanted side effects. When nine common models were tested after alignment by either fine-tuning or role-playing prompts, larger models proved better at maintaining the assigned ideology and sticking to facts, yet they produced more angry and toxic language toward opposing views. Fine-tuned alignments reduced bias and improved adherence compared with simple prompting, but they lowered performance on reasoning tasks and raised the rate of fabricated statements. Every model fell short on at least one dimension, pointing to the difficulty of achieving balanced political alignment.

Core claim

Applying the four-dimensional audit to nine LLMs aligned via fine-tuning or role-playing shows that larger models achieve greater effectiveness in role-playing political ideologies and higher truthfulness, yet they exhibit reduced fairness through increased bias expressed as angry and toxic language toward differing ideologies. Fine-tuned models deliver lower bias and stronger alignment than role-playing approaches but suffer declines in reasoning performance and rises in hallucinations. All tested models display deficiencies in at least one of the four metrics.

What carries the argument

A multi-dimensional audit framework using four dimensions-effectiveness, fairness, truthfulness, and persuasiveness-derived from Habermas' Theory of Communicative Action and measured with automated quantitative metrics.

If this is right

  • Larger models are more effective at role-playing political ideologies and more truthful in responses but less fair.
  • Fine-tuned models show lower bias and more effective alignment than role-playing models.
  • Fine-tuned models experience reduced performance on reasoning tasks and increased hallucinations.
  • All tested models exhibit deficiencies in at least one of the four audit metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same audit approach could be applied to alignment on other contested topics such as public health or climate policy to check for similar trade-offs.
  • Developers may need training objectives that jointly optimize the four dimensions rather than focusing on one at a time.
  • Public benchmarks using this framework could inform standards for deploying politically aligned models in campaigns or media.

Load-bearing premise

The four dimensions drawn from Habermas' Theory of Communicative Action form a valid and comprehensive basis for auditing political alignment through automated metrics.

What would settle it

An experiment in which a larger fine-tuned model scores high on all four dimensions simultaneously without elevated toxicity or hallucinations would contradict the reported trade-offs.

Figures

Figures reproduced from arXiv: 2604.24429 by Lisa Korver, Mohamed Mostagir, Sherief Reda.

Figure 1
Figure 1. Figure 1: Mapping of the audit dimensions to the Habermas’ Theory of Communicative Action. view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the evaluation dimensions, outlining the metrics used for each one. view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of the BERT- and LLM-based classification methods to the survey responses for toxicity scores. view at source ↗
Figure 4
Figure 4. Figure 4: Results from the political alignment evaluation. Negative scores indicate a left-leaning ideology. view at source ↗
Figure 5
Figure 5. Figure 5: Political Alignment results for right- and left-aligned models. Negative scores indicate a left-leaning ideology. view at source ↗
Figure 6
Figure 6. Figure 6: Results from the sentiment analysis evaluations. view at source ↗
Figure 7
Figure 7. Figure 7: Difference in emotion scores given by the LLM-based classification. view at source ↗
Figure 8
Figure 8. Figure 8: Percentage of questions answered correctly by each model for the politically relevant TruthfulQA categories view at source ↗
Figure 9
Figure 9. Figure 9: User survey results In view at source ↗
Figure 10
Figure 10. Figure 10: Survey respondent’s change of opinion on political topics after reading an argument from a model by view at source ↗
Figure 11
Figure 11. Figure 11: Pearson correlation coefficients between various metrics of Effectiveness (E), Fairness (F), Truthfulness (T), view at source ↗
Figure 12
Figure 12. Figure 12: Multi-dimensional audit results for various metrics of Effectiveness (E), Fairness (F), Truthfulness (T), and view at source ↗
read the original abstract

As the application of Large Language Models (LLMs) spreads across various industries, there are increasing concerns about the potential for their misuse, especially in sensitive areas such as political discourse. Deliberately aligning LLMs with specific political ideologies, through prompt engineering or fine-tuning techniques, can be advantageous in use cases such as political campaigns, but requires careful consideration due to heightened risks of performance degradation, misinformation, or increased biased behavior. In this work, we propose a multi-dimensional framework inspired by Habermas' Theory of Communicative Action to audit politically aligned language models across four dimensions: effectiveness, fairness, truthfulness, and persuasiveness using automated, quantitative metrics. Applying this to nine popular LLMs aligned via fine-tuning or role-playing revealed consistent trade-offs: while larger models tend to be more effective at role-playing political ideologies and truthful in their responses, they were also less fair, exhibiting higher levels of bias in the form of angry and toxic language towards people of different ideologies. Fine-tuned models exhibited lower bias and more effective alignment than the corresponding role-playing models, but also saw a decline in performance reasoning tasks and an increase in hallucinations. Overall, all of the models tested exhibited some deficiency in at least one of the four metrics, highlighting the need for more balanced and robust alignment strategies. Ultimately, this work aims to ensure politically-aligned LLMs generate legitimate, harmless arguments, offering a framework to evaluate the responsible political alignment of these models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a multi-dimensional audit framework for politically aligned LLMs, drawing on Habermas' Theory of Communicative Action to evaluate four dimensions—effectiveness, fairness, truthfulness, and persuasiveness—via automated quantitative metrics. It applies the framework to nine popular LLMs aligned either through fine-tuning or role-playing prompts, reporting consistent trade-offs: larger models are more effective at role-playing ideologies and more truthful but less fair (higher bias via angry/toxic language); fine-tuned models show lower bias and stronger alignment than role-play versions but suffer declines in reasoning performance and increased hallucinations. All tested models exhibit deficiencies in at least one dimension, underscoring the need for balanced alignment strategies.

Significance. If the automated metrics are shown to faithfully operationalize the Habermas-derived dimensions and correlate with human judgments, the work would provide a practical, quantitative tool for auditing political alignment risks in LLMs, highlighting actionable trade-offs between alignment techniques. The multi-model evaluation and explicit focus on both fine-tuning and prompting methods add empirical breadth, though the absence of validation data currently constrains the framework's reliability for policy or deployment decisions.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (Framework): The central claim that the framework reveals reliable trade-offs rests on automated proxies (toxicity/anger detectors for fairness, factuality checks for truthfulness, etc.), yet no validation against human or expert political-science ratings is reported. In contested political discourse these proxies are known to be noisy; without correlation evidence the reported patterns (e.g., larger models more truthful yet more toxic) remain uninterpretable.
  2. [Abstract and §4] Abstract and §4 (Experiments): No details are supplied on the concrete datasets, prompt templates, or statistical methods used to compute the four metrics, nor on how role-playing vs. fine-tuning alignments were implemented across the nine models. This absence prevents assessment of whether the quantitative results actually support the stated trade-offs.
  3. [§5] §5 (Results): The finding that fine-tuned models exhibit lower bias but increased hallucinations and reduced reasoning performance is load-bearing for the paper's policy recommendation, yet it is presented without error bars, significance tests, or controls for model size and base capability, making it impossible to isolate the effect of the alignment method.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a brief table or bullet list explicitly defining each of the four dimensions and the automated metric chosen for it.
  2. [Introduction] Citation of prior work on LLM bias/toxicity benchmarks (e.g., RealToxicityPrompts, TruthfulQA) is missing; adding these would clarify how the new metrics relate to existing ones.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We agree that additional validation, experimental details, and statistical controls will strengthen the manuscript and address concerns about interpretability. We outline our point-by-point responses and planned revisions below.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Framework): The central claim that the framework reveals reliable trade-offs rests on automated proxies (toxicity/anger detectors for fairness, factuality checks for truthfulness, etc.), yet no validation against human or expert political-science ratings is reported. In contested political discourse these proxies are known to be noisy; without correlation evidence the reported patterns (e.g., larger models more truthful yet more toxic) remain uninterpretable.

    Authors: We agree that human validation is essential to demonstrate that the automated proxies faithfully capture the Habermas-derived dimensions. In the revised manuscript we will add a dedicated subsection to §3 reporting a human evaluation: we will sample 200 model responses across the four dimensions, obtain ratings from three political-science experts per response on 5-point scales for each dimension, and report Pearson/Spearman correlations between these ratings and our automated metrics. This will directly address the concern about proxy noise and support the reported trade-offs. revision: yes

  2. Referee: [Abstract and §4] Abstract and §4 (Experiments): No details are supplied on the concrete datasets, prompt templates, or statistical methods used to compute the four metrics, nor on how role-playing vs. fine-tuning alignments were implemented across the nine models. This absence prevents assessment of whether the quantitative results actually support the stated trade-offs.

    Authors: We acknowledge the current version lacks sufficient methodological transparency. In the revised §4 we will provide: (i) the exact prompt templates and source datasets (including political ideology statements and reasoning/hallucination benchmarks) used for each metric; (ii) full implementation details for both fine-tuning (base models, alignment datasets, training hyperparameters) and role-play prompting (system prompts and few-shot examples); and (iii) the precise aggregation and normalization procedures for each of the four quantitative metrics. These additions will enable full reproducibility and allow readers to evaluate the strength of the observed trade-offs. revision: yes

  3. Referee: [§5] §5 (Results): The finding that fine-tuned models exhibit lower bias but increased hallucinations and reduced reasoning performance is load-bearing for the paper's policy recommendation, yet it is presented without error bars, significance tests, or controls for model size and base capability, making it impossible to isolate the effect of the alignment method.

    Authors: We concur that stronger statistical presentation is required to isolate alignment effects. In the revised §5 we will: add error bars (standard error of the mean) to all bar plots and tables; perform and report paired t-tests (or Wilcoxon tests where normality fails) comparing fine-tuned versus role-play versions within each model family; and introduce model-size controls by reporting separate subgroup analyses for models of comparable scale (e.g., 7B vs. 7B, 70B vs. 70B). These changes will clarify whether the observed declines are attributable to the alignment technique rather than base capability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical audit applies external framework without self-referential reduction

full rationale

The paper introduces a four-dimensional audit framework drawn from Habermas' Theory of Communicative Action and applies it via automated metrics to nine LLMs under two alignment methods. No equations, fitted parameters, or predictions are defined in terms of the framework's own outputs; the reported trade-offs (larger models more truthful yet less fair; fine-tuning lowers bias but raises hallucinations) are presented as direct empirical results from metric application rather than logical necessities or renamed inputs. The framework is positioned as an independent evaluation tool grounded in external theory, with no load-bearing self-citations or ansatzes that collapse the derivation chain. This is the standard non-circular pattern for an audit study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract was available for review; no explicit free parameters, axioms, or invented entities are detailed beyond the high-level inspiration from Habermas' theory.

axioms (1)
  • domain assumption Habermas' Theory of Communicative Action can be translated into four quantitative metrics (effectiveness, fairness, truthfulness, persuasiveness) suitable for auditing LLM political alignment.
    The framework is explicitly inspired by the theory, but the abstract gives no specifics on the translation or validation of the metrics.

pith-pipeline@v0.9.0 · 5556 in / 1396 out tokens · 57009 ms · 2026-05-08T03:34:01.219405+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Navigating llm ethics: Advancements, challenges, and future directions.arXiv preprint arXiv:2406.18841, 2024

    Junfeng Jiao, Saleh Afroogh, Yiming Xu, and Connor Phillips. Navigating llm ethics: Advancements, challenges, and future directions.arXiv preprint arXiv:2406.18841, 2024

  2. [2]

    Political bias in large language models: A comparative analysis of chatgpt-4, perplexity, google gemini, and claude.IEEE Access, 13:11341–11379, 2025

    Tavishi Choudhary. Political bias in large language models: A comparative analysis of chatgpt-4, perplexity, google gemini, and claude.IEEE Access, 13:11341–11379, 2025

  3. [3]

    The political ideology of conversational ai: Converging evidence on chatgpt’s pro-environmental, left-libertarian orientation, 2023

    Jochen Hartmann, Jasper Schwenzow, and Maximilian Witte. The political ideology of conversational ai: Converging evidence on chatgpt’s pro-environmental, left-libertarian orientation, 2023

  4. [4]

    Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machi...

  5. [5]

    The political preferences of llms.PLOS ONE, 19(7):1–15, 07 2024

    David Rozado. The political preferences of llms.PLOS ONE, 19(7):1–15, 07 2024

  6. [6]

    Ahmed Agiza, Mohamed Mostagir, and Sherief Reda. Politune: Analyzing the impact of data selection and fine-tuning on economic and political biases in large language models.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1):2–12, Oct. 2024

  7. [7]

    Ecker, S

    U.K.H. Ecker, S. Lewandowsky, J. Cook, et al. The psychological drivers of misinformation belief and its resistance to correction.Nature Reviews Psychology, 1:13–29, 2022

  8. [8]

    Media manipulation and disinformation online.New York: Data & Society Research Institute, 359:1146–1151, 2017

    Alice Marwick and Rebecca Lewis. Media manipulation and disinformation online.New York: Data & Society Research Institute, 359:1146–1151, 2017

  9. [9]

    Jürgen Habermas.The Theory of Communicative action.Beacon Press, Boston, Mass., 1981

  10. [10]

    A comprehensive survey of bias in llms: Current landscape and future directions, 2024

    Rajesh Ranjan, Shailja Gupta, and Surya Narayan Singh. A comprehensive survey of bias in llms: Current landscape and future directions, 2024

  11. [11]

    Training language models to follow instructions with human feedback

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedbac...

  12. [12]

    Direct preference optimization: Your language model is secretly a reward model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 53728–53741. Curran Associ...

  13. [13]

    A comprehensive survey of llm alignment techniques: Rlhf, rlaif, ppo, dpo and more, 2024

    Zhichao Wang, Bin Bi, Shiva Kumar Pentyala, Kiran Ramnath, Sougata Chaudhuri, Shubham Mehrotra, Zixu, Zhu, Xiang-Bo Mao, Sitaram Asur, Na, and Cheng. A comprehensive survey of llm alignment techniques: Rlhf, rlaif, ppo, dpo and more, 2024

  14. [14]

    Measuring political bias in large language models: What is said and how it is said

    Yejin Bang, Delong Chen, Nayeon Lee, and Pascale Fung. Measuring political bias in large language models: What is said and how it is said. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11142–11159, Bangkok, Thailand, August 20...

  15. [15]

    From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models

    Shangbin Feng, Chan Young Park, Yuhan Liu, and Yulia Tsvetkov. From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1:...

  16. [16]

    Measuring political bias in large language models: What is said and how it is said, 2024

    Yejin Bang, Delong Chen, Nayeon Lee, and Pascale Fung. Measuring political bias in large language models: What is said and how it is said, 2024

  17. [17]

    Llama meets EU: Investigating the European political spectrum through the lens of LLMs

    Ilias Chalkidis and Stephanie Brandl. Llama meets EU: Investigating the European political spectrum through the lens of LLMs. In Kevin Duh, Helena Gomez, and Steven Bethard, editors,Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages ...

  18. [18]

    Ieee standard review—ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems

    Kyarash Shahriari and Mana Shahriari. Ieee standard review—ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems. In2017 IEEE Canada International Humanitarian Technology Conference (IHTC), pages 197–201. IEEE, 2017

  19. [19]

    Trustllm: Trustworthiness in large language models

    Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, et al. Trustllm: Trustworthiness in large language models.arXiv preprint arXiv:2401.05561, 3, 2024

  20. [20]

    Niles, Ken Pathak, and Steven Sloan

    Md Meftahul Ferdaus, Mahdi Abdelguerfi, Elias Ioup, Kendall N. Niles, Ken Pathak, and Steven Sloan. Towards trustworthy ai: A review of ethical and robust large language models, 2024

  21. [21]

    Trustworthy llms: a survey and guideline for evaluating large language models’ alignment

    Yang Liu, Yuanshun Yao Yao, Jean-Francois Ton Ton, Xiaoying Zhang, Ruocheng Guo, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. Trustworthy llms: a survey and guideline for evaluating large language models’ alignment. Inpreprint, 2023

  22. [22]

    On the validity of normative life: Habermas’ discourse ethics | epoché magazine, Aug 2023

    editors. On the validity of normative life: Habermas’ discourse ethics | epoché magazine, Aug 2023

  23. [23]

    Springer US, Boston, MA, 2004

    Wendy Cukier, Robert Bauer, and Catherine Middleton.Applying Habermas’ Validity Claims as a Standard for Critical Discourse Analysis, pages 233–258. Springer US, Boston, MA, 2004

  24. [24]

    Measuring political deliberation: A discourse quality index.Comparative European Politics, 1(1):21–48, Mar 2003

    Marco R Steenbergen, André Bächtiger, Markus Spörndli, and Jürg Steiner. Measuring political deliberation: A discourse quality index.Comparative European Politics, 1(1):21–48, Mar 2003

  25. [25]

    Bakker, Daniel Jarrett, Hannah Sheahan, Martin J

    Michael Henry Tessler, Michiel A. Bakker, Daniel Jarrett, Hannah Sheahan, Martin J. Chadwick, Raphael Koster, Georgina Evans, Lucy Campbell-Gillingham, Tantum Collins, David C. Parkes, Matthew Botvinick, and Christopher Summerfield. Ai can help humans find common ground in democratic deliberation.Science, 386(6719):eadq2852, 2024

  26. [26]

    Towards automating deliberation? the idea of deliberative democracy embedded in google’s habermas machine

    Nicolás Palomo Hernández. Towards automating deliberation? the idea of deliberative democracy embedded in google’s habermas machine. 8:1951–1960

  27. [27]

    Computation and deliberation: The ghost in the habermas machine, Dec 2024

    Lawrence Fisher. Computation and deliberation: The ghost in the habermas machine, Dec 2024

  28. [28]

    TruthfulQA: Measuring How Models Mimic Human Falsehoods

    Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. CoRR, abs/2109.07958, 2021

  29. [29]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human 14 A Multi-Dimensional Audi...

  30. [30]

    Association for Computational Linguistics

  31. [31]

    Deberta: Decoding-enhanced bert with disentan- gled attention

    Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentan- gled attention. InInternational Conference on Learning Representations, 2021

  32. [32]

    Roberta-lstm: A hybrid model for sentiment analysis with transformer and recurrent neural network.IEEE Access, 10:21517–21525, 2022

    Kian Long Tan, Chin Poo Lee, Kalaiarasi Sonai Muthu Anbananthen, and Kian Ming Lim. Roberta-lstm: A hybrid model for sentiment analysis with transformer and recurrent neural network.IEEE Access, 10:21517–21525, 2022

  33. [33]

    Sentence-BERT: Sentence embeddings using Siamese BERT-networks

    Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors,Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, ...

  34. [34]

    Robust hate speech detection in social media: A cross- dataset empirical evaluation

    Dimosthenis Antypas and Jose Camacho-Collados. Robust hate speech detection in social media: A cross- dataset empirical evaluation. In Yi-ling Chung, Paul R{\”ottger}, Debora Nozza, Zeerak Talat, and Aida Mostafazadeh Davani, editors,The 7th Workshop on Online Abuse and Harms (WOAH), pages 231–242, Toronto, Canada, July 2023. Association for Computational...

  35. [35]

    Taylor & Francis, 1963

    Maurice Duverger.Political Parties. Taylor & Francis, 1963

  36. [36]

    Political compass test

    Pace News Ltd. Political compass test. https://www.politicalcompass.org/test, 2001. Accessed: 2025- 07-29

  37. [37]

    Measuring massive multitask language understanding.Proceedings of the International Conference on Learning Representations (ICLR), 2021

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding.Proceedings of the International Conference on Learning Representations (ICLR), 2021

  38. [38]

    Bold: Dataset and metrics for measuring biases in open-ended language generation

    Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta. Bold: Dataset and metrics for measuring biases in open-ended language generation. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 862–872, New York, NY , USA, 2021. Association for Computing ...

  39. [39]

    Detoxify

    Laura Hanu and Unitary team. Detoxify. Github. https://github.com/unitaryai/detoxify, 2020

  40. [40]

    BLEURT: Learning robust metrics for text generation

    Thibault Sellam, Dipanjan Das, and Ankur Parikh. BLEURT: Learning robust metrics for text generation. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors,Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892, Online, July 2020. Association for Computational Linguistics

  41. [41]

    Tappin, Luke Hewitt, Ed Saunders, Sid Black, Hause Lin, Catherine Fist, Helen Margetts, David G

    Kobi Hackenburg, Ben M. Tappin, Luke Hewitt, Ed Saunders, Sid Black, Hause Lin, Catherine Fist, Helen Margetts, David G. Rand, and Christopher Summerfield. The levers of political persuasion with conversational ai, 2025

  42. [42]

    The persuasive power of large language models.Proceedings of the International AAAI Conference on Web and Social Media, 18(1):152–163, May 2024

    Simon Martin Breum, Daniel Vædele Egdal, Victor Gram Mortensen, Anders Giovanni Møller, and Luca Maria Aiello. The persuasive power of large language models.Proceedings of the International AAAI Conference on Web and Social Media, 18(1):152–163, May 2024

  43. [43]

    [Argument]

    Francesco Barbieri, Jose Camacho-Collados, Luis Espinosa-Anke, and Leonardo Neves. TweetEval:Unified Benchmark and Comparative Evaluation for Tweet Classification. InProceedings of Findings of EMNLP, 2020. 15 A Multi-Dimensional Audit of Politically Aligned Large Language Models A LLM Scoring Prompts Below is the prompt provided to the evaluator LLMs to s...