
arxiv: 2605.20191 · v1 · pith:GLMUWD3Fnew · submitted 2026-04-02 · 💻 cs.CL

Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs

Pith reviewed 2026-05-21 09:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords disability representation · LLM bias · positive stereotypes · social media simulation · sentiment analysis · marginalized groups · AI ethics · persona generation

The pith

Large language models produce overly positive stereotypes when generating social media posts from the perspective of people with disabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper has LLMs generate social media posts as if written by individuals with disabilities and compares those outputs to real posts on the same platforms. The comparison focuses on emotional tone, sentiment scores, and recurring words or themes. It finds that the model outputs lean heavily toward uplifting and inspirational content while real posts reflect a wider range of difficulties and mixed emotions. The same models also link everyday topics such as careers and entertainment more strongly to people without disabilities. These patterns matter because they can shape public understanding and limit how disability is discussed online.

Core claim

When prompted to simulate the perspectives of individuals with disabilities, large language models generate posts that emphasize positive stereotypes and inspirational narratives rather than the full range of lived experiences. Direct comparison with authentic posts written by people with disabilities shows lower authenticity in tone and theme coverage. A parallel comparison of disabled and nondisabled simulations further reveals that certain topics are disproportionately assigned to nondisabled individuals, creating exclusionary associations that do not match real-world distributions.

What carries the argument

Generation of simulated social media posts by LLMs followed by side-by-side analysis against real posts using sentiment, emotional tone, and thematic word distributions.
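
The word-distribution part of this comparison can be illustrated with a smoothed log-odds score that ranks words by how strongly they lean toward one corpus. This is a minimal sketch under stated assumptions, not the paper's pipeline: the tokenizer, the smoothing constant `alpha`, the function names, and the example posts are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): split on whitespace, trim punctuation, lowercase.
    tokens = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [t for t in tokens if t]


def distinctive_words(corpus_a, corpus_b, alpha=0.5):
    """Rank words by smoothed log-odds difference between two corpora.

    Positive scores lean toward corpus_a, negative scores toward corpus_b.
    """
    counts_a = Counter(t for doc in corpus_a for t in tokenize(doc))
    counts_b = Counter(t for doc in corpus_b for t in tokenize(doc))
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps words unseen in one corpus from producing log(0).
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts_a[word] + alpha) / total_a
        p_b = (counts_b[word] + alpha) / total_b
        scores[word] = math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy posts, not data from the paper.
generated_posts = [
    "Every day is a gift and I feel inspired",
    "Grateful and inspired by my journey",
    "Inspired to keep going",
]
real_posts = [
    "The elevator is broken again and I am exhausted",
    "Exhausted after another appointment",
]

ranked = distinctive_words(generated_posts, real_posts)
print("leans generated:", ranked[0])
print("leans real:", ranked[-1])
```

On real corpora the ranking would usually be paired with a variance estimate so that rare words do not dominate; this sketch omits that step.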

If this is right

  • LLM outputs can reinforce exclusionary narratives by linking topics such as career and entertainment more to nondisabled people.
  • Idealized portrayals may erase the day-to-day challenges that people with disabilities actually face.
  • Current debiasing efforts can produce overcorrections that create unrealistic depictions of marginalized groups.
  • Developers and users need to apply critical checks before relying on LLMs to represent any demographic experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same over-idealization pattern could appear when models simulate other marginalized identities such as race or gender.
  • Incorporating larger volumes of authentic first-person writing from disability communities into training or evaluation sets might narrow the observed gap.
  • New evaluation benchmarks focused on representation fidelity could be built around direct comparisons of simulated versus real text distributions.
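
A fidelity score of the kind suggested in the last bullet could start from a divergence between label distributions in simulated and real text. The sketch below uses Jensen-Shannon divergence with base-2 logarithms, which is bounded between 0 and 1. The emotion categories, the labels, and the function names are hypothetical and chosen only to make the example run.

```python
import math
from collections import Counter


def to_distribution(labels, categories):
    # Convert a list of labels into relative frequencies over a fixed category order.
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    # The mixture m is non-zero wherever p or q is, so both KL terms stay finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Invented emotion labels, not data from the paper.
EMOTIONS = ["joy", "sadness", "anger", "fear"]
simulated_labels = ["joy"] * 7 + ["sadness"] + ["anger"] + ["fear"]
real_labels = ["joy"] * 3 + ["sadness"] * 3 + ["anger"] * 2 + ["fear"] * 2

gap = js_divergence(
    to_distribution(simulated_labels, EMOTIONS),
    to_distribution(real_labels, EMOTIONS),
)
print(f"JS divergence between simulated and real emotion mix: {gap:.3f}")
```

A value of 0 would mean the two label mixes are identical, and a value of 1 would mean they share no category at all.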

Load-bearing premise

Prompting large language models to write from the viewpoint of people with disabilities yields outputs that can be fairly compared to genuine posts without the results being dominated by the models' training-data stereotypes or the wording of the prompts themselves.

What would settle it

A large collection of real social media posts by people with disabilities that shows the same average sentiment scores and theme frequencies as the LLM-generated posts would challenge the claim of systematic idealization.
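
One way to check whether average sentiment really differs between the two collections is a two-sided permutation test on the difference in means. The sketch below is an illustration only: the scores are invented placeholders in a [-1, 1] range, and the paper's own choice of statistical test is not reported in the material reviewed here.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group membership at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented sentiment scores, not data from the paper.
generated_scores = [0.8, 0.7, 0.9, 0.6, 0.85, 0.75]
real_scores = [-0.2, 0.1, 0.3, -0.5, 0.0, 0.4]

difference, p_value = permutation_test(generated_scores, real_scores, n_permutations=2000)
print(f"mean difference = {difference:.3f}, p = {p_value:.4f}")
```

A large p-value on a sufficiently large real sample would be the kind of evidence that challenges the idealization claim; a small one is consistent with it but says nothing about why the gap exists.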

Figures

Figures reproduced from arXiv: 2605.20191 by Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto.

Figure 1: A comparison of sentiment, depression levels, and emotions between posts from the disability...
Figure 2: Emotional distributions of distinctive words in LLM...
Figure 3: Emotional distributions of posts generated by three different LLMs before and after specifying...
Figure 4: Emotional distributions of distinctive words in LLM...
Figure 5: Comparison of the sentiment of datasets produced with different LLMs mentioning or not in...
Figure 6: Comparison of the depression level of datasets produced with different LLMs, mentioning or...
Original abstract

Modern Large Language Models (LLMs) have recently attracted much attention for their ability to simulate human behavior and generate text that reflects personas and demographic groups. While these capabilities can open up a multitude of diverse applications across fields, it is crucial to examine how such models represent various target groups since LLMs can perpetuate and amplify biases or discrimination against historically marginalized communities or, alternatively, as a result of debiasing efforts, overcorrect by portraying overly positive stereotypes. This overcompensation can idealize these groups, erasing the complexities and challenges they face in favor of unrealistic depictions. In this paper, we investigate how LLMs represent disability by simulating the perspectives of individuals with disabilities in generating social media posts. These posts are then compared with those written by real people with disabilities, focusing on emotional tone, sentiment, and representative words and themes. Our analysis reveals two key findings: (1) LLMs often idealize the experiences of people with disabilities, producing overly positive stereotypes that, despite appearing uplifting, fail to authentically capture their lived realities; and (2) a comparative analysis of posts simulating individuals with and without disabilities highlights a negative bias, where certain topics, such as career and entertainment, are disproportionately associated with nondisabled individuals. This reinforces exclusionary narratives and over-idealized portrayals of disability, misrepresenting the actual challenges faced by this community. These findings align with broader concerns and ongoing research showing that LLMs struggle to reflect the diverse realities of society, particularly the nuanced experiences of marginalized groups, and underscore the need for critical scrutiny of their representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript examines how large language models represent disability by generating social media posts that simulate the perspectives of individuals with disabilities and comparing these to authentic posts written by people with disabilities. The analysis focuses on emotional tone, sentiment, and thematic content, revealing that LLMs tend to idealize disability experiences with overly positive stereotypes and exhibit a negative bias by associating topics like career and entertainment more with nondisabled individuals.

Significance. Should the results be substantiated with rigorous methodology, this study would contribute valuable insights into the biases present in LLMs regarding the portrayal of marginalized groups such as people with disabilities. It aligns with ongoing research on AI ethics and could inform better practices for model training and prompt design to avoid both under- and over-representation issues.

major comments (2)
  1. [Methods] The description of the prompting strategy for simulating perspectives of individuals with disabilities lacks specifics on prompt phrasing, few-shot examples, controls, or model versions used. This is load-bearing for the central claim, as the observed idealization could stem from prompt artifacts or safety tuning rather than internalized model representations (see skeptic concern and abstract description of generation process).
  2. [Results] No information is provided on sample sizes for generated or real posts, selection/filtering criteria, statistical tests for comparisons, or inter-annotator agreement for theme/sentiment analysis. These omissions make it impossible to evaluate the support for the two key findings on positive stereotypes and negative bias.
minor comments (1)
  1. [Abstract] The abstract could include a one-sentence overview of the empirical setup to help readers assess the scope of the comparison.
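
The topic-association evidence requested in major comment 2 could be reported with a simple contingency-table test. The sketch below computes a Pearson chi-squared statistic for a 2x2 table, without continuity correction, using only the standard library. The counts and the function name are invented for illustration and do not come from the paper.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

                    topic present   topic absent
        group 1           a               b
        group 2           c               d

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-squared tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: posts mentioning "career" under each simulated persona.
nondisabled_with, nondisabled_without = 60, 140
disabled_with, disabled_without = 30, 170

statistic, p_value = chi_squared_2x2(
    nondisabled_with, nondisabled_without, disabled_with, disabled_without
)
print(f"chi2 = {statistic:.3f}, p = {p_value:.5f}")
```

For small cell counts an exact test would be the safer choice; the chi-squared approximation is used here only because it fits in a few lines.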

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and rigor of our manuscript. We address each major comment below and commit to revisions that will strengthen the presentation of our methods and results without altering the core findings.

  1. Referee: [Methods] The description of the prompting strategy for simulating perspectives of individuals with disabilities lacks specifics on prompt phrasing, few-shot examples, controls, or model versions used. This is load-bearing for the central claim, as the observed idealization could stem from prompt artifacts or safety tuning rather than internalized model representations (see skeptic concern and abstract description of generation process).

    Authors: We agree that greater specificity is needed to support replicability and to address potential concerns about prompt engineering or safety alignments influencing the outputs. The current manuscript provides a high-level overview of the generation process in the abstract and methods, but we will expand this in the revision by including the full prompt templates, any few-shot examples employed, details on control conditions (such as neutral prompts), and the exact model versions and parameters used (e.g., temperature settings). We will also add a brief discussion of how we mitigated prompt artifacts, for instance by testing variations, to better substantiate that the idealization reflects model representations rather than solely external factors. revision: yes

  2. Referee: [Results] No information is provided on sample sizes for generated or real posts, selection/filtering criteria, statistical tests for comparisons, or inter-annotator agreement for theme/sentiment analysis. These omissions make it impossible to evaluate the support for the two key findings on positive stereotypes and negative bias.

    Authors: The referee correctly identifies that these quantitative details were not reported, which limits the ability to fully assess the robustness of the comparisons. In the revised manuscript, we will report the exact sample sizes for both LLM-generated and real posts, describe the criteria and sources used for selecting and filtering the authentic posts (e.g., from public disability-related social media accounts), include statistical tests such as t-tests or chi-squared tests for sentiment and topic differences, and report inter-annotator agreement (e.g., Cohen's kappa) for the manual coding of themes and sentiment. These additions will provide clearer empirical support for the findings on overly positive stereotypes and topic biases. revision: yes
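
For readers unfamiliar with the agreement statistic named in this response, Cohen's kappa compares observed agreement between two annotators with the agreement expected by chance. The sketch below is a minimal standard-library version; the annotation labels are invented and the function name is an assumption made for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Share of items on which the two annotators chose the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labelled independently at their own rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented sentiment annotations for ten posts.
annotator_1 = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
annotator_2 = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Here the annotators agree on 8 of 10 items while chance agreement is 0.5, so raw agreement of 0.8 corresponds to a kappa of about 0.6.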

Circularity Check

0 steps flagged

Empirical comparison of generated vs. real posts with no internal derivations


The paper conducts a direct empirical comparison: LLMs are prompted to simulate disability perspectives to generate social media posts, which are then analyzed for tone, sentiment, themes, and contrasted against real posts written by people with disabilities. No equations, fitted parameters, predictive models, or derivations appear in the described methodology or abstract. Central claims rest on observable differences between LLM outputs and external real-world data rather than any quantity defined or fitted inside the study itself. Self-citations, if present, are not load-bearing for the core findings, and the study does not rename known results or smuggle ansatzes via prior work. This is a standard self-contained empirical investigation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study of LLM text generation and comparison; no free parameters, mathematical axioms, or newly postulated entities are introduced or required by the abstract.



