Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs

Marco Bombieri; Marco Rospocher; Simone Paolo Ponzetto

arxiv: 2605.20191 · v1 · pith:GLMUWD3Fnew · submitted 2026-04-02 · 💻 cs.CL

Pith reviewed 2026-05-21 09:53 UTC · model grok-4.3

keywords disability representation · LLM bias · positive stereotypes · social media simulation · sentiment analysis · marginalized groups · AI ethics · persona generation

The pith

Large language models produce overly positive stereotypes when generating social media posts from the perspective of people with disabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper has LLMs generate social media posts as if written by individuals with disabilities and compares those outputs to real posts on the same platforms. The comparison focuses on emotional tone, sentiment scores, and recurring words or themes. It finds that the model outputs lean heavily toward uplifting and inspirational content while real posts reflect a wider range of difficulties and mixed emotions. The same models also link everyday topics such as careers and entertainment more strongly to people without disabilities. These patterns matter because they can shape public understanding and limit how disability is discussed online.

Core claim

When prompted to simulate the perspectives of individuals with disabilities, large language models generate posts that emphasize positive stereotypes and inspirational narratives rather than the full range of lived experiences. Direct comparison with authentic posts written by people with disabilities shows lower authenticity in tone and theme coverage. A parallel comparison of disabled and nondisabled simulations further reveals that certain topics are disproportionately assigned to nondisabled individuals, creating exclusionary associations that do not match real-world distributions.

What carries the argument

Generation of simulated social media posts by LLMs followed by side-by-side analysis against real posts using sentiment, emotional tone, and thematic word distributions.
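The thematic-word side of this comparison can be illustrated with a small sketch. It ranks words by a smoothed log-odds ratio between two sets of posts. This is an assumed, simplified stand-in for whatever word-distinctiveness measure the paper actually uses; the function names, the add-one smoothing, and the toy posts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(corpus_a, corpus_b, top_k=5):
    """Rank words by smoothed log-odds ratio of corpus_a versus corpus_b.

    Positive scores mark words over-represented in corpus_a,
    negative scores mark words over-represented in corpus_b.
    """
    counts_a = Counter(tok for post in corpus_a for tok in tokenize(post))
    counts_b = Counter(tok for post in corpus_b for tok in tokenize(post))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    size = len(vocab)

    scores = {}
    for word in vocab:
        # Add-one smoothing keeps the ratio finite for words missing from one corpus.
        p_a = (counts_a[word] + 1) / (total_a + size)
        p_b = (counts_b[word] + 1) / (total_b + size)
        scores[word] = math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # First list: most typical of corpus_a; second list: most typical of corpus_b.
    return ranked[:top_k], ranked[-top_k:]


# Toy, invented posts purely for illustration (not data from the paper).
generated = ["Every day I feel grateful and inspired",
             "So grateful for this inspiring journey"]
real = ["The bus ramp was broken again today",
        "Tired of fighting for a ramp at work"]

top_generated, top_real = distinctive_words(generated, real, top_k=3)
```

On real data the two corpora would be far larger, and a variance-aware measure would likely be preferable to this plain ratio.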

If this is right

LLM outputs can reinforce exclusionary narratives by linking topics such as career and entertainment more to nondisabled people.
Idealized portrayals may erase the day-to-day challenges that people with disabilities actually face.
Current debiasing efforts can produce overcorrections that create unrealistic depictions of marginalized groups.
Developers and users need to apply critical checks before relying on LLMs to represent any demographic experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same over-idealization pattern could appear when models simulate other marginalized identities such as race or gender.
Incorporating larger volumes of authentic first-person writing from disability communities into training or evaluation sets might narrow the observed gap.
New evaluation benchmarks focused on representation fidelity could be built around direct comparisons of simulated versus real text distributions.
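As a rough illustration of the last point, a benchmark of this kind could score how far a simulated label distribution sits from a real one. The sketch below uses Jensen-Shannon divergence over per-post emotion labels. The choice of metric, the helper names, and the example labels are assumptions made for illustration, not something the paper proposes.

```python
import math
from collections import Counter


def distribution(labels):
    """Turn a list of category labels into a probability dictionary."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    # Mixture distribution that both inputs are compared against.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical emotion labels per post; invented for illustration only.
simulated = ["joy", "joy", "trust", "joy", "anticipation"]
real = ["sadness", "joy", "anger", "fear", "trust"]

score = js_divergence(distribution(simulated), distribution(real))
```

A lower score would indicate that simulated posts track the real emotional mix more closely. The same function applies to topic or word distributions.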

Load-bearing premise

Prompting large language models to write from the viewpoint of people with disabilities yields outputs that can be fairly compared to genuine posts without the results being dominated by the models' training-data stereotypes or the wording of the prompts themselves.

What would settle it

A large collection of real social media posts by people with disabilities that shows the same average sentiment scores and theme frequencies as the LLM-generated posts would challenge the claim of systematic idealization.
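One way to make this check concrete is a permutation test on the difference in mean sentiment between generated and real posts. The sketch below is only an assumed procedure. The paper's actual statistical tests are not described in the abstract, and the scores are invented.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores."""
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign posts to the two groups at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical sentiment scores in [-1, 1]; invented for illustration only.
generated_scores = [0.8, 0.7, 0.9, 0.6, 0.85, 0.75]
real_scores = [0.1, -0.4, 0.3, -0.2, 0.5, -0.6]

observed_diff, p_value = permutation_test(generated_scores, real_scores)
```

If a large real-post collection produced an observed difference near zero with a large p-value, that would be the kind of evidence that challenges the idealization claim.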

Figures

Figures reproduced from arXiv: 2605.20191 by Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto.

**Figure 2.** Emotional distributions of distinctive words in LLM ...

**Figure 3.** Emotional distributions of posts generated by three different LLMs before and after specifying ...

**Figure 4.** Emotional distributions of distinctive words in LLM ...

**Figure 5.** Comparison of the sentiment of datasets produced with different LLMs mentioning or not in ...

**Figure 6.** Comparison of the depression level of datasets produced with different LLMs, mentioning or ...

Original abstract

Modern Large Language Models (LLMs) have recently attracted much attention for their ability to simulate human behavior and generate text that reflects personas and demographic groups. While these capabilities can open up a multitude of diverse applications across fields, it is crucial to examine how such models represent various target groups since LLMs can perpetuate and amplify biases or discrimination against historically marginalized communities or, alternatively, as a result of debiasing efforts, overcorrect by portraying overly positive stereotypes. This overcompensation can idealize these groups, erasing the complexities and challenges they face in favor of unrealistic depictions. In this paper, we investigate how LLMs represent disability by simulating the perspectives of individuals with disabilities in generating social media posts. These posts are then compared with those written by real people with disabilities, focusing on emotional tone, sentiment, and representative words and themes. Our analysis reveals two key findings: (1) LLMs often idealize the experiences of people with disabilities, producing overly positive stereotypes that, despite appearing uplifting, fail to authentically capture their lived realities; and (2) a comparative analysis of posts simulating individuals with and without disabilities highlights a negative bias, where certain topics, such as career and entertainment, are disproportionately associated with nondisabled individuals. This reinforces exclusionary narratives and over-idealized portrayals of disability, misrepresenting the actual challenges faced by this community. These findings align with broader concerns and ongoing research showing that LLMs struggle to reflect the diverse realities of society, particularly the nuanced experiences of marginalized groups, and underscore the need for critical scrutiny of their representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

LLMs generate overly positive disability posts in simulations compared to real ones, but the prompting method may be creating the idealization effect.

The main takeaway is that this paper finds LLMs produce social media posts simulating disability that lean too positive and upbeat, missing the real difficulties, while also tying topics like careers and entertainment more to nondisabled personas than to disabled ones. The comparison uses generated text versus authentic posts from people with disabilities, looking at tone, sentiment, and themes. This is a focused extension of bias studies rather than a new method overall, but it applies the idea directly to disability representation in a way that connects to content generation risks. It does a solid job noting that positive stereotypes can still distort by smoothing over lived challenges, which aligns with broader worries about how models handle marginalized groups. The second finding on topic skew adds a useful angle on exclusion.

The soft spot sits in the simulation setup. Prompting models to take on disability perspectives can easily push outputs toward safe, uplifting language due to alignment training, so the positivity might trace more to the prompt than to what the model has internalized. The abstract gives no prompt wording, controls, sample sizes, or stats details, which leaves the strength of the evidence unclear until the full methods are checked. If those sections show robust controls and transparent analysis, the claims hold better; otherwise the comparison risks being an artifact.

This work targets researchers in AI ethics and fairness who care about how generative models shape views on disability. Readers already following bias audits or representation studies would get practical value from the concrete examples. It deserves peer review because the question is timely and the empirical contrast is a reasonable probe, even with the need for tighter methods reporting.

Referee Report

2 major / 1 minor

Summary. The manuscript examines how large language models represent disability by generating social media posts that simulate the perspectives of individuals with disabilities and comparing these to authentic posts written by people with disabilities. The analysis focuses on emotional tone, sentiment, and thematic content, revealing that LLMs tend to idealize disability experiences with overly positive stereotypes and exhibit a negative bias by associating topics like career and entertainment more with nondisabled individuals.

Significance. Should the results be substantiated with rigorous methodology, this study would contribute valuable insights into the biases present in LLMs regarding the portrayal of marginalized groups such as people with disabilities. It aligns with ongoing research on AI ethics and could inform better practices for model training and prompt design to avoid both under- and over-representation issues.

major comments (2)

[Methods] The description of the prompting strategy for simulating perspectives of individuals with disabilities lacks specifics on prompt phrasing, few-shot examples, controls, or model versions used. This is load-bearing for the central claim, as the observed idealization could stem from prompt artifacts or safety tuning rather than internalized model representations (see skeptic concern and abstract description of generation process).
[Results] No information is provided on sample sizes for generated or real posts, selection/filtering criteria, statistical tests for comparisons, or inter-annotator agreement for theme/sentiment analysis. These omissions make it impossible to evaluate the support for the two key findings on positive stereotypes and negative bias.

minor comments (1)

[Abstract] The abstract could include a one-sentence overview of the empirical setup to help readers assess the scope of the comparison.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and rigor of our manuscript. We address each major comment below and commit to revisions that will strengthen the presentation of our methods and results without altering the core findings.

Referee: [Methods] The description of the prompting strategy for simulating perspectives of individuals with disabilities lacks specifics on prompt phrasing, few-shot examples, controls, or model versions used. This is load-bearing for the central claim, as the observed idealization could stem from prompt artifacts or safety tuning rather than internalized model representations (see skeptic concern and abstract description of generation process).

Authors: We agree that greater specificity is needed to support replicability and to address potential concerns about prompt engineering or safety alignments influencing the outputs. The current manuscript provides a high-level overview of the generation process in the abstract and methods, but we will expand this in the revision by including the full prompt templates, any few-shot examples employed, details on control conditions (such as neutral prompts), and the exact model versions and parameters used (e.g., temperature settings). We will also add a brief discussion of how we mitigated prompt artifacts, for instance by testing variations, to better substantiate that the idealization reflects model representations rather than solely external factors.

revision: yes

Referee: [Results] No information is provided on sample sizes for generated or real posts, selection/filtering criteria, statistical tests for comparisons, or inter-annotator agreement for theme/sentiment analysis. These omissions make it impossible to evaluate the support for the two key findings on positive stereotypes and negative bias.

Authors: The referee correctly identifies that these quantitative details were not reported, which limits the ability to fully assess the robustness of the comparisons. In the revised manuscript, we will report the exact sample sizes for both LLM-generated and real posts, describe the criteria and sources used for selecting and filtering the authentic posts (e.g., from public disability-related social media accounts), include statistical tests such as t-tests or chi-squared tests for sentiment and topic differences, and report inter-annotator agreement (e.g., Cohen's kappa) for the manual coding of themes and sentiment. These additions will provide clearer empirical support for the findings on overly positive stereotypes and topic biases.

revision: yes
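
For readers unfamiliar with the agreement statistic named in this response, Cohen's kappa for two annotators can be sketched as follows. The labels are invented, and the function is a generic textbook formulation rather than the authors' code. How it treats the degenerate single-label case is an assumption noted in the comments.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Share of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Assumption: when both annotators use one identical label for every
        # item, kappa is undefined; this sketch returns 1.0 in that case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment codes from two annotators; invented for illustration.
annotator_1 = ["pos", "pos", "neg", "neu", "pos", "neg", "neu", "pos"]
annotator_2 = ["pos", "neu", "neg", "neu", "pos", "neg", "pos", "pos"]

kappa = cohens_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, while values near 0 indicate agreement no better than chance.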

Circularity Check

0 steps flagged

Empirical comparison of generated vs. real posts with no internal derivations

The paper conducts a direct empirical comparison: LLMs are prompted to simulate disability perspectives to generate social media posts, which are then analyzed for tone, sentiment, themes, and contrasted against real posts written by people with disabilities. No equations, fitted parameters, predictive models, or derivations appear in the described methodology or abstract. Central claims rest on observable differences between LLM outputs and external real-world data rather than any quantity defined or fitted inside the study itself. Self-citations, if present, are not load-bearing for the core findings, and the study does not rename known results or smuggle ansatzes via prior work. This is a standard self-contained empirical investigation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical study of LLM text generation and comparison; no free parameters, mathematical axioms, or newly postulated entities are introduced or required by the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We investigate how LLMs represent disability by simulating the perspectives of individuals with disabilities in generating social media posts... focusing on emotional tone, sentiment, and representative words and themes.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

LLMs often idealize the experiences of people with disabilities, producing overly positive stereotypes

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

[1]

Aher, Rosa I

Gati V . Aher, Rosa I. Arriaga, and Adam Tauman Kalai. 2023. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies. InInternational Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barb...

work page 2023
[2]

Jamal J Al-Menayes. 2015. Motivations for using social media: An exploratory factor analysis.International Journal of Psychological Studies7, 1 (2015), 43

work page 2015
[3]

Busby, Nancy Fulda, Joshua R

Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. 2023. Out of One, Many: Using Language Models to Simulate Human Samples.Political Analysis31, 3 (2023), 337–351. doi:10.1017/pan.2023.2

work page doi:10.1017/pan.2023.2 2023
[4]

Ayers and Katherine A

Kara B. Ayers and Katherine A. Reed. 2022.Chapter 10 Inspiration Porn and Desperation Porn: Disrupting the Objectification of Disability in Media. Brill, Leiden, The Netherlands, 90 – 101. doi:10.1163/9789004512702_014

work page doi:10.1163/9789004512702_014 2022
[5]

Iryna Babik and Elena S. Gardner. 2021. Factors Affecting the Perception of Disability: A Developmental Perspective. Frontiers in PsychologyV olume 12 - 2021 (2021). doi:10.3389/fpsyg.2021.702166

work page doi:10.3389/fpsyg.2021.702166 2021
[6]

1992.Disabling Imagery and the Media: An Exploration of the Principles for Media Representations of Disabled People

Colin Barnes and British Council of Organizations of Disabled People. 1992.Disabling Imagery and the Media: An Exploration of the Principles for Media Representations of Disabled People. BCODP. 60 pages. https://books.google. it/books?id=iXIeNAAACAAJ

work page 1992
[7]

Zou, Venkatesh Saligrama, and Adam Tauman Kalai

Tolga Bolukbasi, Kai-Wei Chang, James Y . Zou, Venkatesh Saligrama, and Adam Tauman Kalai. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. InAdvances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, Daniel D. Lee, Ma...

work page 2016
[8]

Marco Bombieri and Marco Rospocher. 2025. Mining Impersonification Bias in LLMs via Survey Filling.Information 16, 11 (2025). doi:10.3390/info16110931

work page doi:10.3390/info16110931 2025
[9]

2009.Contours of Ableism: The Production of Disability and Abledness

Fiona Kumari Campbell. 2009.Contours of Ableism: The Production of Disability and Abledness. Palgrave Macmillan London. 231 pages. doi:10.1057/9780230245181

work page doi:10.1057/9780230245181 2009
[10]

Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Anna Rogers, Jordan L. Boyd-Graber, and Naoaki Okazaki (Ed...

work page doi:10.18653/v1/2023.acl- 2023
[11]

Zhibo Chu, Zichong Wang, and Wenbin Zhang. 2024. Fairness in Large Language Models: A Taxonomic Survey. SIGKDD Explor. Newsl.26, 1 (July 2024), 34–48. doi:10.1145/3682112.3682117

work page doi:10.1145/3682112.3682117 2024
[12]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy ...

work page doi:10.18653/v1/n19-1423 2019
[13]

Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. 2019. Addressing Age-Related Bias in Sentiment Analysis. InProceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, Sarit Kraus (Ed.). ijcai.org, 6146–6150. doi:10.24963/IJCAI.2019/852

work page doi:10.24963/ijcai.2019/852 2019
[14]

Ricardo Fitas. 2025. Inclusive education with AI: supporting special needs and tackling language barriers.AI and Ethics (2025). doi:10.1007/s43681-025-00824-3

work page doi:10.1007/s43681-025-00824-3 2025
[16]

I wouldn’t say offensive but

Vinitha Gadiraju, Shaun K. Kane, Sunipa Dev, Alex S. Taylor, Ding Wang, Emily Denton, and Robin Brewer. 2023. "I wouldn’t say offensive but...": Disability-Centered Perspectives on Large Language Models. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, Chicago, IL, USA, June 12-15, 2023. ACM, 205–216. doi...

work page doi:10.1145/3593013.3593989 2023
[17]

Gallegos, Ryan A

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K. Ahmed. 2024. Bias and Fairness in Large Language Models: A Survey.Computa- tional Linguistics50, 3 (09 2024), 1097–1179. doi:10.1162/coli_a_00524 arXiv:https://direct.mit.edu/coli/article- pdf/50/3/1097/2471010/coli_a_00524.pdf

work page doi:10.1162/coli_a_00524 2024
[18]

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes.Proc. Natl. Acad. Sci. USA115, 16 (2018), E3635–E3644. doi:10.1073/PNAS.1720347115

work page doi:10.1073/pnas.1720347115 2018
[19]

Glazko, Yusuf Mohammed, Ben Kosa, Venkatesh Potluri, and Jennifer Mankoff

Kate S. Glazko, Yusuf Mohammed, Ben Kosa, Venkatesh Potluri, and Jennifer Mankoff. 2024. Identifying and Improving Disability Bias in GPT-Based Resume Screening. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024, Rio de Janeiro, Brazil, June 3-6, 2024. ACM, 687–700. doi:10.1145/3630106.3658933

work page doi:10.1145/3630106.3658933 2024
[20]

Jan Grue. 2015. The Problem of the Supercrip: Representation and Misrepresentation of Disability.Disability Research Today: International Perspectives(01 2015), 204–218

work page 2015
[21]

Jan Grue. 2016. The problem with inspiration porn: a tentative definition and a provisional critique.Disability & Society 31, 6 (2016), 838–849. doi:10.1080/09687599.2016.1205473

work page doi:10.1080/09687599.2016.1205473 2016
[23]

Saad Hassan, Matt Huenerfauth, and Cecilia Ovesdotter Alm. 2021. Unpacking the Interdependent Systems of Discrimination: Ableist Bias in NLP Systems through an Intersectional Lens. InFindings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, Marie-Francine Moens, Xuanjing Hu...

work page doi:10.18653/v1/2021.findings-emnlp.267 2021
[24]

Brienna Herold, James Waller, and Raja Kushalnagar. 2022. Applying the Stereotype Content Model to assess disability bias in popular pre-trained NLP models underlying AI-based assistive technologies. InNinth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022), Sarah Ebling, Emily Prud’hommeaux, and Preethi Vaidyanathan (Eds....

work page doi:10.18653/v1/2022.slpat-1.8 2022
[25]

Tiancheng Hu and Nigel Collier. 2024. Quantifying the Persona Effect in LLM Simulations. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, 10289–1...

work page doi:10.18653/v1/2024.acl-long.554 2024
[26]

Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. 2020. Social Biases in NLP Models as Barriers for Persons with Disabilities. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R...

work page doi:10.18653/v1/2020 2020
[27]

Hutto and Eric Gilbert

Clayton J. Hutto and Eric Gilbert. 2014. V ADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. InProceedings of the Eighth International Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014, Eytan Adar, Paul Resnick, Munmun De Choudhury, Bernie Hogan, and Alice Oh (Eds.). The AAAI Press

work page 2014
[29]

Gauri Kambhatla, Ian Stewart, and Rada Mihalcea. 2022. Surfacing Racial Stereotypes through Identity Portrayal. In FAccT ’22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21 - 24, 2022. ACM, 1604–1615. doi:10.1145/3531146.3533217

work page doi:10.1145/3531146.3533217 2022
[30]

Mohammad

Svetlana Kiritchenko and Saif M. Mohammad. 2018. Examining Gender and Race Bias in Two Hundred Senti- ment Analysis Systems. InProceedings of the Seventh Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2018, New Orleans, Louisiana, USA, June 5-6, 2018, Malvina Nissim, Jonathan Berant, and Shiny Stories, Hidden Struggles: Investigat...

work page doi:10.18653/v1/s18-2005 2018
[31]

Richard Landis and Gary G

J. Richard Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data.Biometrics 33, 1 (1977)

work page 1977
[32]

François Ledoyen, Gaël Dias, Jeremie Pantin, Alexis Lechervy, Fabrice Maurel, and Youssef Chahir. 2025. Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Ros...

work page 2025
[33]

Rong Li, Ashwini Kamaraj, Jing Ma, and Sarah Ebling. 2024. Decoding Ableism in Large Language Models: An Intersectional Approach. InProceedings of the Third Workshop on NLP for Positive Impact, Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, and Jieyu Zhao (Eds.). Association for Computational Ling...

work page doi:10.18653/v1/2024.nlp4pi-1.22 2024
[35]

Elham Madjidi and Christopher Crick. 2025. Towards Inclusive Reading: A Neural Text Generation Framework for Dyslexia Accessibility. InProceedings of the 11th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-Exclusion (DSAI ’24). Association for Computing Machinery, New York, NY , USA, 360–367...

work page doi:10.1145/3696593.3696625 2025
[36]

Thomas Manzini, Lim Yao Chong, Alan W Black, and Yulia Tsvetkov. 2019. Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),...

work page doi:10.18653/v1/n19-1062 2019
[37]

Michal Mˇechura. 2022. A Taxonomy of Bias-Causing Ambiguities in Machine Translation. InProceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), Christian Hardmeier, Christine Basta, Marta R. Costa-jussà, Gabriel Stanovsky, and Hila Gonen (Eds.). Association for Computational Linguistics, Seattle, Washington, 168–173. doi:10...

work page doi:10.18653/v1/2022.gebnlp-1.18 2022
[38]

Katelyn Mei, Sonia Fereidooni, and Aylin Caliskan. 2023. Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, Chicago, IL, USA, June 12-15, 2023. ACM, 1699–1710. doi:10.1145/ 3593013.3594109

work page arXiv 2023
[39]

Mohammad and Peter D

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a Word-Emotion Association Lexicon.Comput. Intell. 29, 3 (2013), 436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x

work page doi:10.1111/j.1467-8640.2012.00460.x 2013
[40]

#DisabledOnIn- dianTwitter

Ishani Mondal, Sukhnidh Kaur, Kalika Bali, Aditya Vashistha, and Manohar Swaminathan. 2022. "#DisabledOnIn- dianTwitter" : A Dataset towards Understanding the Expression of People with Disabilities on Indian Twitter. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, Online only, November 20-23, 2022, Yulan He, Heng Ji, Yang L...

work page 2022
[41]

Monroe, Michael P

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2017. Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict.Political Analysis16, 4 (2017), 372–403. doi:10.1093/pan/ mpn018

work page doi:10.1093/pan/ 2017
[42]

1990.Politics of Disablement

Michael Oliver. 1990.Politics of Disablement. Red Globe Press London, London. 152 pages. doi:10.1007/978-1-349- 20895-1

work page doi:10.1007/978-1-349- 1990
[43]

World Health Organization. 2023. World Health Organization - Disability. https://www.who.int/health-topics/disability. Accessed: 2025-01-13

work page 2023
[44]

Srikant Panda, Amit Agarwal, and Hitesh Laxmichand Patel. 2025. AccessEval: Benchmarking Disability Bias in Large Language Models. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou...

work page 2025
[45]

Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, and Samuel R. Bowman. 2022. BBQ: A hand-built bias benchmark for question answering. InFindings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (E...

work page doi:10.18653/v1/2022.findings- 2022
[46]

Rafał Po´swiata and Michał Perełkiewicz. 2022. OPI@LT-EDI-ACL2022: Detecting Signs of Depression from Social Media Text using RoBERTa Pre-trained Language Models. InProceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion. Association for Computational Linguistics, Dublin, Ireland, 276–282. doi:10.18653/v1/2022.ltedi-1.40

work page doi:10.18653/v1/2022.ltedi-1.40 2022
[47]

Rebecca Qian, Candace Ross, Jude Fernandes, Eric Michael Smith, Douwe Kiela, and Adina Williams. 2022. Perturba- tion Augmentation for Fairer NLP. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Ara...

work page doi:10.18653/v1/2022.emnlp-main.646 2022
[48]

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners.OpenAI(2019)

work page 2019
[49]

Abel Salinas, Parth Shah, Yuzhong Huang, Robert McCormack, and Fred Morstatter. 2023. The Unequal Opportunities of Large Language Models: Examining Demographic Biases in Job Recommendations by ChatGPT and LLaMA. InProceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. Association for Computing Machinery, Ne...

work page doi:10.1145/3617694.3623257 2023
[50]

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng. 2021. Societal Biases in Language Generation: Progress and Challenges. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, Au...

[51]

Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, and Adina Williams. 2022. “I’m sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for...

[52]

Karthik Sreedhar and Lydia B. Chilton. 2024. Simulating Human Strategic Behavior: Comparing Single and Multi-agent LLMs. CoRR abs/2402.08189 (2024). doi:10.48550/ARXIV.2402.08189

[53]

Grzegorz Szumski, Joanna Smogorzewska, and Paweł Grygiel. 2020. Attitudes of students toward people with disabilities, moral identity and inclusive education—A two-level analysis. Research in Developmental Disabilities 102 (2020), 103685. doi:10.1016/j.ridd.2020.103685

[54]

Nicholas Tilmes. 2022. Disability, fairness, and algorithmic bias in AI recruitment. Ethics Inf. Technol. 24, 2 (2022), 21. doi:10.1007/S10676-022-09633-2

[55]

Laura VanPuymbrouck, Carli Friedman, and Heather Ann Feldner. 2020. Explicit and implicit disability attitudes of healthcare providers. Rehabilitation psychology (2020)

[56]

Pranav Narayanan Venkit, Mukund Srinath, and Shomir Wilson. 2022. A Study of Implicit Bias in Pretrained Language Models against People with Disabilities. In Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pus...

[57]

Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, and Graham Neubig. 2024. Large Language Models Enable Few-Shot Clustering. Transactions of the Association for Computational Linguistics 12 (04 2024), 321–333. doi:10.1162/tacl_a_00648 arXiv:https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00648/2362202/...

[58]

Zoe Wyatt. 2024. The Dark Side of #PositiveVibes: Understanding Toxic Positivity in Modern Culture. Psychiatry and Behavioral Health 3 (09 2024), 1–6.

A Appendix

A.1 Prompt used for preprocessing the REDD dataset

To extract from subreddits only posts w...

