Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across Indian and American STEM Education
Pith reviewed 2026-05-21 16:29 UTC · model grok-4.3
The pith
Large language models used in STEM education produce gaps of up to 2.55 grade levels between the explanations they generate for privileged and marginalized student profiles in India and the US.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-generated STEM content systematically disadvantages marginalized student profiles across two cultural contexts, with the gap between the most privileged and most marginalized profiles reaching 2.55 grade levels. Audits of four models using synthetic profiles that cross Indian-specific dimensions such as caste and college tier with American ones such as race and HBCU attendance, plus shared factors like income, gender, and disability, reveal significant effects from income in every case, the strongest single effect from medium of instruction in India, and simpler explanations triggered by disability status. These biases compound non-additively and remain even inside elite institutions.
What carries the argument
Synthetic demographic profiles that combine multiple axes of identity, tested via ranking and generation tasks with statistical correction and feature importance measures to detect how LLMs weigh signals when creating educational explanations.
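As an illustration of how profiles that cross several identity axes can be enumerated, here is a minimal sketch. The attribute levels, the `INDIA_AXES` name, and the prompt wording are all hypothetical; this review does not give the paper's actual templates.

```python
from itertools import product

# Hypothetical attribute levels for the Indian context. The paper's actual
# levels and wording are not given in this review.
INDIA_AXES = {
    "caste": ["General", "OBC", "SC/ST"],
    "medium": ["English", "Hindi"],
    "college_tier": ["Tier 1", "Tier 3"],
    "income": ["high", "low"],
    "gender": ["male", "female"],
    "disability": ["none", "visual impairment"],
}


def build_profiles(axes):
    """Return one dict per combination of attribute levels (full factorial)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def render_prompt(profile, topic):
    """Fill a hypothetical prompt template; only the profile line varies."""
    described = ", ".join(f"{key}: {value}" for key, value in profile.items())
    return f"Student profile ({described}). Explain {topic} to this student."


profiles = build_profiles(INDIA_AXES)
print(len(profiles))
print(render_prompt(profiles[0], "Newton's second law"))
```

Holding the topic fixed while varying only the profile line is what lets differences in the generated explanations be attributed to the demographic signals rather than to the task.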
Load-bearing premise
The synthetic profiles accurately capture the demographic signals that LLMs actually use when generating explanations, and the chosen evaluation metrics validly quantify educational disadvantage in grade-level equivalents.
What would settle it
A study that replaces the synthetic profiles with real student data or ability-matched profiles without demographic cues and finds no remaining grade-level differences in the generated explanations would challenge the central claim.
Original abstract
Large language models are increasingly deployed in STEM education for personalized instruction and feedback across institutions in high- and low-income countries. These systems are designed to adapt content to student needs, but whether they adapt based on demonstrated ability or demographic signals remains untested at scale. Here we establish that LLM-generated STEM content systematically disadvantages marginalized student profiles across two cultural contexts, with the gap between the most privileged and most marginalized profiles reaching 2.55 grade levels. We audited four LLMs (Qwen 2.5-32B-Instruct, GPT-4o, GPT-4o-mini, GPT-OSS 20B) using synthetic profiles crossing dimensions specific to Indian education (caste, medium of instruction, college tier) and American education (race, HBCU attendance, school type), alongside income, gender, and disability, across ranking and generation tasks with FDR-corrected significance testing and SHAP feature attribution. Income produces significant effects across every model and context, medium of instruction drives the largest single effect in the Indian context, and disability status triggers simpler explanations. Effects compound non-additively: marginalization across multiple dimensions produces gaps larger than any single dimension predicts, and biases persist within elite institutions. Bias is consistent across all four architectures and persists through model selection, making intersectional, cross-cultural auditing a structural requirement before deployment.
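The FDR correction mentioned in the abstract is commonly done with the Benjamini-Hochberg step-up procedure. The sketch below shows that procedure on made-up p-values. It assumes the authors used the standard Benjamini-Hochberg variant, which this review does not confirm, and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at most that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values, one per demographic comparison.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(example_p))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.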
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript audits four LLMs (Qwen 2.5-32B-Instruct, GPT-4o, GPT-4o-mini, GPT-OSS 20B) for intersectional bias in generating STEM explanations using synthetic student profiles that cross Indian-specific (caste, medium of instruction, college tier) and American-specific (race, HBCU attendance, school type) dimensions, plus income, gender, and disability. It reports that marginalized profiles receive systematically lower-quality explanations, with gaps up to 2.55 grade levels, non-additive compounding of disadvantages, significant effects from income across all models, largest single effect from medium of instruction in India, and simpler explanations for disabled students. Results are supported by FDR-corrected significance tests and SHAP feature attribution, and biases persist across models and within elite institutions.
Significance. If the findings hold, this work provides important evidence on the risks of deploying LLMs for personalized STEM instruction without intersectional safeguards. It demonstrates non-additive compounding across cultural contexts and the persistence of bias even in elite settings. Strengths include the multi-model audit, FDR-corrected testing, SHAP attribution for interpretability, and explicit cross-cultural design covering both Indian and American educational systems.
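To make "non-additive compounding" concrete, one simple check compares the gap observed for a profile marginalized on two axes with the sum of the two single-axis gaps. The sketch below does this with invented grade-level numbers; it is not the paper's test, and `interaction_excess` is a hypothetical helper name.

```python
def interaction_excess(baseline, only_a, only_b, both):
    """Observed joint gap minus the sum of the two single-axis gaps.

    All arguments are mean grade levels of the generated explanations.
    A positive result means the joint gap exceeds what adding the two
    single-axis gaps would predict.
    """
    gap_a = baseline - only_a
    gap_b = baseline - only_b
    gap_both = baseline - both
    return gap_both - (gap_a + gap_b)


# Invented numbers for illustration only (not results from the paper):
# baseline profile, low income only, non-English medium only, both.
excess = interaction_excess(baseline=10.0, only_a=9.4, only_b=9.0, both=7.9)
print(round(excess, 2))
```

With these invented values the single-axis gaps are 0.6 and 1.0, the joint gap is 2.1, and the excess is therefore 0.5 grade levels beyond the additive prediction.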
major comments (3)
- [§3 (Synthetic Profile Construction)] The audit relies on synthetic profiles that explicitly encode protected attributes (caste, race, disability, HBCU status, etc.). The 2.55-grade-level gap and non-additive intersectional effects are observed only under these explicit-cue conditions. In real educational deployments, prompts are typically performance- or goal-oriented and omit such labels; models are often aligned to refuse demographic inference. Without additional experiments using implicit signals or omitted demographics, the results do not yet establish that comparable disadvantages will appear in actual personalized-instruction use.
- [Results (Grade-level Equivalence)] The central quantitative claim is a 2.55 grade-level gap. The manuscript must specify exactly how explanation quality is mapped to grade-level equivalents, including the rubric, any automated scoring procedure, validation against human raters, and sensitivity checks. This mapping is load-bearing for interpreting the practical magnitude of disadvantage.
- [Methods and Reproducibility] Full prompt templates, complete synthetic-profile examples, and raw data or analysis code are not provided. This prevents independent verification of the FDR-corrected significance tests, SHAP attributions, and the specific non-additive compounding patterns reported.
minor comments (2)
- [Abstract] Model names should be formatted consistently (e.g., 'GPT-OSS 20B' vs. 'Qwen 2.5-32B-Instruct').
- [Discussion] Add an explicit limitations paragraph addressing the ecological validity of explicit demographic cues versus real-world prompt distributions.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments. These have prompted us to clarify key aspects of our methodology and strengthen the discussion of scope and limitations. We respond to each major comment below.
- Referee: [§3 (Synthetic Profile Construction)] The audit relies on synthetic profiles that explicitly encode protected attributes (caste, race, disability, HBCU status, etc.). The 2.55-grade-level gap and non-additive intersectional effects are observed only under these explicit-cue conditions. In real educational deployments, prompts are typically performance- or goal-oriented and omit such labels; models are often aligned to refuse demographic inference. Without additional experiments using implicit signals or omitted demographics, the results do not yet establish that comparable disadvantages will appear in actual personalized-instruction use.
  Authors: We appreciate the referee highlighting the distinction between explicit and implicit cues. Our study is intentionally scoped to cases where demographic attributes are explicitly available in the prompt, which is relevant for institutional deployments that maintain student profiles (e.g., for accessibility accommodations or targeted support programs). We agree that implicit inference represents an important complementary scenario. In the revised manuscript we have added a new subsection in the Discussion that explicitly acknowledges this boundary condition, cites alignment literature on demographic refusal, and outlines targeted future experiments using implicit signals. We maintain that the explicit-cue results remain policy-relevant as a lower bound on risk when profile data is legitimately shared.
  Revision: partial
- Referee: [Results (Grade-level Equivalence)] The central quantitative claim is a 2.55 grade-level gap. The manuscript must specify exactly how explanation quality is mapped to grade-level equivalents, including the rubric, any automated scoring procedure, validation against human raters, and sensitivity checks. This mapping is load-bearing for interpreting the practical magnitude of disadvantage.
  Authors: We have substantially expanded the Methods section (now §4.2) to detail the mapping procedure. Explanations were scored on a 10-point rubric across factual accuracy, conceptual depth, and accessibility; these scores were then linearly calibrated to grade-level equivalents using official curriculum benchmarks from the U.S. Common Core and Indian CBSE/NCERT standards. Automated scoring was validated against two independent human raters on a stratified sample of 200 explanations (Cohen’s κ = 0.81). We have added a sensitivity analysis varying the calibration thresholds and included the full rubric and calibration table in the new Appendix E.
  Revision: yes
- Referee: [Methods and Reproducibility] Full prompt templates, complete synthetic-profile examples, and raw data or analysis code are not provided. This prevents independent verification of the FDR-corrected significance tests, SHAP attributions, and the specific non-additive compounding patterns reported.
  Authors: We regret the initial omission. The revised submission includes all prompt templates in Appendix A, full synthetic-profile templates with example instantiations in Appendix B, and the complete analysis pipeline (including FDR correction, SHAP computation, and non-additivity tests) as a public GitHub repository linked in the Data Availability Statement. Raw per-explanation scores are not released for ethical reasons, but the repository contains the exact synthetic data generator and aggregated results sufficient to reproduce all reported statistics and figures.
  Revision: yes
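The second response above reports rater agreement as Cohen's κ. For readers unfamiliar with the statistic, the following sketch computes unweighted κ for two raters on made-up labels. Whether the authors used an unweighted or weighted variant for their 10-point rubric is not stated here, so treat the unweighted form as an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up binary labels for ten explanations (not data from the paper).
labels_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
labels_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(labels_a, labels_b), 2))
```

Unweighted κ treats every disagreement as equally severe; on an ordinal rubric, a weighted κ that penalizes near-misses less is a common alternative.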
Circularity Check
No significant circularity in empirical audit
The paper conducts an empirical audit by constructing synthetic student profiles with explicit demographic attributes, prompting four external LLMs (Qwen 2.5-32B-Instruct, GPT-4o, GPT-4o-mini, GPT-OSS 20B), and evaluating generated explanations via statistical tests (FDR-corrected), SHAP attribution, and grade-level metrics. No derivation chain, equations, or first-principles predictions are presented that reduce to fitted inputs or self-referential definitions. Results are observed outputs from independent models rather than constructed equivalences, and the study relies on external benchmarks without load-bearing self-citations or ansatzes. This is a standard data-driven audit whose results rest on external model behavior rather than on its own definitions.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: FDR correction appropriately controls false discovery rate across multiple comparisons in bias testing.
- domain assumption: SHAP values provide reliable attribution of which profile features drive differences in explanation quality.
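The second axiom concerns SHAP, which approximates Shapley values. For a handful of features, Shapley values can be computed exactly by brute force, which makes the idea easy to see. The sketch below does that for a toy gap function; the function `toy_gap`, its numbers, and the feature names are invented for illustration and are not the paper's model or results.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal contribution over all orderings."""
    totals = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = frozenset()
        for feature in order:
            with_feature = present | {feature}
            totals[feature] += value(with_feature) - value(present)
            present = with_feature
    return {feature: totals[feature] / len(orderings) for feature in features}


def toy_gap(marginalized):
    """Invented grade-level gap for a set of marginalized attributes (illustration only)."""
    gap = 0.0
    if "income" in marginalized:
        gap += 0.6
    if "medium" in marginalized:
        gap += 1.0
    if "income" in marginalized and "medium" in marginalized:
        gap += 0.5  # interaction term, shared equally between the two features
    return gap


phi = shapley_values(["income", "medium", "gender"], toy_gap)
print({name: round(score, 2) for name, score in phi.items()})
```

In this toy setting the interaction term is split evenly between the two interacting features, and the attributions sum to the gap of the fully marginalized profile. This also shows why a single attribution score per feature can hide non-additive structure.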
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We audit four LLMs ... using synthetic profiles crossing dimensions specific to Indian education (caste, medium of instruction, college tier) and American education (race, HBCU attendance, school type), alongside income, gender, and disability, across ranking and generation tasks with FDR-corrected significance testing and SHAP feature attribution."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Income produces significant effects ... medium of instruction drives the largest single effect ... disability status triggers simpler explanations. Effects compound non-additively ... gap ... reaches 2.55 grade levels."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.