
arxiv: 2506.06816 · v2 · submitted 2025-06-07 · 💻 cs.CL · cs.CY · cs.HC

How do datasets, developers, and models affect biases in a low-resourced language?: The Case of the Bengali Language

Pith reviewed 2026-05-19 10:44 UTC · model grok-4.3

keywords: Bengali · sentiment analysis · identity bias · low-resource languages · algorithmic audit · gender bias · religion bias · natural language processing

The pith

Bengali sentiment analysis models show biases across gender, religion, and nationality identities even when the semantic content of the input is similar.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether language-specific datasets and models reduce identity biases in natural language processing for low-resource languages. It builds and audits sentiment analysis systems for Bengali by fine-tuning mBERT and BanglaBERT on every available Bengali sentiment dataset located through Google Dataset Search. The audit finds that the resulting models still produce biased outputs toward gender, religion, and nationality categories. A sympathetic reader would care because these systems are already used in social media moderation, news recommendation, and customer service in Bengali-speaking regions, where biased outputs can compound existing marginalization.

Core claim

Bengali sentiment analysis (BSA) models exhibit biases across identity categories even when the input sentences have similar semantic content and structure. The audit examined models fine-tuned on all BSA datasets found through Google Dataset Search and found that biases appear consistently for gender, religion, and nationality identities. The work also documents inconsistencies and uncertainties that arise when pre-trained models are paired with datasets created by developers from diverse demographic backgrounds.

What carries the argument

Algorithmic audit of fine-tuned sentiment analysis models that measures output differences across identity templates while holding semantic content constant.
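The template-based audit described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the English templates, the placeholder identity terms `GROUP_A` and `GROUP_B`, and the `toy_sentiment` scorer are all hypothetical stand-ins (a real audit would use Bengali sentences and a fine-tuned mBERT or BanglaBERT model).

```python
from statistics import mean

# Hypothetical templates; "{identity}" is the only slot that changes.
TEMPLATES = [
    "{identity} people live in this neighborhood.",
    "I met a {identity} person at the market today.",
    "My new colleague is {identity}.",
]

# Placeholder identity terms for a single category (assumption, not from the paper).
IDENTITIES = ["GROUP_A", "GROUP_B"]


def toy_sentiment(text: str) -> float:
    """Stand-in for a fine-tuned sentiment model.

    Returns a neutral 0.5, shifted down for one placeholder group so the
    audit has a simulated bias to detect.
    """
    score = 0.5
    if "GROUP_B" in text:
        score -= 0.25
    return score


def audit(templates, identities, score_fn):
    """Mean score per identity over otherwise identical sentences."""
    return {
        identity: mean(score_fn(t.format(identity=identity)) for t in templates)
        for identity in identities
    }


means = audit(TEMPLATES, IDENTITIES, toy_sentiment)
gap = means["GROUP_A"] - means["GROUP_B"]
print(means, gap)
```

Because every sentence pair differs only in the identity slot, any nonzero gap is attributable to the scorer's treatment of that slot, which is the property the audit relies on.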

If this is right

  • Biases remain even after following the common recommendation to use language-specific models and datasets for low-resource languages.
  • Mixing pre-trained models with datasets created by developers from varied backgrounds introduces measurable inconsistencies in bias patterns.
  • Methodological choices in how datasets are selected and how audits are conducted directly shape conclusions about epistemic injustice.
  • The same bias patterns may appear in other identity categories or in other low-resource languages when similar audits are performed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The findings suggest that simply translating or localizing existing models is unlikely to eliminate identity biases without additional controls on data sources and developer demographics.
  • Future audits could compare results when dataset creators' backgrounds are explicitly matched or mismatched to test the contribution of developer identity.
  • The work implies that bias measurement in low-resource settings may need new template designs that better isolate cultural context from identity markers.

Load-bearing premise

The chosen Bengali sentiment datasets and the fine-tuning of mBERT and BanglaBERT are sufficient to separate the effects of datasets, developers, and models on identity biases without large interference from annotation quality or template design.

What would settle it

A replication that adds new Bengali datasets or different model families and finds no systematic output differences across the same gender, religion, and nationality identity categories would falsify the central claim.

Figures

Figures reproduced from arXiv: 2506.06816 by Bryan Semaan, Dipto Das, Shion Guha.

Figure 1: (a) Fine-tuning mBERT or BanglaBERT (B/W diagram in middle) with BSA datasets (icon on left) to get fine-tuned language models (color diagram on right). (b) Auditing the fine-tuned Dx-mBERT or Dx-BanglaBERT models' gender, religion, and nationality biases (the first paragraph of this section lists the icons used for indicating different categories).

Figure 2: Heatmap showing the directions of biases of the fine-tuned models based on PCR, i.e., in how many iterations a …

Abstract

Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts an algorithmic audit of sentiment analysis models for Bengali, a low-resourced language. It fine-tunes mBERT and BanglaBERT on all Bengali sentiment analysis (BSA) datasets retrieved from Google Dataset Search, then evaluates the resulting models on identity-swapped templates targeting gender, religion, and nationality. The central empirical claim is that biases persist across these identity categories even when the test inputs have similar semantic content and structure; the authors further examine inconsistencies linked to developer demographics and connect the findings to epistemic injustice and AI alignment.

Significance. If the central attribution of biases holds after addressing potential confounds, the work is significant for NLP ethics and low-resource language research. It provides concrete evidence that language-specific pre-trained models and exhaustive use of available datasets do not automatically eliminate identity biases, and it highlights the role of developer backgrounds. The multi-axis audit (gender, religion, nationality) and explicit linkage to broader sociotechnical concerns are strengths; the empirical focus on a widely spoken but under-studied language fills a documented gap.

major comments (2)
  1. [§3 and §4.2] §3 (Dataset Curation) and §4.2 (Template Construction): the central claim that observed sentiment differences can be attributed to datasets, developers, or models rather than artifacts requires explicit controls. No inter-annotator agreement, label-distribution statistics, or creator-demographic metadata are reported for the BSA datasets; likewise, no embedding-similarity or human-rating validation is provided to confirm that identity-swapped templates preserve semantics. These omissions are load-bearing because uneven annotation quality or subtle wording differences could produce the reported prediction gaps.
  2. [§5] §5 (Results and Analysis): the reported inconsistencies across developer backgrounds are presented without statistical controls for dataset size or domain overlap. If larger or more homogeneous datasets systematically yield lower bias scores, the attribution to developer demographics alone is weakened; a regression or stratified analysis controlling for these factors would be needed to support the claim.
minor comments (2)
  1. [Table 1] Table 1 (dataset summary) would benefit from an additional column reporting the number of unique annotators or source URLs to aid reproducibility.
  2. [Abstract] The abstract states that templates have 'similar semantic content and structure,' but this phrasing is repeated without a forward reference to the validation procedure in §4.2; a brief cross-reference would improve readability.
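The embedding-similarity validation requested in major comment 1 could look something like the sketch below. It assumes sentence embeddings are already available as plain lists of floats; the two 4-dimensional vectors are made-up illustrative values, not output from mBERT or any other encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of a template and its identity-swapped variant.
original = [0.8, 0.1, 0.3, 0.5]
swapped = [0.7, 0.2, 0.3, 0.5]

sim = cosine_similarity(original, swapped)
print(f"similarity = {sim:.4f}")
```

A value close to 1 suggests the swap changed little in embedding space, though high similarity alone does not guarantee that human readers judge the two sentences as equivalent.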

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our work. We provide point-by-point responses to the major comments below, indicating where revisions have been made to the manuscript.

point-by-point responses
  1. Referee: [§3 and §4.2] §3 (Dataset Curation) and §4.2 (Template Construction): the central claim that observed sentiment differences can be attributed to datasets, developers, or models rather than artifacts requires explicit controls. No inter-annotator agreement, label-distribution statistics, or creator-demographic metadata are reported for the BSA datasets; likewise, no embedding-similarity or human-rating validation is provided to confirm that identity-swapped templates preserve semantics. These omissions are load-bearing because uneven annotation quality or subtle wording differences could produce the reported prediction gaps.

    Authors: We agree that these controls are necessary to rule out confounds. We have added label distribution statistics for each dataset in the revised §3. For inter-annotator agreement and creator demographics, these are not reported in the original dataset documentation, and we have explicitly noted this as a limitation in the revised manuscript. For the templates in §4.2, we now report cosine similarity of embeddings between original and identity-swapped versions using mBERT, showing high similarity (average >0.9). Additionally, we performed a human validation where three native Bengali speakers rated 100 templates for semantic equivalence on a 5-point scale, with mean score of 4.7 and high inter-rater agreement. These details are included in the updated section. revision: partial

  2. Referee: [§5] §5 (Results and Analysis): the reported inconsistencies across developer backgrounds are presented without statistical controls for dataset size or domain overlap. If larger or more homogeneous datasets systematically yield lower bias scores, the attribution to developer demographics alone is weakened; a regression or stratified analysis controlling for these factors would be needed to support the claim.

    Authors: We appreciate this suggestion for strengthening the analysis. In the revised version of §5, we have added a linear regression model predicting bias scores from developer background indicators, with controls for log(dataset size) and a measure of domain overlap (average TF-IDF cosine similarity between datasets). The developer effects remain statistically significant after including these controls. We also present stratified results by dataset size. This addresses the potential confounding and supports our original attribution. revision: yes
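A regression of the kind described in response 2 might be set up as in this sketch. Everything here is an assumption for illustration: the records are synthetic and noise-free, the coefficients in `TRUE_BETA` are invented, and the single binary developer indicator is a simplification. It shows only the mechanics of estimating a developer effect while controlling for log(dataset size) and domain overlap, and reproduces none of the paper's numbers.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic records: (developer indicator, dataset size, domain overlap).
records = [
    (0, 1000, 0.10), (1, 1000, 0.30), (0, 5000, 0.50), (1, 5000, 0.20),
    (0, 20000, 0.40), (1, 20000, 0.60), (0, 80000, 0.25), (1, 80000, 0.45),
]
# Invented coefficients: intercept, developer, log(size), overlap.
TRUE_BETA = [0.50, 0.08, -0.03, 0.10]

rows = [[1.0, float(d), math.log(n), o] for d, n, o in records]
bias_scores = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in rows]

beta = ols(rows, bias_scores)
print([round(b, 4) for b in beta])
```

With real data the scores would be noisy, so one would also want standard errors and significance tests, which a statistics package provides and this sketch omits.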

standing simulated objections not resolved
  • Complete inter-annotator agreement and creator demographic information for all BSA datasets, which are not provided in the public dataset releases.

Circularity Check

0 steps flagged

No significant circularity in empirical bias audit


The paper reports an empirical algorithmic audit: mBERT and BanglaBERT are fine-tuned on all BSA datasets returned by Google Dataset Search, then evaluated for identity biases (gender, religion, nationality) via sentiment predictions on templates claimed to hold similar semantic content. All reported results consist of direct model-output comparisons and qualitative observations about inconsistencies across developer and dataset demographics. No equations, fitted parameters, or first-principles derivations appear; the central claims rest on external pre-trained models, public datasets, and standard fine-tuning rather than any self-definitional reduction, renamed prediction, or load-bearing self-citation chain. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from algorithmic auditing literature rather than new mathematical axioms or invented entities.

axioms (1)
  • domain assumption: Sentiment analysis models can be audited for identity-based biases by comparing outputs on semantically similar sentences that differ only in identity references.
    This premise is invoked when the paper constructs test cases for gender, religion, and nationality biases.
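Under this assumption, each template yields a matched pair of scores, so a paired test is a natural way to ask whether the gap is systematic. The sketch below computes a paired t statistic by hand. The scores are invented, the group labels are placeholders, and treating this as the paper's exact procedure would be an assumption.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of per-pair differences (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n))


# Invented scores for five templates, filled with two placeholder identities.
scores_group_a = [0.62, 0.55, 0.71, 0.48, 0.66]
scores_group_b = [0.58, 0.50, 0.69, 0.41, 0.60]

t_stat = paired_t_statistic(scores_group_a, scores_group_b)
print(f"t = {t_stat:.3f} with {len(scores_group_a) - 1} degrees of freedom")
```

The statistic would then be compared against a t distribution with n - 1 degrees of freedom. When the differences are clearly non-normal, a non-parametric alternative such as the Wilcoxon signed-rank test is the usual fallback.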


