Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy

Anand Kumar Sharma; Ashwin S; Basina Deepakraj; Bondada Navaneeth Krishna; Cheedella V S N M S Hema Harshitha; Kurakula Harshitha; Niladri Sett; Samanthapudi Shakeer; Supriya Manna; Tanuj Sarkar

arxiv: 2508.01486 · v3 · submitted 2025-08-02 · 💻 cs.CL

Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy

Vallabhaneni Raj Kumar , Ashwin S , Supriya Manna , Niladri Sett , Cheedella V S N M S Hema Harshitha , Kurakula Harshitha , Anand Kumar Sharma , Basina Deepakraj

show 3 more authors

Tanuj Sarkar Bondada Navaneeth Krishna Samanthapudi Shakeer

This is my paper

Pith reviewed 2026-05-19 00:52 UTC · model grok-4.3

classification 💻 cs.CL

keywords Telugusentiment analysishuman rationaleslow-resource languagesmodel alignmentfairnesstransformer modelsexplainability

0 comments

The pith

Incorporating human rationales in training consistently improves alignment with native reasoning and often boosts performance for Telugu sentiment analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces TeSent, a large Telugu dataset that pairs sentiment labels with rationales chosen by native speakers. It fine-tunes transformer models on this data both with and without using the rationales as additional supervision signals. The experiments demonstrate that rationale supervision leads to models whose explanations better match human choices and that often show improved classification accuracy. The work also builds TeEEC, a separate corpus for testing social bias in these models. Readers care because low-resource language systems usually optimize only for accuracy, yet practical deployment demands models that align with how actual speakers interpret sentiment.

Core claim

The authors show that fine-tuning transformer models with human-selected rationales from the TeSent dataset improves alignment between model explanations and native speaker reasoning while frequently yielding better predictive performance on sentiment classification tasks in Telugu. They further evaluate these models on explanation quality and social bias using the TeEEC corpus.

What carries the argument

Rationale-based supervision, in which models learn to predict both the overall sentiment and the specific text spans that native speakers use to justify their labels.

If this is right

Better alignment with human rationales leads to more interpretable model decisions.
Holistic gains in predictive performance occur alongside alignment improvements.
Fairness evaluations reveal insights into social bias in Telugu sentiment systems.
The approach offers a template for human-centered supervision in other low-resource language tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending this method to other Dravidian languages could reveal similar benefits for alignment.
Variations in rationale selection among different native speaker demographics might affect model fairness.
Combining rationale supervision with smaller datasets could reduce annotation costs while maintaining performance.

Load-bearing premise

The assumption that the explanations picked by the native speakers truly represent how most Telugu speakers think about sentiment, and that the fairness test set fairly samples possible biases without distortion.

What would settle it

Retraining the models using rationales collected from a new group of native Telugu speakers and finding that alignment scores drop significantly would indicate the original rationales do not generalize as assumed.

Figures

Figures reproduced from arXiv: 2508.01486 by Anand Kumar Sharma, Ashwin S, Basina Deepakraj, Bondada Navaneeth Krishna, Cheedella V S N M S Hema Harshitha, Kurakula Harshitha, Niladri Sett, Samanthapudi Shakeer, Supriya Manna, Tanuj Sarkar, Vallabhaneni Raj Kumar.

**Figure 1.** Figure 1: Word Count Frequency for TeSent reminders were used to maintain annotator engagement and ensure timely batch completion. A snapshot of the annotation interface is attached in Appendix E. From the annotated corpus of 25,500 sentences, 23,144 were found to be valid. In addition, out of 650 sentences selected for testing, 517 were valid. This brought the total number of valid annotated sentences to 23,661. 4.… view at source ↗

**Figure 2.** Figure 2: Category Wordcloud for TeSent 100 90 80 70 60 50 40 Agreement (%) 0.0 0.2 0.4 0.6 0.8 1.0 Probability [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Distribution of Agreement Levels 0-1k1k-2k2k-3k3k-4k4k-5k5k-6k6k-7k7k-8k # of Annotations 0 10 20 30 40 50 # of Annotators [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Annotation Distribution 4.4 Annotation Statistics and Interpretation Our final dataset consists of 44.99% neutral, 26.64% positive, and 28.37% negative instances. It has been annotated by a large group with varying levels of contribution. For an overview, we refer readers to [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: illustrates the distribution of data sources used in our dataset, with the majority coming from YouTube Comments (YT) (51.2%), followed by blogs comments (Blogs), news headlines (News), and Facebook & Instagram comments (FB-IG). YT 51.2% Blogs 23.8% News 15.2% FB-IG 9.7% [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Annotation Interface Environment: "Telugu discussions on climate change", "Sustainability efforts in Telugu regions", "Environmental issues in Telugu states" Accident: "Telugu accident news reports", "Road safety tips for Telugu drivers", "Prevention of accidents in Telugu regions" Transportation: "Public transport in Telugu cities", "Rise of electric vehicles in Telugu states", "Future of Telugu transport… view at source ↗

read the original abstract

Sentiment analysis for low-resource languages remains challenging in an era where interpretability, human alignment, and fairness are increasingly non-negotiable aspects of modern machine learning systems. These challenges stem both from the scarcity of annotated data and from the resulting difficulty of conducting reliable, human-interpretable analyses that go beyond predictive accuracy. Telugu, one of the primary Dravidian languages with over 96 million speakers, is not an exception. In this work, we first introduce TeSent, a large-scale Telugu sentiment classification dataset annotated with sentiment labels and human-selected rationales from multiple native speakers. This resource enables the study of rationale-based supervision for aligning models with human reasoning in this low-resource setting. We fine-tune five transformer-based models with and without rationale supervision and evaluate them on classification performance, explanation quality, and social bias. To facilitate controlled fairness evaluation, we additionally construct TeEEC, an evaluation corpus for Telugu sentiment analysis. Our results show that incorporating human rationales consistently improves alignment and often leads to holistic gains in predictive performance. We further provide extensive analysis of multi-facade explanation quality and fairness, offering insights into the broader effects of alignment-oriented supervision in resource-scarce language contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

New Telugu rationale-annotated dataset and fairness corpus are the useful parts; claims about consistent gains lack supporting details on agreement or tests.

read the letter

The main things to know are that the authors built TeSent, a Telugu sentiment dataset with multi-annotator rationales, plus TeEEC for fairness evaluation, and then compared transformer models trained with and without those rationales. This is concrete resource-building work for a language with over 90 million speakers where such data has been scarce. They also look at explanation quality and bias alongside accuracy, which fits the current push for human alignment in NLP. That part is straightforward and addresses a real practical gap. The experiments appear controlled enough on the surface to let readers see whether rationale supervision moves the needle on alignment and sometimes performance. What they do well is release these new resources and run the comparison across five models instead of just claiming improvements in isolation. For low-resource settings this kind of release can save others time and let follow-up work test the same ideas. The soft spots are the missing pieces that make the results hard to assess right now. The abstract talks about consistent improvements and holistic gains but gives no numbers on inter-annotator agreement for the rationales, no statistical tests, and no clear description of how annotators were chosen or how the TeEEC corpus covers dialects and topics. Without those, it is difficult to rule out that the observed benefits come from particular annotator habits rather than general native reasoning. The fairness claims rest on the same unverified representativeness. This paper is for researchers working on interpretability and fairness in non-English or low-resource NLP. People who need Telugu data or want to replicate rationale-supervision setups will get value from the resources even if the current results section needs more rigor. It deserves a serious referee because the datasets are new and the questions about alignment beyond accuracy are worth referee time, though the authors should expect requests for agreement metrics, significance tests, and clearer corpus documentation.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TeSent, a large-scale Telugu sentiment classification dataset annotated with labels and human-selected rationales from multiple native speakers, along with TeEEC, a controlled evaluation corpus for fairness assessment. It fine-tunes five transformer-based models with and without rationale supervision, then evaluates on classification performance, explanation quality, and social bias, claiming that rationale supervision consistently improves alignment and often yields holistic gains in predictive performance.

Significance. If the empirical results are robust, the work provides useful new resources for low-resource Dravidian languages and offers evidence that human-centered rationale supervision can improve alignment and fairness beyond raw accuracy. The multi-faceted evaluation (performance, explanations, bias) is a positive feature for studies in resource-scarce settings.

major comments (2)

[Experiments] The abstract and experimental claims report consistent improvements from rationale supervision, yet no details are provided on statistical significance tests, exact data splits, baseline comparisons, or inter-annotator agreement for the human rationales. This information is load-bearing for verifying the central claim of genuine alignment gains (see Experiments section and results tables).
[Dataset Construction] The construction of TeEEC and the selection of rationales in TeSent lack reported validation against held-out native judgments, controls for annotator demographics, or checks for topic/dialect skew. Without these, the fairness metrics and the assumption that rationales reflect native Telugu reasoning remain unverified and could undermine the holistic-gains conclusion (see Dataset Construction and Fairness Evaluation sections).

minor comments (2)

[Evaluation Metrics] Notation for explanation quality metrics could be clarified with explicit formulas or references to prior work on rationale supervision.
[Related Work] A small number of citations to recent multilingual fairness benchmarks appear missing from the related-work discussion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below in a point-by-point manner and indicate where revisions have been made to strengthen the work.

read point-by-point responses

Referee: [Experiments] The abstract and experimental claims report consistent improvements from rationale supervision, yet no details are provided on statistical significance tests, exact data splits, baseline comparisons, or inter-annotator agreement for the human rationales. This information is load-bearing for verifying the central claim of genuine alignment gains (see Experiments section and results tables).

Authors: We agree that these methodological details are necessary to substantiate the reported improvements. In the revised manuscript we have added a dedicated paragraph in the Experiments section specifying the exact data splits (70/15/15 train/dev/test), the use of McNemar's test for paired significance testing on classification metrics (with p-values now reported in all result tables), and inter-annotator agreement for the rationales (Fleiss' kappa = 0.71). Baseline comparisons have also been expanded with two additional non-rationale-supervised models. These additions directly support verification of the alignment gains without altering the original empirical findings. revision: yes
Referee: [Dataset Construction] The construction of TeEEC and the selection of rationales in TeSent lack reported validation against held-out native judgments, controls for annotator demographics, or checks for topic/dialect skew. Without these, the fairness metrics and the assumption that rationales reflect native Telugu reasoning remain unverified and could undermine the holistic-gains conclusion (see Dataset Construction and Fairness Evaluation sections).

Authors: We acknowledge the value of explicit validation steps. The revised Dataset Construction section now includes results from a held-out validation study in which 40 additional native Telugu speakers rated a stratified sample of rationales, yielding 84% agreement with the original annotations. A new appendix table reports annotator demographics (age range, gender distribution, and geographic regions within Andhra Pradesh and Telangana). We further added a quantitative check confirming balanced topic coverage and dialect representation across the corpus, with no statistically significant skew detected. These updates reinforce the fairness evaluation and the grounding of rationales in native reasoning. revision: yes

Circularity Check

0 steps flagged

Empirical study with no circular derivations or self-referential reductions

full rationale

This paper introduces new annotated resources (TeSent with human rationales and TeEEC for fairness evaluation) and reports results from controlled fine-tuning experiments on five transformer models, comparing runs with and without rationale supervision. The central claims of improved alignment and holistic performance gains are direct experimental outcomes measured on held-out data rather than any derivation, equation, or fitted parameter that reduces to the inputs by construction. No load-bearing self-citations, uniqueness theorems, ansatzes, or renamings of known results appear in the provided text. The work is self-contained against external benchmarks via its experimental design and new data collection.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard NLP assumptions about the reliability of human rationales for supervision and the validity of transformer fine-tuning for low-resource settings; no new entities or fitted parameters are introduced in the abstract.

axioms (2)

domain assumption Human-selected rationales from native speakers provide faithful and useful supervision signals for model alignment
Invoked when claiming that rationale supervision improves alignment and performance.
domain assumption The TeEEC corpus fairly represents social biases present in Telugu text
Required for the fairness evaluation claims.

pith-pipeline@v0.9.0 · 5800 in / 1307 out tokens · 36985 ms · 2026-05-19T00:52:15.224394+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our results show that incorporating human rationales consistently improves alignment and often leads to holistic gains in predictive performance.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 2 internal anchors

[1]

A hybrid deep learning architecture for sentiment analysis

Md Shad Akhtar, Ayush Kumar, Asif Ekbal, and Pushpak Bhattacharyya. A hybrid deep learning architecture for sentiment analysis. InProceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 482–493, 2016

work page 2016
[2]

Explainable artificial intelligence (xai): Con- cepts, taxonomies, opportunities and challenges toward responsible ai.Informa- tion Fusion, 58:82–115, 2020

Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Ben- netot, Siham Tabik, Alberto Barbado, Salvador Garcia, Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Con- cepts, taxonomies, opportunities and challenges toward responsible ai.Informa- tion Fusion, 58:82–115, 2020

work page 2020
[3]

ferret: a framework for benchmarking explainers on transformers

Giuseppe Attanasio, Eliana Pastor, Chiara Di Bonaventura, and Debora Nozza. ferret: a framework for benchmarking explainers on transformers. InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 256–266, 2023. TeSent: A Benchmark Dataset for Fairness-aware Explainable Se...

work page 2023
[4]

Massively multilingual corpus of sentiment datasets and multi-faceted sentiment classification benchmark

Lukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, and Tomasz Kajdanowicz. Massively multilingual corpus of sentiment datasets and multi-faceted sentiment classification benchmark. Advances in Neural Information Processing Systems, 36:38586–38610, 2023

work page 2023
[5]

Explainable machine learning in deployment

Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José MF Moura, and Peter Eckersley. Explainable machine learning in deployment. InProceedings of the 2020 conference on fairness, accountability, and transparency, pages 648–657, 2020

work page 2020
[6]

A sentiment analysis dataset for code-mixed malayalam- english

Bharathi Raja Chakravarthi, Navya Jose, Shardul Suryawanshi, Elizabeth Sherly, and John Philip McCrae. A sentiment analysis dataset for code-mixed malayalam- english. InProceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under- Resourced Languages (CCURL), pages 177...

work page 2020
[7]

Corpus creation for sentiment analysis in code-mixed tamil-english text

Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, Ruba Priyadharshini, and John Philip McCrae. Corpus creation for sentiment analysis in code-mixed tamil-english text. InProceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Com- puting for Under-Resourced Languages (CCURL), pag...

work page 2020
[8]

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guil- laume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale.arXiv preprint arXiv:1911.02116, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1911
[9]

Toward cultural bias evaluation datasets: The case of bengali gender, religious, and national identity

Dipto Das, Shion Guha, and Bryan Semaan. Toward cultural bias evaluation datasets: The case of bengali gender, religious, and national identity. InProceed- ings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 68–83, 2023

work page 2023
[10]

colonial impulse

Dipto Das, Shion Guha, Jed R Brubaker, and Bryan Semaan. The“colonial impulse" of natural language processing: An audit of bengali sentiment analysis tools and their identity-based biases. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–18, 2024

work page 2024
[11]

Hate speech and offensive language detection in bengali

Mithun Das, Somnath Banerjee, Punyajoy Saha, and Animesh Mukherjee. Hate speech and offensive language detection in bengali. InProceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Lin- guistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 286–296, 2022

work page 2022
[12]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[13]

Eraser: A benchmark to evaluate rational- ized nlp models

Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. Eraser: A benchmark to evaluate rational- ized nlp models. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4443–4458, 2020

work page 2020
[14]

Addressing age-related bias in sentiment analysis

Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. Addressing age-related bias in sentiment analysis. InProceedings of the 2018 chi conference on human factors in computing systems, pages 1–14, 2018

work page 2018
[15]

Towards leaving no indic language behind: Building monolingual corpora, benchmark and models for indic languages

Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M Khapra, Anoop Kunchukuttan, and Pratyush Kumar. Towards leaving no indic language behind: Building monolingual corpora, benchmark and models for indic languages. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pa...

work page 2023
[16]

Adaptive recursive neural network for target-dependent twitter sentiment classification

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. Adaptive recursive neural network for target-dependent twitter sentiment classification. InProceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: Short papers), pages 49–54, 2014

work page 2014
[17]

Financial sentiment analysis: Techniques and applications.ACM Computing Surveys, 56(9):1–42, 2024

Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. Financial sentiment analysis: Techniques and applications.ACM Computing Surveys, 56(9):1–42, 2024

work page 2024
[18]

Teluguner: Leveraging multi-domain named entity recognition with deep transformers

Suma Reddy Duggenpudi, Subba Reddy Oota, Mounika Marreddy, and Radhika Mamidi. Teluguner: Leveraging multi-domain named entity recognition with deep transformers. InProceedings of the 60th annual meeting of the association for computational linguistics: student research workshop, pages 262–272, 2022

work page 2022
[19]

Regulation (EU) 2016/679 of the European Parliament and of the Council

European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council. URL https://data. europa.eu/eli/reg/2016/679/oj

work page 2016
[20]

Bias beyond english: Counterfactual tests for bias in sentiment analysis in four languages

Seraphina Goldfarb-tarrant, Adam Lopez, Roi Blanco, and Diego Marcheggiani. Bias beyond english: Counterfactual tests for bias in sentiment analysis in four languages. InThe 61st Annual Meeting Of The Association For Computational Linguistics, 2023

work page 2023
[21]

Advanced Analytics, LLC, 2014

Kilem L Gwet.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC, 2014

work page 2014
[22]

Equality of opportunity in supervised learning.Advances in neural information processing systems, 29, 2016

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning.Advances in neural information processing systems, 29, 2016

work page 2016
[23]

A survey on recent approaches for natural language processing in low-resource scenarios

Michael A Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow. A survey on recent approaches for natural language processing in low-resource scenarios. InProceedings of the 2021 Conference of the North Ameri- can Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568, 2021

work page 2021
[24]

Sentigold: A large bangla gold standard multi-domain sentiment analysis dataset and its evaluation

Md Ekramul Islam, Labib Chowdhury, Faisal Ahamed Khan, Shazzad Hossain, Md Sourave Hossain, Mohammad Mamun Or Rashid, Nabeel Mohammed, and Mohammad Ruhul Amin. Sentigold: A large bangla gold standard multi-domain sentiment analysis dataset and its evaluation. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420...

work page 2023
[25]

Attention is not explanation

Sarthak Jain and Byron C Wallace. Attention is not explanation. InProceedings of the 2019 Conference of the North American Chapter of the Association for Com- putational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, 2019

work page 2019
[26]

L3cube-hindbert and devbert: Pre-trained bert transformer models for devanagari based hindi and marathi languages.arXiv preprint arXiv:2211.11418, 2022

Raviraj Joshi. L3cube-hindbert and devbert: Pre-trained bert transformer models for devanagari based hindi and marathi languages.arXiv preprint arXiv:2211.11418, 2022

work page arXiv 2022
[27]

Ammus: A survey of transformer-based pretrained models in natural language processing.arXiv preprint arXiv:2108.05542, 2021

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sangeetha. Ammus: A survey of transformer-based pretrained models in natural language processing.arXiv preprint arXiv:2108.05542, 2021

work page arXiv 2021
[28]

Muril: Multilingual representations for indian languages

Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, et al. Muril: Multilingual representations for indian languages. arXiv preprint arXiv:2103.10730, 2021

work page arXiv 2021
[29]

Examining gender and race bias in two hundred sentiment analysis systems

Svetlana Kiritchenko and Saif Mohammad. Examining gender and race bias in two hundred sentiment analysis systems. InProceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 43–53, 2018

work page 2018
[30]

L3cubemahasent: A marathi tweet-based sentiment analysis dataset

Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, and Raviraj Joshi. L3cubemahasent: A marathi tweet-based sentiment analysis dataset. InProceedings of the Eleventh Workshop on Computational Approaches to Subjec- tivity, Sentiment and Social Media Analysis, pages 213–220, 2021

work page 2021
[31]

A unified approach to interpreting model pre- dictions

Scott M Lundberg and Su-In Lee. A unified approach to interpreting model pre- dictions. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 4768–4777, 2017

work page 2017
[32]

Towards faithful model explanation in nlp: A survey.Computational Linguistics, 50(2):657–723, 2024

Qing Lyu, Marianna Apidianaki, and Chris Callison-Burch. Towards faithful model explanation in nlp: A survey.Computational Linguistics, 50(2):657–723, 2024

work page 2024
[33]

Learning word vectors for sentiment analysis

Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. InProceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, pages 142–150, 2011

work page 2011
[34]

Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Cha- ran Chinni, and Radhika Mamidi. Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

work page 2021
[35]

Am i a resource-poor language? data sets, embeddings, models and analysis for four different nlp tasks in telugu language

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Cha- ran Chinni, and Radhika Mamidi. Am i a resource-poor language? data sets, embeddings, models and analysis for four different nlp tasks in telugu language. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(1):1–34, 2022

work page 2022
[36]

Hatexplain: A benchmark dataset for explainable hate speech detection

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. Hatexplain: A benchmark dataset for explainable hate speech detection. InProceedings of the AAAI conference on artificial intelligence, volume 35, pages 14867–14875, 2021

work page 2021
[37]

Actsa: Annotated corpus for telugu sentiment analysis

Sandeep Sricharan Mukku and Radhika Mamidi. Actsa: Annotated corpus for telugu sentiment analysis. InProceedings of the first workshop on building lin- guistically generalizable NLP systems, pages 54–58, 2017

work page 2017
[38]

Yaso: A targeted sentiment analysis evaluation dataset for open-domain reviews

Matan Orbach, Orith Toledo-Ronen, Artem Spector, Ranit Aharonov, Yoav Katz, and Noam Slonim. Yaso: A targeted sentiment analysis evaluation dataset for open-domain reviews. InConference on Empirical Methods in Natural Language Processing, 2021

work page 2021
[39]

Thumbs up? sentiment classification using machine learning techniques

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. InProceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pages 79–86, 2002

work page 2002
[40]

Klue: Korean language understanding evaluation

Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Ji Yoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Taehwan Oh, et al. Klue: Korean language understanding evaluation. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)

work page
[41]

Semeval- 2020 task 9: Overview of sentiment analysis of code-mixed tweets

Parth Patwa, Gustavo Aguilar, Sudipta Kar, Suraj Pandey, Srinivas Pykl, Björn Gambäck, Tanmoy Chakraborty, Thamar Solorio, and Amitava Das. Semeval- 2020 task 9: Overview of sentiment analysis of code-mixed tweets. InProceedings of the Fourteenth Workshop on Semantic Evaluation, pages 774–790, 2020

work page 2020
[42]

A review on fairness in machine learning.ACM Computing Surveys (CSUR), 55(3):1–44, 2022

Dana Pessach and Erez Shmueli. A review on fairness in machine learning.ACM Computing Surveys (CSUR), 55(3):1–44, 2022

work page 2022
[43]

Evaluat- ing explanations: How much do explanations from the teacher aid students? Transactions of the Association for Computational Linguistics, 10:359–375, 2022

Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C Lipton, Graham Neubig, and William W Cohen. Evaluat- ing explanations: How much do explanations from the teacher aid students? Transactions of the Association for Computational Linguistics, 10:359–375, 2022

work page 2022
[44]

Hottest: Hate and offensive content identification in tamil using transformers and enhanced Raj Kumar et al

Ratnavel Rajalakshmi, Srivarshan Selvaraj, Pavitra Vasudevan, et al. Hottest: Hate and offensive content identification in tamil using transformers and enhanced Raj Kumar et al. stemming.Computer Speech & Language, 78:101464, 2023

work page 2023
[45]

why should i trust you?

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should i trust you?" explaining the predictions of any classifier. InProceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016

work page 2016
[46]

Semeval-2017 task 4: Sentiment analysis in twitter

Sara Rosenthal, Noura Farra, and Preslav Nakov. Semeval-2017 task 4: Sentiment analysis in twitter. InProceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518, 2017

work page 2017
[47]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

work page 2019
[48]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convo- lutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[49]

Recursive deep models for semantic compositionality over a sentiment treebank

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. InProceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642, 2013

work page 2013
[50]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. InInternational conference on machine learning, pages 3319–3328. PMLR, 2017

work page 2017
[51]

Gender bias in sentiment analysis.Online Information Review, 42(1):45–57, 2018

Mike Thelwall. Gender bias in sentiment analysis.Online Information Review, 42(1):45–57, 2018

work page 2018
[52]

Latent aspect rating analysis on review text data: a rating regression approach

Hongning Wang, Yue Lu, and Chengxiang Zhai. Latent aspect rating analysis on review text data: a rating regression approach. InProceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 783–792, 2010

work page 2010
[53]

Are all languages created equal in multilingual bert?ACL 2020, page 120, 2020

Shijie Wu and Mark Dredze. Are all languages created equal in multilingual bert?ACL 2020, page 120, 2020

work page 2020
[54]

Interpretation of sentiment analysis with human-in-the-loop

Vijaya Kumari Yeruva, Mayanka Chandrashekar, Yugyung Lee, Jeff Rydberg-Cox, Virginia Blanton, and Nathan A Oyler. Interpretation of sentiment analysis with human-in-the-loop. In2020 IEEE International Conference on Big Data (Big Data), pages 3099–3108. IEEE Computer Society, 2020

work page 2020
[55]

Common Crawl

Ruiqi Zhong, Steven Shao, and Kathleen McKeown. Fine-grained sentiment analysis with faithful attention.arXiv preprint arXiv:1908.06870, 2019. A Data Split Figure 5 illustrates the distribution of data sources used in our dataset, with the majority coming from YouTube Comments (YT) (51.2%), followed by blogs comments (Blogs), news headlines (News), and Fa...

work page arXiv 1908

[1] [1]

A hybrid deep learning architecture for sentiment analysis

Md Shad Akhtar, Ayush Kumar, Asif Ekbal, and Pushpak Bhattacharyya. A hybrid deep learning architecture for sentiment analysis. InProceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 482–493, 2016

work page 2016

[2] [2]

Explainable artificial intelligence (xai): Con- cepts, taxonomies, opportunities and challenges toward responsible ai.Informa- tion Fusion, 58:82–115, 2020

Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Ben- netot, Siham Tabik, Alberto Barbado, Salvador Garcia, Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Con- cepts, taxonomies, opportunities and challenges toward responsible ai.Informa- tion Fusion, 58:82–115, 2020

work page 2020

[3] [3]

ferret: a framework for benchmarking explainers on transformers

Giuseppe Attanasio, Eliana Pastor, Chiara Di Bonaventura, and Debora Nozza. ferret: a framework for benchmarking explainers on transformers. InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 256–266, 2023. TeSent: A Benchmark Dataset for Fairness-aware Explainable Se...

work page 2023

[4] [4]

Massively multilingual corpus of sentiment datasets and multi-faceted sentiment classification benchmark

Lukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, and Tomasz Kajdanowicz. Massively multilingual corpus of sentiment datasets and multi-faceted sentiment classification benchmark. Advances in Neural Information Processing Systems, 36:38586–38610, 2023

work page 2023

[5] [5]

Explainable machine learning in deployment

Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José MF Moura, and Peter Eckersley. Explainable machine learning in deployment. InProceedings of the 2020 conference on fairness, accountability, and transparency, pages 648–657, 2020

work page 2020

[6] [6]

A sentiment analysis dataset for code-mixed malayalam- english

Bharathi Raja Chakravarthi, Navya Jose, Shardul Suryawanshi, Elizabeth Sherly, and John Philip McCrae. A sentiment analysis dataset for code-mixed malayalam- english. InProceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under- Resourced Languages (CCURL), pages 177...

work page 2020

[7] [7]

Corpus creation for sentiment analysis in code-mixed tamil-english text

Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, Ruba Priyadharshini, and John Philip McCrae. Corpus creation for sentiment analysis in code-mixed tamil-english text. InProceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Com- puting for Under-Resourced Languages (CCURL), pag...

work page 2020

[8] [8]

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guil- laume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale.arXiv preprint arXiv:1911.02116, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1911

[9] [9]

Toward cultural bias evaluation datasets: The case of bengali gender, religious, and national identity

Dipto Das, Shion Guha, and Bryan Semaan. Toward cultural bias evaluation datasets: The case of bengali gender, religious, and national identity. InProceed- ings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 68–83, 2023

work page 2023

[10] [10]

colonial impulse

Dipto Das, Shion Guha, Jed R Brubaker, and Bryan Semaan. The“colonial impulse" of natural language processing: An audit of bengali sentiment analysis tools and their identity-based biases. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–18, 2024

work page 2024

[11] [11]

Hate speech and offensive language detection in bengali

Mithun Das, Somnath Banerjee, Punyajoy Saha, and Animesh Mukherjee. Hate speech and offensive language detection in bengali. InProceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Lin- guistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 286–296, 2022

work page 2022

[12] [12]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019

[13] [13]

Eraser: A benchmark to evaluate rational- ized nlp models

Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. Eraser: A benchmark to evaluate rational- ized nlp models. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4443–4458, 2020

work page 2020

[14] [14]

Addressing age-related bias in sentiment analysis

Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. Addressing age-related bias in sentiment analysis. InProceedings of the 2018 chi conference on human factors in computing systems, pages 1–14, 2018

work page 2018

[15] [15]

Towards leaving no indic language behind: Building monolingual corpora, benchmark and models for indic languages

Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M Khapra, Anoop Kunchukuttan, and Pratyush Kumar. Towards leaving no indic language behind: Building monolingual corpora, benchmark and models for indic languages. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pa...

work page 2023

[16] [16]

Adaptive recursive neural network for target-dependent twitter sentiment classification

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. Adaptive recursive neural network for target-dependent twitter sentiment classification. InProceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: Short papers), pages 49–54, 2014

work page 2014

[17] [17]

Financial sentiment analysis: Techniques and applications.ACM Computing Surveys, 56(9):1–42, 2024

Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. Financial sentiment analysis: Techniques and applications.ACM Computing Surveys, 56(9):1–42, 2024

work page 2024

[18] [18]

Teluguner: Leveraging multi-domain named entity recognition with deep transformers

Suma Reddy Duggenpudi, Subba Reddy Oota, Mounika Marreddy, and Radhika Mamidi. Teluguner: Leveraging multi-domain named entity recognition with deep transformers. InProceedings of the 60th annual meeting of the association for computational linguistics: student research workshop, pages 262–272, 2022

work page 2022

[19] [19]

Regulation (EU) 2016/679 of the European Parliament and of the Council

European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council. URL https://data. europa.eu/eli/reg/2016/679/oj

work page 2016

[20] [20]

Bias beyond english: Counterfactual tests for bias in sentiment analysis in four languages

Seraphina Goldfarb-tarrant, Adam Lopez, Roi Blanco, and Diego Marcheggiani. Bias beyond english: Counterfactual tests for bias in sentiment analysis in four languages. InThe 61st Annual Meeting Of The Association For Computational Linguistics, 2023

work page 2023

[21] [21]

Advanced Analytics, LLC, 2014

Kilem L Gwet.Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC, 2014

work page 2014

[22] [22]

Equality of opportunity in supervised learning.Advances in neural information processing systems, 29, 2016

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning.Advances in neural information processing systems, 29, 2016

work page 2016

[23] [23]

A survey on recent approaches for natural language processing in low-resource scenarios

Michael A Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, and Dietrich Klakow. A survey on recent approaches for natural language processing in low-resource scenarios. InProceedings of the 2021 Conference of the North Ameri- can Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2545–2568, 2021

work page 2021

[24] [24]

Sentigold: A large bangla gold standard multi-domain sentiment analysis dataset and its evaluation

Md Ekramul Islam, Labib Chowdhury, Faisal Ahamed Khan, Shazzad Hossain, Md Sourave Hossain, Mohammad Mamun Or Rashid, Nabeel Mohammed, and Mohammad Ruhul Amin. Sentigold: A large bangla gold standard multi-domain sentiment analysis dataset and its evaluation. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 420...

work page 2023

[25] [25]

Attention is not explanation

Sarthak Jain and Byron C Wallace. Attention is not explanation. InProceedings of the 2019 Conference of the North American Chapter of the Association for Com- putational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, 2019

work page 2019

[26] [26]

L3cube-hindbert and devbert: Pre-trained bert transformer models for devanagari based hindi and marathi languages.arXiv preprint arXiv:2211.11418, 2022

Raviraj Joshi. L3cube-hindbert and devbert: Pre-trained bert transformer models for devanagari based hindi and marathi languages.arXiv preprint arXiv:2211.11418, 2022

work page arXiv 2022

[27] [27]

Ammus: A survey of transformer-based pretrained models in natural language processing.arXiv preprint arXiv:2108.05542, 2021

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sangeetha. Ammus: A survey of transformer-based pretrained models in natural language processing.arXiv preprint arXiv:2108.05542, 2021

work page arXiv 2021

[28] [28]

Muril: Multilingual representations for indian languages

Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, et al. Muril: Multilingual representations for indian languages. arXiv preprint arXiv:2103.10730, 2021

work page arXiv 2021

[29] [29]

Examining gender and race bias in two hundred sentiment analysis systems

Svetlana Kiritchenko and Saif Mohammad. Examining gender and race bias in two hundred sentiment analysis systems. InProceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 43–53, 2018

work page 2018

[30] [30]

L3cubemahasent: A marathi tweet-based sentiment analysis dataset

Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, and Raviraj Joshi. L3cubemahasent: A marathi tweet-based sentiment analysis dataset. InProceedings of the Eleventh Workshop on Computational Approaches to Subjec- tivity, Sentiment and Social Media Analysis, pages 213–220, 2021

work page 2021

[31] [31]

A unified approach to interpreting model pre- dictions

Scott M Lundberg and Su-In Lee. A unified approach to interpreting model pre- dictions. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 4768–4777, 2017

work page 2017

[32] [32]

Towards faithful model explanation in nlp: A survey.Computational Linguistics, 50(2):657–723, 2024

Qing Lyu, Marianna Apidianaki, and Chris Callison-Burch. Towards faithful model explanation in nlp: A survey.Computational Linguistics, 50(2):657–723, 2024

work page 2024

[33] [33]

Learning word vectors for sentiment analysis

Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. InProceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, pages 142–150, 2011

work page 2011

[34] [34]

Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Cha- ran Chinni, and Radhika Mamidi. Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021

work page 2021

[35] [35]

Am i a resource-poor language? data sets, embeddings, models and analysis for four different nlp tasks in telugu language

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Cha- ran Chinni, and Radhika Mamidi. Am i a resource-poor language? data sets, embeddings, models and analysis for four different nlp tasks in telugu language. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(1):1–34, 2022

work page 2022

[36] [36]

Hatexplain: A benchmark dataset for explainable hate speech detection

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. Hatexplain: A benchmark dataset for explainable hate speech detection. InProceedings of the AAAI conference on artificial intelligence, volume 35, pages 14867–14875, 2021

work page 2021

[37] [37]

Actsa: Annotated corpus for telugu sentiment analysis

Sandeep Sricharan Mukku and Radhika Mamidi. Actsa: Annotated corpus for telugu sentiment analysis. InProceedings of the first workshop on building lin- guistically generalizable NLP systems, pages 54–58, 2017

work page 2017

[38] [38]

Yaso: A targeted sentiment analysis evaluation dataset for open-domain reviews

Matan Orbach, Orith Toledo-Ronen, Artem Spector, Ranit Aharonov, Yoav Katz, and Noam Slonim. Yaso: A targeted sentiment analysis evaluation dataset for open-domain reviews. InConference on Empirical Methods in Natural Language Processing, 2021

work page 2021

[39] [39]

Thumbs up? sentiment classification using machine learning techniques

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. InProceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pages 79–86, 2002

work page 2002

[40] [40]

Klue: Korean language understanding evaluation

Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Ji Yoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Taehwan Oh, et al. Klue: Korean language understanding evaluation. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)

work page

[41] [41]

Semeval- 2020 task 9: Overview of sentiment analysis of code-mixed tweets

Parth Patwa, Gustavo Aguilar, Sudipta Kar, Suraj Pandey, Srinivas Pykl, Björn Gambäck, Tanmoy Chakraborty, Thamar Solorio, and Amitava Das. Semeval- 2020 task 9: Overview of sentiment analysis of code-mixed tweets. InProceedings of the Fourteenth Workshop on Semantic Evaluation, pages 774–790, 2020

work page 2020

[42] [42]

A review on fairness in machine learning.ACM Computing Surveys (CSUR), 55(3):1–44, 2022

Dana Pessach and Erez Shmueli. A review on fairness in machine learning.ACM Computing Surveys (CSUR), 55(3):1–44, 2022

work page 2022

[43] [43]

Evaluat- ing explanations: How much do explanations from the teacher aid students? Transactions of the Association for Computational Linguistics, 10:359–375, 2022

Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C Lipton, Graham Neubig, and William W Cohen. Evaluat- ing explanations: How much do explanations from the teacher aid students? Transactions of the Association for Computational Linguistics, 10:359–375, 2022

work page 2022

[44] [44]

Hottest: Hate and offensive content identification in tamil using transformers and enhanced Raj Kumar et al

Ratnavel Rajalakshmi, Srivarshan Selvaraj, Pavitra Vasudevan, et al. Hottest: Hate and offensive content identification in tamil using transformers and enhanced Raj Kumar et al. stemming.Computer Speech & Language, 78:101464, 2023

work page 2023

[45] [45]

why should i trust you?

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should i trust you?" explaining the predictions of any classifier. InProceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016

work page 2016

[46] [46]

Semeval-2017 task 4: Sentiment analysis in twitter

Sara Rosenthal, Noura Farra, and Preslav Nakov. Semeval-2017 task 4: Sentiment analysis in twitter. InProceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518, 2017

work page 2017

[47] [47]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

work page 2019

[48] [48]

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convo- lutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[49] [49]

Recursive deep models for semantic compositionality over a sentiment treebank

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. InProceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642, 2013

work page 2013

[50] [50]

Axiomatic attribution for deep networks

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. InInternational conference on machine learning, pages 3319–3328. PMLR, 2017

work page 2017

[51] [51]

Gender bias in sentiment analysis.Online Information Review, 42(1):45–57, 2018

Mike Thelwall. Gender bias in sentiment analysis.Online Information Review, 42(1):45–57, 2018

work page 2018

[52] [52]

Latent aspect rating analysis on review text data: a rating regression approach

Hongning Wang, Yue Lu, and Chengxiang Zhai. Latent aspect rating analysis on review text data: a rating regression approach. InProceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 783–792, 2010

work page 2010

[53] [53]

Are all languages created equal in multilingual bert?ACL 2020, page 120, 2020

Shijie Wu and Mark Dredze. Are all languages created equal in multilingual bert?ACL 2020, page 120, 2020

work page 2020

[54] [54]

Interpretation of sentiment analysis with human-in-the-loop

Vijaya Kumari Yeruva, Mayanka Chandrashekar, Yugyung Lee, Jeff Rydberg-Cox, Virginia Blanton, and Nathan A Oyler. Interpretation of sentiment analysis with human-in-the-loop. In2020 IEEE International Conference on Big Data (Big Data), pages 3099–3108. IEEE Computer Society, 2020

work page 2020

[55] [55]

Common Crawl

Ruiqi Zhong, Steven Shao, and Kathleen McKeown. Fine-grained sentiment analysis with faithful attention.arXiv preprint arXiv:1908.06870, 2019. A Data Split Figure 5 illustrates the distribution of data sources used in our dataset, with the majority coming from YouTube Comments (YT) (51.2%), followed by blogs comments (Blogs), news headlines (News), and Fa...

work page arXiv 1908