Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

Farig Yousuf Sadeque; Ha-mim Ahmad; Kazi Nazibul Islam; Naimur Rahman; Sadiul Arefin Rafi

arxiv: 2606.25462 · v1 · pith:FKWJEMSQnew · submitted 2026-06-24 · 💻 cs.CL

Pith reviewed 2026-06-25 20:59 UTC · model grok-4.3

classification 💻 cs.CL

keywords: abstractive summarization · PEGASUS · XL-Sum · fine-tuning · ROUGE metric · transformer model · mT5 baseline

The pith

Fine-tuned PEGASUS is reported to achieve state-of-the-art ROUGE scores on the XL-Sum English corpus, with gains over the mT5 baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper fine-tunes the PEGASUS model on the XL-Sum English corpus for abstractive summarization. It evaluates the output summaries against human references using the ROUGE metric and reports higher scores than the mT5 baseline. The reported gains are 4.04 percent on ROUGE-1, 15.25 percent on ROUGE-2, and 3.39 percent on ROUGE-L. A reader would care if these gains mean more accurate automatic summaries can be produced without copying source sentences directly.

Core claim

The authors fine-tune PEGASUS on the XL-Sum English corpus and evaluate the generated summaries using ROUGE against human summaries. They claim this gives state-of-the-art performance, with a 4.04% improvement in ROUGE-1, 15.25% increase in ROUGE-2, and 3.39% improvement in ROUGE-L over the baseline mT5 model.
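
The abstract does not say whether these percentages are relative changes or absolute differences in ROUGE points, and the two readings can differ a lot. A minimal sketch of the arithmetic follows; the function names are illustrative and the scores are made-up placeholders, not values from the paper.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Difference between two scores, in ROUGE points."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Percentage change of the new score with respect to the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


# Hypothetical scores, chosen only to illustrate the arithmetic.
baseline_r2 = 20.0
new_r2 = 23.0

print(absolute_gain(baseline_r2, new_r2))  # 3.0 (points)
print(relative_gain(baseline_r2, new_r2))  # 15.0 (percent)
```

With these placeholder numbers, the same pair of scores reads as a 3-point gain or a 15 percent gain, so a reported figure is only interpretable once the convention is stated.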

What carries the argument

The fine-tuned PEGASUS model, a transformer sequence-to-sequence architecture pre-trained for summarization and then adapted to the target corpus.

If this is right

Higher ROUGE scores indicate closer matches to human-written summaries on the XL-Sum corpus.
The fine-tuning process improves abstractive output quality over the mT5 baseline.
ROUGE-2 gains suggest better capture of phrase-level overlaps in the generated text.
The approach demonstrates that targeted adaptation of PEGASUS can optimize performance on this dataset.
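
To make the phrase-level overlap point concrete, here is a simplified sketch of ROUGE-N as clipped n-gram overlap between a candidate and a reference. It assumes lowercased whitespace tokenization and no stemming, which standard ROUGE tooling may handle differently; the paper's exact scoring setup is not stated, so this is an illustration of the idea rather than the authors' evaluation code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 2) -> dict:
    """Simplified ROUGE-N: clipped n-gram overlap with precision, recall and F1."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy sentences, not drawn from XL-Sum.
scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2)
print(scores["precision"], scores["recall"])  # 0.6 0.6
```

In the toy pair, three of the five bigrams match, while five of the six unigrams match, which shows why ROUGE-2 is more sensitive to word order and phrasing than ROUGE-1.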

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the gains hold, the model could support more reliable automatic news digests or content tools.
The reported ROUGE improvements might encourage testing the same fine-tuning recipe on other summarization benchmarks.
Future checks could add human preference judgments alongside the automatic metrics to confirm perceived quality.
The method might extend to the non-English portions of XL-Sum to test cross-lingual transfer.

Load-bearing premise

The mT5 baseline was trained and evaluated under conditions directly comparable to the fine-tuned PEGASUS.

What would settle it

A controlled re-run of both models on identical data splits and hyperparameters that shows equal or lower ROUGE scores for the PEGASUS version.
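
One common way to check whether a difference between two systems survives resampling of the test set is a paired bootstrap over per-example scores. The sketch below is a generic illustration under that assumption, not a procedure described in the paper; the function name, the resample count, and the score lists are all hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A's mean beats system B's.

    scores_a and scores_b hold per-example scores for the same test examples,
    in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be aligned and equal in length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample example indices with replacement, shared by both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Hypothetical per-example ROUGE scores where A is better on every example.
a = [0.5, 0.6, 0.7]
b = [0.4, 0.5, 0.6]
print(paired_bootstrap(a, b))  # 1.0
```

A win fraction close to 1.0 on real per-example scores would support the reported ordering; a value near 0.5 would suggest the gap is within sampling noise.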

Original abstract

Abstractive text summarization is the technique of generating a short and concise summary comprising the salient ideas of a source text without making a subset of the salient sentences from the source text. The introduction of transformer models such as BART, T5, and PEGASUS has made this sort of summarization process more efficient and accurate. The objective of this paper is to fine-tune PEGASUS on the XL-Sum English corpus to achieve a better performance compared to the baseline mT5 model. The performance of the generated summaries from the fine-tuned model is evaluated using the ROUGE metric, which basically compares the auto-generated summaries with human-created summaries. To the best of our knowledge, the results from our fine-tuned PEGASUS model give a state-of-the-art performance on the XL-Sum English Corpus. To quantify the improvement, there is a 4.04% improvement in the ROUGE-1 score, a 15.25% increase in the ROUGE-2 score, and a 3.39% improvement in the ROUGE-L score from the baseline model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fine-tuning PEGASUS on XL-Sum reports ROUGE gains over mT5 but supplies almost no experimental details, so the comparison is hard to trust.

The paper fine-tunes PEGASUS on the English part of XL-Sum and claims it beats an mT5 baseline by 4.04% on ROUGE-1, 15.25% on ROUGE-2, and 3.39% on ROUGE-L while calling the result SOTA. That is the entire contribution.

They follow the usual recipe: load a known summarization model, train it on a public corpus, and score the output with ROUGE. The ROUGE-2 jump is the most noticeable number. If the runs were controlled, someone who needs a quick reference point on that specific dataset might note the scores.

The problems start with what is missing. The text gives no hyperparameters, no data-split details, no confirmation that the mT5 baseline was re-run with the same tokenizer, optimizer schedule, or evaluation code, and no error bars or multiple seeds. The stress-test concern holds: without that information the reported deltas cannot be cleanly attributed to the model choice. There are also no ablations and no human judgments, so the quality claim rests only on automatic overlap metrics.

The work is straightforward and does not contradict itself, but the evidence is too thin to support the SOTA statement. It is the sort of incremental benchmark note that appears often in the area.

It would interest only a narrow group already tracking XL-Sum English numbers. I would not bring it to a reading group or cite it. It does not look ready for peer review; the central claim cannot be assessed without the missing experimental controls.

Referee Report

2 major / 0 minor

Summary. The manuscript describes fine-tuning the PEGASUS model on the XL-Sum English corpus for abstractive summarization. It evaluates the resulting summaries with ROUGE metrics and claims state-of-the-art performance, quantifying improvements of 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L over an mT5 baseline.

Significance. If the experimental conditions prove comparable and the numbers hold under scrutiny, the work would supply a modest empirical data point for PEGASUS on XL-Sum. The manuscript contains no machine-checked proofs, reproducible code, or parameter-free derivations, so its value rests entirely on the reported ROUGE deltas once the setup is documented.

major comments (2)

[Abstract] The central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.
[Abstract] The reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.
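
The comparability concern in the second comment can be made mechanical by recording each run's setup and diffing the records. The sketch below is one hypothetical way to do that; the `RunConfig` fields mirror the items named in the comment, and every value is a placeholder rather than a setting taken from the paper.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """Minimal record of the settings that must match for a fair comparison."""
    model: str
    data_split: str
    optimizer_schedule: str
    eval_script: str
    seed: int


def config_mismatches(a: RunConfig, b: RunConfig, ignore=("model",)) -> dict:
    """Return the fields, other than those ignored, whose values differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if k not in ignore and da[k] != db[k]}


# All values below are hypothetical placeholders.
pegasus = RunConfig("pegasus", "xlsum-en-v1", "linear-warmup", "rouge-script-a", 0)
mt5 = RunConfig("mt5", "xlsum-en-v1", "linear-warmup", "rouge-script-b", 0)

print(config_mismatches(pegasus, mt5))
# {'eval_script': ('rouge-script-a', 'rouge-script-b')}
```

An empty result would mean the two runs differ only in the model, which is the condition under which the ROUGE deltas can be attributed to the model choice.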

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We will revise the paper to provide the necessary experimental details and clarifications as outlined below.

Referee: [Abstract] The central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.

Authors: We agree with the referee that the abstract and current text lack sufficient details on training, splits, hyperparameters, error bars, and ablations to fully assess the claims. In the revised version, we will include these details in an expanded experimental section.

revision: yes

Referee: [Abstract] The reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.

Authors: We confirm that the mT5 baseline was re-run using the exact same XL-Sum English data partition, training setup, optimizer schedule, and evaluation script as the PEGASUS model to ensure a fair comparison. This was done to attribute the improvements specifically to the model choice. We will explicitly document this in a new subsection on baseline implementation in the revised manuscript.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical fine-tuning and metric comparison

The paper performs standard fine-tuning of PEGASUS on XL-Sum English and reports ROUGE-1/2/L scores against an mT5 baseline. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (SOTA via 4.04/15.25/3.39% ROUGE gains) rests on external benchmark comparison rather than any self-referential reduction. Baseline comparability is a potential correctness issue, not circularity per the evaluation rules.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents enumeration of specific free parameters or axioms; work implicitly relies on standard transformer fine-tuning assumptions and the domain assumption that ROUGE correlates with summary quality.


Vielsted, Marcus and Wallenius, Nikolaj and van der Goot, Rob. Increasing Robustness for Cross-domain Dialogue Act Classification on Social Media Data. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[79]

Disfluency Detection for V ietnamese

Dao, Mai Hoang and Truong, Thinh Hung and Nguyen, Dat Quoc. Disfluency Detection for V ietnamese. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[80]

A multi-level approach for hierarchical Ticket Classification

Marcuzzo, Matteo and Zangari, Alessandro and Schiavinato, Michele and Giudice, Lorenzo and Gasparetto, Andrea and Albarelli, Andrea. A multi-level approach for hierarchical Ticket Classification. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022


