
arxiv: 2606.25462 · v1 · pith:FKWJEMSQnew · submitted 2026-06-24 · 💻 cs.CL

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

Pith reviewed 2026-06-25 20:59 UTC · model grok-4.3

classification 💻 cs.CL
keywords abstractive summarization · PEGASUS · XL-Sum · fine-tuning · ROUGE metric · transformer model · mT5 baseline

The pith

Fine-tuned PEGASUS is reported to achieve state-of-the-art ROUGE scores on the XL-Sum English corpus, with gains over the mT5 baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper fine-tunes the PEGASUS model on the XL-Sum English corpus for abstractive summarization. It evaluates the output summaries against human references using the ROUGE metric and reports higher scores than the mT5 baseline. The reported gains are 4.04 percent on ROUGE-1, 15.25 percent on ROUGE-2, and 3.39 percent on ROUGE-L. A reader would care if these gains mean more accurate automatic summaries can be produced without copying source sentences directly.

Core claim

The authors fine-tune PEGASUS on the XL-Sum English corpus and evaluate the generated summaries using ROUGE against human summaries. They claim this gives state-of-the-art performance, with a 4.04% improvement in ROUGE-1, 15.25% increase in ROUGE-2, and 3.39% improvement in ROUGE-L over the baseline mT5 model.
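The abstract does not say whether the reported percentages are relative changes or absolute differences in ROUGE points. The sketch below shows how the two readings differ. The scores in it are hypothetical placeholders chosen to make the arithmetic easy to follow, not values from the paper, and the function names are illustrative.

```python
def absolute_gain(baseline: float, candidate: float) -> float:
    """Difference in ROUGE points (candidate minus baseline)."""
    return candidate - baseline


def relative_gain(baseline: float, candidate: float) -> float:
    """Percentage change relative to the baseline score."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical ROUGE-2 scores, used only to illustrate the arithmetic.
baseline_r2 = 20.0
candidate_r2 = 23.0

abs_points = absolute_gain(baseline_r2, candidate_r2)   # 3.0 points
rel_percent = relative_gain(baseline_r2, candidate_r2)  # 15.0 percent
print(f"absolute: {abs_points:.2f} points, relative: {rel_percent:.2f}%")
```

With these placeholder numbers the same result reads as either a 3-point gain or a 15 percent gain, which is why a revised manuscript would need to state which convention it uses.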

What carries the argument

The fine-tuned PEGASUS model, a transformer sequence-to-sequence architecture pre-trained for summarization and then adapted to the target corpus.

If this is right

  • Higher ROUGE scores indicate closer matches to human-written summaries on the XL-Sum corpus.
  • The fine-tuning process improves abstractive output quality over the mT5 baseline.
  • ROUGE-2 gains suggest better capture of bigram (two-word) overlaps with the reference summaries.
  • The approach demonstrates that targeted adaptation of PEGASUS can optimize performance on this dataset.
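To make the ROUGE-1 and ROUGE-2 readings above concrete, here is a minimal sketch of clipped n-gram overlap. It assumes lowercase whitespace tokenization, which is a simplification: standard ROUGE tooling may also apply stemming and other normalization, and the paper's actual evaluation script is not described. The example sentences are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 2):
    """Return (precision, recall, f1) for clipped n-gram overlap.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref_counts = ngrams(reference.lower().split(), n)
    cand_counts = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sentences, for illustration only.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
p1, r1, f1_uni = rouge_n(reference, candidate, n=1)
p2, r2, f1_bi = rouge_n(reference, candidate, n=2)
print(f"ROUGE-1 F1: {f1_uni:.3f}, ROUGE-2 F1: {f1_bi:.3f}")
```

Changing a single word removes one matching unigram but two matching bigrams, so ROUGE-2 drops faster than ROUGE-1 when local word order diverges from the reference.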

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the gains hold, the model could support more reliable automatic news digests or content tools.
  • The reported ROUGE improvements might encourage testing the same fine-tuning recipe on other summarization benchmarks.
  • Future checks could add human preference judgments alongside the automatic metrics to confirm perceived quality.
  • The method might extend to the non-English portions of XL-Sum to test cross-lingual transfer.

Load-bearing premise

The mT5 baseline was trained and evaluated under conditions directly comparable to the fine-tuned PEGASUS.

What would settle it

A controlled re-run of both models on identical data splits and hyperparameters that shows equal or lower ROUGE scores for the PEGASUS version.

Original abstract

Abstractive text summarization is the technique of generating a short and concise summary comprising the salient ideas of a source text without making a subset of the salient sentences from the source text. The introduction of transformer models such as BART, T5, and PEGASUS has made this sort of summarization process more efficient and accurate. The objective of this paper is to fine-tune PEGASUS on the XL-Sum English corpus to achieve a better performance compared to the baseline mT5 model. The performance of the generated summaries from the fine-tuned model is evaluated using the ROUGE metric, which basically compares the auto-generated summaries with human-created summaries. To the best of our knowledge, the results from our fine-tuned PEGASUS model give a state-of-the-art performance on the XL-Sum English Corpus. To quantify the improvement, there is a 4.04% improvement in the ROUGE-1 score, a 15.25% increase in the ROUGE-2 score, and a 3.39% improvement in the ROUGE-L score from the baseline model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript describes fine-tuning the PEGASUS model on the XL-Sum English corpus for abstractive summarization. It evaluates the resulting summaries with ROUGE metrics and claims state-of-the-art performance, quantifying improvements of 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L over an mT5 baseline.

Significance. If the experimental conditions prove comparable and the numbers hold under scrutiny, the work would supply a modest empirical data point for PEGASUS on XL-Sum. The manuscript contains no machine-checked proofs, reproducible code, or parameter-free derivations, so its value rests entirely on the reported ROUGE deltas once the setup is documented.

major comments (2)
  1. [Abstract] Abstract: the central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.
  2. [Abstract] Abstract: the reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.
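One common way to supply the error bars requested in the first comment is a paired bootstrap over per-document scores. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes both systems were scored on the same test documents, and the score lists and the function name `paired_bootstrap` are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Bootstrap the mean per-document difference (B minus A).

    Returns (mean_diff, lower, upper), where lower and upper bound a
    95% percentile interval over resampled test sets.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample documents with replacement, keeping the pairing intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Hypothetical per-document ROUGE-2 F1 scores for two systems.
system_a = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
system_b = [0.12, 0.13, 0.10, 0.15, 0.14, 0.10]
mean_diff, lower, upper = paired_bootstrap(system_a, system_b)
print(f"mean gain {mean_diff:.4f}, 95% interval [{lower:.4f}, {upper:.4f}]")
```

If the interval excludes zero, the gain is unlikely to be an artifact of which documents landed in the test set. The test says nothing about whether the two systems were trained under comparable conditions, which is the concern in the second comment.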

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We will revise the paper to provide the necessary experimental details and clarifications as outlined below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.

    Authors: We agree with the referee that the abstract and current text lack sufficient details on training, splits, hyperparameters, error bars, and ablations to fully assess the claims. In the revised version, we will include these details in an expanded experimental section.

    Revision: yes

  2. Referee: [Abstract] Abstract: the reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.

    Authors: We confirm that the mT5 baseline was re-run using the exact same XL-Sum English data partition, training setup, optimizer schedule, and evaluation script as the PEGASUS model to ensure a fair comparison. This was done to attribute the improvements specifically to the model choice. We will explicitly document this in a new subsection on baseline implementation in the revised manuscript.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical fine-tuning and metric comparison

Full rationale

The paper performs standard fine-tuning of PEGASUS on XL-Sum English and reports ROUGE-1/2/L scores against an mT5 baseline. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (SOTA via 4.04/15.25/3.39% ROUGE gains) rests on external benchmark comparison rather than any self-referential reduction. Baseline comparability is a potential correctness issue, not circularity per the evaluation rules.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because this review covers only the abstract, specific free parameters and axioms cannot be enumerated. The work implicitly relies on standard transformer fine-tuning assumptions and on the domain assumption that ROUGE correlates with summary quality.

pith-pipeline@v0.9.1-grok · 5745 in / 1150 out tokens · 24224 ms · 2026-06-25T20:59:19.410514+00:00 · methodology

