Optimizing Abstractive Summarization With Fine-Tuned PEGASUS
Pith reviewed 2026-06-25 20:59 UTC · model grok-4.3
The pith
Fine-tuned PEGASUS is reported to achieve state-of-the-art ROUGE scores on the XL-Sum English corpus, with gains over the mT5 baseline.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors fine-tune PEGASUS on the XL-Sum English corpus and evaluate the generated summaries using ROUGE against human summaries. They claim this gives state-of-the-art performance, with a 4.04% improvement in ROUGE-1, 15.25% increase in ROUGE-2, and 3.39% improvement in ROUGE-L over the baseline mT5 model.
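The claim above does not say whether the percentages are relative changes or absolute differences in ROUGE points, and the two readings can differ considerably. The following minimal sketch shows both calculations. The function names and the example scores are hypothetical and are not taken from the paper.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Difference in ROUGE points (new minus baseline)."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Change of `new` relative to `baseline`, expressed as a percentage."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the arithmetic.
    baseline_r2, tuned_r2 = 20.0, 23.0
    print(f"absolute: {absolute_gain(baseline_r2, tuned_r2):.2f} points")
    print(f"relative: {relative_gain(baseline_r2, tuned_r2):.2f}%")
```

With these made-up values the same change reads as 3 points under one convention and 15% under the other, which is why the convention needs to be stated alongside the numbers.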
What carries the argument
The fine-tuned PEGASUS model, a transformer sequence-to-sequence architecture pre-trained for summarization and then adapted to the target corpus.
If this is right
- Higher ROUGE scores indicate closer matches to human-written summaries on the XL-Sum corpus.
- The fine-tuning process improves abstractive output quality over the mT5 baseline.
- ROUGE-2 gains suggest better capture of phrase-level overlaps in the generated text.
- The approach demonstrates that targeted adaptation of PEGASUS can optimize performance on this dataset.
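To make the phrase-level overlap behind ROUGE-2 concrete, here is a simplified sketch of ROUGE-N as clipped n-gram overlap between a generated and a reference summary. It assumes lowercased whitespace tokenization with no stemming, so it is not the official ROUGE script and not necessarily the evaluation code the authors used. The helper names `ngrams` and `rouge_n` are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 2) -> dict:
    """Simplified ROUGE-N: precision, recall, and F1 over clipped n-gram matches."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy sentences, not drawn from XL-Sum.
    scores = rouge_n("the cat sat on the mat", "the cat sat on a mat", n=2)
    print(scores)
```

In the toy pair, three of the five bigrams on each side match, so a single changed word removes two bigram matches. This sensitivity is one reason ROUGE-2 can move more than ROUGE-1 for the same edit.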
Where Pith is reading between the lines
- If the gains hold, the model could support more reliable automatic news digests or content tools.
- The reported ROUGE improvements might encourage testing the same fine-tuning recipe on other summarization benchmarks.
- Future checks could add human preference judgments alongside the automatic metrics to confirm perceived quality.
- The method might extend to the non-English portions of XL-Sum to test cross-lingual transfer.
Load-bearing premise
The mT5 baseline was trained and evaluated under conditions directly comparable to the fine-tuned PEGASUS.
What would settle it
A controlled re-run of both models on identical data splits and hyperparameters that shows equal or lower ROUGE scores for the PEGASUS version.
Original abstract
Abstractive text summarization is the technique of generating a short and concise summary comprising the salient ideas of a source text without making a subset of the salient sentences from the source text. The introduction of transformer models such as BART, T5, and PEGASUS has made this sort of summarization process more efficient and accurate. The objective of this paper is to fine-tune PEGASUS on the XL-Sum English corpus to achieve a better performance compared to the baseline mT5 model. The performance of the generated summaries from the fine-tuned model is evaluated using the ROUGE metric, which basically compares the auto-generated summaries with human-created summaries. To the best of our knowledge, the results from our fine-tuned PEGASUS model give a state-of-the-art performance on the XL-Sum English Corpus. To quantify the improvement, there is a 4.04% improvement in the ROUGE-1 score, a 15.25% increase in the ROUGE-2 score, and a 3.39% improvement in the ROUGE-L score from the baseline model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes fine-tuning the PEGASUS model on the XL-Sum English corpus for abstractive summarization. It evaluates the resulting summaries with ROUGE metrics and claims state-of-the-art performance, quantifying improvements of 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L over an mT5 baseline.
Significance. If the experimental conditions prove comparable and the numbers hold under scrutiny, the work would supply a modest empirical data point for PEGASUS on XL-Sum. The manuscript contains no machine-checked proofs, reproducible code, or parameter-free derivations, so its value rests entirely on the reported ROUGE deltas once the setup is documented.
Major comments (2)
- [Abstract] The central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.
- [Abstract] The reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.
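One common way to supply the error bars requested above is a paired bootstrap over per-example scores from the two systems on the same test set. The sketch below is an illustration under that assumption and not a procedure described in the manuscript. The function name, the resample count, and the input lists are hypothetical.

```python
import random
from statistics import mean


def paired_bootstrap_ci(baseline_scores, model_scores,
                        n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-example score difference.

    Both lists must hold scores for the same test examples in the same order.
    Returns (observed mean difference, lower bound, upper bound).
    """
    if len(baseline_scores) != len(model_scores) or not baseline_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    diffs = [m - b for b, m in zip(baseline_scores, model_scores)]
    n = len(diffs)
    # Resample examples with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(diffs), lower, upper
```

If the resulting interval excludes zero, the gain is less likely to be an artifact of which test examples happened to be sampled. It still says nothing about whether the two systems were trained under comparable conditions.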
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We will revise the paper to provide the necessary experimental details and clarifications as outlined below.
- Referee: [Abstract] The central claim of specific percentage improvements and SOTA status on XL-Sum English cannot be assessed because the text supplies no training details, data splits, hyperparameter values, error bars, or ablation studies.
  Authors: We agree with the referee that the abstract and current text lack sufficient details on training, splits, hyperparameters, error bars, and ablations to fully assess the claims. In the revised version, we will include these details in an expanded experimental section.
  Revision: yes
- Referee: [Abstract] The reported 4.04/15.25/3.39% ROUGE gains are not attributable to the model change unless the mT5 baseline was re-run with the identical XL-Sum English partition, optimizer schedule, and evaluation script used for PEGASUS; no section confirms this comparability.
  Authors: We confirm that the mT5 baseline was re-run using the exact same XL-Sum English data partition, training setup, optimizer schedule, and evaluation script as the PEGASUS model to ensure a fair comparison. This was done to attribute the improvements specifically to the model choice. We will explicitly document this in a new subsection on baseline implementation in the revised manuscript.
  Revision: yes
Circularity Check
No circularity: purely empirical fine-tuning and metric comparison
The paper performs standard fine-tuning of PEGASUS on XL-Sum English and reports ROUGE-1/2/L scores against an mT5 baseline. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (SOTA via 4.04/15.25/3.39% ROUGE gains) rests on external benchmark comparison rather than any self-referential reduction. Baseline comparability is a potential correctness issue, not circularity per the evaluation rules.