Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction
Pith reviewed 2026-05-25 01:26 UTC · model grok-4.3
The pith
A language model pre-trained from scratch on Spanish Twitter data transfers effectively to humor prediction, placing third in classification and second in regression for the HAHA 2019 challenge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We trained a language model from scratch on a large Twitter-based Spanish corpus and transferred that knowledge to our competition model for the HAHA 2019 Challenge, achieving 3rd place in the classification task and 2nd place in the regression task, while using label smoothing in the loss function to address inherent label errors.
What carries the argument
The Spanish Twitter language model pre-trained from scratch, which performs the knowledge transfer to the downstream humor classification and regression tasks.
If this is right
- The same pre-training plus fine-tuning pipeline can be applied to other Spanish social media classification tasks.
- Label smoothing reduces overconfidence on crowdsourced humor labels and improves generalization.
- The released code and model enable direct replication and extension by others on similar Twitter humor datasets.
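A minimal sketch of the label-smoothing idea mentioned above, in plain Python. The smoothing value `epsilon=0.1` and the helper names are illustrative assumptions; the paper's actual smoothing parameter and loss implementation are not given on this page.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Target distribution: (1 - epsilon) on the true class, plus epsilon spread uniformly.

    epsilon=0.1 is an assumed value, not the one used in the paper.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed targets and the softmax of the logits."""
    probs = softmax(logits)
    targets = smooth_labels(true_class, len(logits), epsilon)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

# Hypothetical binary case (humorous vs. not humorous), true class = 1.
calibrated = smoothed_cross_entropy([0.0, math.log(19.0)], true_class=1)
overconfident = smoothed_cross_entropy([0.0, 10.0], true_class=1)
```

With smoothed targets, the loss is lowest when the predicted probability matches the softened target (0.95 here) rather than 1.0, so pushing the logits toward extreme confidence is penalized. That is the mechanism by which smoothing can help when some crowdsourced labels are wrong.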
Where Pith is reading between the lines
- If the Twitter corpus captures dialectal variation well, the method could extend to other regional Spanish varieties with minimal additional data.
- The success against a simple baseline suggests that pre-training scale matters more than task-specific feature engineering for this domain.
- Similar pre-training on other low-resource social media languages might close performance gaps with English systems.
Load-bearing premise
Training a language model from scratch on a large Twitter corpus provides effective knowledge transfer to the humor prediction task despite potential label noise addressed by smoothing.
What would settle it
A replication that trains the same downstream model from random initialization on the HAHA data alone and matches or exceeds the reported rankings would indicate the pre-training step adds little value.
Original abstract
Our entry into the HAHA 2019 Challenge placed $3^{rd}$ in the classification task and $2^{nd}$ in the regression task. We describe our system and innovations, as well as comparing our results to a Naive Bayes baseline. A large Twitter based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
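For readers unfamiliar with the kind of baseline the abstract mentions, here is a small multinomial Naive Bayes sketch in plain Python. It is an illustration only: the whitespace tokenization, the Laplace parameter `alpha`, and the four toy tweets are assumptions invented for this example, and the paper's baseline may differ in features and details.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Count words per class; alpha is the Laplace smoothing constant (assumed)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }

def predict_nb(model, doc):
    """Return the class with the highest log prior plus summed log likelihoods."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, count in model["class_counts"].items():
        total = sum(model["word_counts"][label].values())
        score = math.log(count / model["n_docs"])
        for tok in doc.lower().split():
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            freq = model["word_counts"][label][tok]
            score += math.log((freq + alpha) / (total + alpha * vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data invented for illustration (1 = humorous, 0 = not humorous).
docs = [
    "jaja que chiste tan bueno",
    "jaja me muero de risa",
    "hoy llueve en la ciudad",
    "reunion de trabajo a las nueve",
]
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
prediction = predict_nb(model, "jaja que risa")
```

A bag-of-words model like this has no notion of word order or context, which is one reason a pre-trained language model is a plausible improvement for humor, where phrasing matters.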
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports the authors' entry into the HAHA 2019 Challenge, which placed 3rd in the classification task and 2nd in the regression task. The system pre-trains a language model from scratch on a large Spanish Twitter corpus, transfers the knowledge to the humor prediction task, and applies label smoothing in the loss function to address label noise. Results are compared to a Naive Bayes baseline, and all code is released on GitHub.
Significance. If the reported competition rankings hold, the work provides a concrete demonstration of the value of domain-specific pre-training on Twitter data for Spanish social-media NLP tasks and the practical application of label smoothing for noisy supervision. The explicit release of replication code is a strength that supports verification and reuse.
minor comments (3)
- [Abstract, Results] The abstract and results sections report only the final competition rankings, without the underlying metrics (e.g., F1, accuracy, or RMSE values) achieved by the submitted system or the Naive Bayes baseline. Including these numbers would allow readers to assess the magnitude of improvement independently of the external ranking.
- [Method] The description of the pre-training procedure, model architecture, and hyper-parameters is high-level; while the GitHub repository is referenced, key details (corpus size, training steps, smoothing parameter value) should be stated in the paper for self-contained reading.
- [Results] No error analysis or qualitative examples of predictions are provided, which would help explain the sources of the reported performance.
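The metrics named in the first comment can be computed as sketched below. This is a plain-Python illustration with made-up labels and scores; it assumes F1 on the positive (humorous) class for classification and RMSE for regression, and none of the numbers correspond to results from the paper.

```python
import math

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives, so precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values for illustration only.
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
example_rmse = rmse([3.0, 1.0], [2.0, 2.0])
```

Reporting such values for both the submitted system and the baseline would let a reader judge the size of the gap, which a leaderboard rank alone does not convey.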
Simulated Author's Rebuttal
We thank the referee for their review of our HAHA 2019 submission. The referee summary accurately reflects the paper's content, and we appreciate the positive assessment of the domain-specific pre-training and code release. No major comments were provided in the report.
Circularity Check
No significant circularity
The paper reports empirical competition rankings (3rd in classification, 2nd in regression) achieved by the described system, which combines Twitter language model pre-training, transfer learning, and label smoothing. No mathematical derivation, equations, or fitted-parameter predictions are present; the central claims are factual statements about external challenge results and are supported by a public GitHub repository. There are no self-citation chains, self-definitional steps, or reductions of outputs to inputs by construction.