MetFuse: Figurative Fusion between Metonymy and Metaphor
Pith reviewed 2026-05-10 16:19 UTC · model grok-4.3
The pith
A dataset of sentences blending metonymy and metaphor shows that hybrid examples make metonymic references easier to spot for both humans and models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct a framework that rewrites literal sentences into metonymic, metaphoric, and hybrid figurative variants, then release the resulting MetFuse collection of one thousand meaning-aligned quadruplets. Experiments show that augmenting existing training sets with these examples raises accuracy on metonymy and metaphor classification benchmarks, with hybrid sentences supplying the strongest signal for metonymy detection. Targeted analysis further demonstrates that the co-occurrence of metaphor renders the metonymic noun more salient, improving recognition rates for both human annotators and large language models.
What carries the argument
The sentence transformation framework that produces meaning-aligned quadruplets containing a literal sentence, a metonymy-only variant, a metaphor-only variant, and a hybrid variant that combines both figures.
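The quadruplet structure is simple enough to pin down in code. A minimal sketch, with field names and example sentences that are illustrative only, not drawn from MetFuse:

```python
from dataclasses import dataclass

@dataclass
class Quadruplet:
    """One meaning-aligned entry: a literal sentence plus three
    figurative rewrites. Field names are illustrative, not the
    paper's schema."""
    literal: str
    metonymic: str
    metaphoric: str
    hybrid: str

# A hypothetical example in the spirit of the dataset (not from MetFuse):
example = Quadruplet(
    literal="The government announced new sanctions.",
    metonymic="The White House announced new sanctions.",      # place stands for institution
    metaphoric="The government unleashed a storm of sanctions.",
    hybrid="The White House unleashed a storm of sanctions.",  # metonymy + metaphor combined
)
```

All four variants are meant to preserve the literal sentence's meaning; only the figurative packaging changes.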
If this is right
- Adding hybrid figurative examples raises metonymy classification accuracy more than adding purely metonymic or purely metaphoric examples.
- Models trained on MetFuse-augmented data outperform models trained only on prior figurative datasets across eight separate benchmarks.
- The presence of a metaphor alongside metonymy increases the explicitness of the metonymic reference for both human readers and large language models.
- Systems can exploit interactions between figurative types rather than treating metonymy and metaphor as fully independent phenomena.
Where Pith is reading between the lines
- Treating multiple figurative devices jointly during data creation may improve performance on broader figurative-language understanding tasks beyond isolated classification.
- The same transformation approach could be extended to other trope combinations to test whether cross-figure context generally reduces ambiguity.
- Controlled fusion of linguistic phenomena offers a route to richer training signals in settings where pure examples are scarce.
Load-bearing premise
The generated figurative sentences stay natural and preserve original meaning, and observed gains come specifically from the fused figurative content rather than from simply adding more data or from generation artifacts.
What would settle it
An experiment that adds an equal volume of randomly generated or non-figurative sentences to the same training sets and finds comparable or larger gains on the metonymy and metaphor benchmarks would falsify the claim that the specific figurative fusions drive the improvements.
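The proposed falsification test amounts to a size-matched augmentation control. A minimal sketch, assuming a generic `train_and_eval` harness (a hypothetical function, not the paper's pipeline):

```python
import random

def augmentation_control(base_train, metfuse_examples, control_pool,
                         train_and_eval, seed=0):
    """Compare MetFuse augmentation against a size-matched control.

    `train_and_eval` is a hypothetical harness that takes a training set
    and returns benchmark accuracy. `control_pool` holds non-figurative
    or randomly generated sentences from which we sample exactly as many
    examples as MetFuse contributes, so the two runs differ only in
    content, not volume.
    """
    rng = random.Random(seed)
    k = len(metfuse_examples)
    control = rng.sample(control_pool, k)  # size-matched, non-figurative
    acc_metfuse = train_and_eval(base_train + metfuse_examples)
    acc_control = train_and_eval(base_train + control)
    # If acc_control >= acc_metfuse, volume alone explains the gains.
    return acc_metfuse, acc_control
```

The design choice is that both conditions see identical training-set sizes, isolating the figurative-fusion content as the only varying factor.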
Original abstract
Metonymy and metaphor often co-occur in natural language, yet computational work has studied them largely in isolation. We introduce a framework that transforms a literal sentence into three figurative variants: metonymic, metaphoric, and hybrid. Using this framework, we construct MetFuse, the first dedicated dataset of figurative fusion between metonymy and metaphor, containing 1,000 human-verified meaning-aligned quadruplets totaling 4,000 sentences. Extrinsic experiments on eight existing benchmarks show that augmenting training data with MetFuse consistently improves both metonymy and metaphor classification, with hybrid examples yielding the largest gains on metonymy tasks. Using this dataset, we also analyze how the presence of one figurative type impacts another. Our findings show that both human annotators and large language models better identify metonymy in hybrid sentences than in metonymy-only sentences, demonstrating that the presence of a metaphor makes a metonymic noun more explicit. Our dataset is publicly available at: https://github.com/cincynlp/MetFuse.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a framework to transform literal sentences into metonymic, metaphoric, and hybrid figurative variants, constructs the MetFuse dataset of 1,000 human-verified meaning-aligned quadruplets (4,000 sentences total), and reports extrinsic experiments on eight benchmarks showing that augmentation with MetFuse (particularly hybrids) improves metonymy and metaphor classification. It further claims that both humans and LLMs identify metonymy more readily in hybrid sentences than in metonymy-only ones, attributing this to the metaphor making the metonymic noun more explicit.
Significance. If the central claims hold after addressing controls, this provides the first dedicated resource for figurative fusion and empirical evidence of interaction effects between metonymy and metaphor, which have largely been studied in isolation. The public dataset release and the analysis of how one figurative device affects another represent clear strengths that could support improved figurative language models in NLP.
Major comments (3)
- [Experiments] Experiments section: The augmentation results claim consistent benchmark gains from MetFuse (with hybrids largest on metonymy tasks) but include no ablations matching the added example count using literal sentences, duplicates, or non-figurative generations from the same framework; this leaves open whether gains arise from figurative fusion content or simply increased data volume, as noted in the stress-test concern.
- [Dataset Construction] Dataset construction and human verification: Verification is limited to meaning alignment of quadruplets with no reported inter-annotator agreement, quantification of annotator process, or metrics for generation quality/naturalness; this is load-bearing for the claim that the 4,000 sentences accurately represent real-world figurative usage without artifacts that classifiers could exploit.
- [Analysis] Analysis of figurative interaction: The claim that hybrids make metonymy more explicit for humans and LLMs lacks statistical tests, effect sizes, or controls for potential syntactic/lexical confounds introduced by the transformation framework, weakening the interaction-effect conclusion.
Minor comments (2)
- The abstract states improvements on eight benchmarks but does not name them; adding an explicit list or reference to a table early in the paper would improve clarity.
- [Framework] Notation for the quadruplets (literal, metonymy, metaphor, hybrid) is clear in the abstract but could be formalized with a small example table in the framework description for readers unfamiliar with the distinctions.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
Point-by-point responses
Referee: [Experiments] Experiments section: The augmentation results claim consistent benchmark gains from MetFuse (with hybrids largest on metonymy tasks) but include no ablations matching the added example count using literal sentences, duplicates, or non-figurative generations from the same framework; this leaves open whether gains arise from figurative fusion content or simply increased data volume, as noted in the stress-test concern.
Authors: We agree that the current experiments lack necessary controls for data volume. In the revised manuscript, we will add ablations augmenting training sets with matched numbers of literal sentences, duplicates, and non-figurative outputs from the framework. These results will be reported in the Experiments section to demonstrate that gains derive from figurative fusion rather than volume alone. Revision: yes.
Referee: [Dataset Construction] Dataset construction and human verification: Verification is limited to meaning alignment of quadruplets with no reported inter-annotator agreement, quantification of annotator process, or metrics for generation quality/naturalness; this is load-bearing for the claim that the 4,000 sentences accurately represent real-world figurative usage without artifacts that classifiers could exploit.
Authors: We acknowledge this gap in reporting. The revised Dataset Construction section will include details on the annotator process (number of annotators, guidelines, and time invested), inter-annotator agreement metrics, and human ratings for naturalness and quality on a sampled subset. This will better substantiate the dataset's fidelity to real-world usage. Revision: yes.
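Cohen's kappa is the standard agreement statistic such a revision would report for two annotators. A stdlib-only sketch of the computation (an illustration, not the authors' protocol):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Corrects observed agreement p_o for the agreement p_e expected
    by chance given each annotator's label distribution.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(ca) | set(cb))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields 1.0, chance-level agreement yields 0.0; values above roughly 0.6 are conventionally read as substantial agreement.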
Referee: [Analysis] Analysis of figurative interaction: The claim that hybrids make metonymy more explicit for humans and LLMs lacks statistical tests, effect sizes, or controls for potential syntactic/lexical confounds introduced by the transformation framework, weakening the interaction-effect conclusion.
Authors: We accept this critique. The revised Analysis section will incorporate statistical tests (e.g., paired significance tests) and effect sizes for the identification differences. We will also add confound-controlled analyses using lexically and syntactically matched sentence subsets to isolate the interaction effect between metonymy and metaphor. Revision: yes.
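A paired permutation test of the kind promised here can be run on per-item correctness under the hybrid and metonymy-only conditions. A stdlib-only sketch (hypothetical, not the authors' analysis):

```python
import random
from statistics import mean

def paired_permutation_test(scores_hybrid, scores_mono, n_perm=10000, seed=0):
    """Two-sided paired permutation test on per-item differences.

    Each list holds per-sentence correctness (1/0) for the same items
    under the hybrid and metonymy-only conditions. Under the null
    hypothesis the condition labels are exchangeable within an item,
    so we randomly flip the sign of each difference.
    """
    rng = random.Random(seed)
    diffs = [h - m for h, m in zip(scores_hybrid, scores_mono)]
    observed = mean(diffs)  # mean paired difference doubles as effect size
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            hits += 1
    return observed, hits / n_perm  # (effect size, p-value)
```

Reporting both the mean difference and its permutation p-value would directly address the missing effect-size and significance concerns.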
Circularity Check
No significant circularity; empirical dataset construction and extrinsic evaluation are self-contained
Full rationale
The paper's core contributions consist of a sentence transformation framework that produces metonymic, metaphoric, and hybrid variants from literal inputs, followed by human verification to create the MetFuse dataset of 1,000 quadruplets and extrinsic augmentation experiments on eight independent benchmarks. No mathematical derivations, first-principles predictions, or fitted parameters are claimed; performance gains and human/LLM identification analyses are measured directly against external classification tasks. The pipeline does not reduce any result to its own inputs by construction, nor does it rely on self-citation chains or uniqueness theorems for load-bearing justification. Claims rest on new data and observable improvements rather than self-referential loops.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Human verification ensures the generated figurative variants are meaning-aligned with the original literal sentences.
- Ad hoc to paper: Performance improvements on the eight benchmarks are attributable to the figurative fusion content rather than to confounding factors such as data size.
Invented entities (1)
- MetFuse dataset and generation framework (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Exploring Concreteness Through a Figurative Lens: LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.