Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation
Pith reviewed 2026-05-25 17:25 UTC · model grok-4.3
The pith
Propagating contextual embeddings through WordNet produces sense-level vectors that let a simple nearest-neighbor method outperform neural sequence models on word sense disambiguation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Contextual embeddings from neural language models can be propagated through WordNet relations to produce sense-level embeddings with full coverage of the sense inventory. These embeddings require no explicit knowledge of sense distributions and no task-specific modelling. As a result, a simple k-NN method using them consistently surpasses previous systems that employ powerful neural sequencing models.
What carries the argument
Propagation of contextual embeddings through WordNet relations to generate sense-level vectors
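A minimal sketch of what such propagation could look like, under assumptions that are not taken from the paper. It supposes that some senses already have a "seed" vector (for example, an average of contextual embeddings over annotated occurrences) and that uncovered senses borrow the mean of the nearest seeded senses in a relation graph. The function name `propagate`, the `max_hops` knob, mean aggregation, and the toy sense labels are all hypothetical; the paper's actual relations and aggregation are not specified in this review.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def propagate(seed, neighbors, max_hops=2):
    """Return a copy of `seed` extended with vectors for unseeded senses.

    seed:      dict sense -> vector (assumed to come from annotated contexts)
    neighbors: dict sense -> list of related senses (hypothetical graph)
    max_hops:  search radius in the graph (an assumed knob, not the paper's)
    Only senses that appear as keys of `neighbors` are considered.
    """
    full = dict(seed)
    for sense in neighbors:
        if sense in seed:
            continue
        visited = {sense}  # guards against cycles in the relation graph
        frontier = [sense]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for related in neighbors.get(node, []):
                    if related not in visited:
                        visited.add(related)
                        next_frontier.append(related)
            frontier = next_frontier
            # Use the closest hop that reaches at least one seeded sense.
            seeded = [seed[s] for s in frontier if s in seed]
            if seeded:
                full[sense] = mean(seeded)
                break
    return full


# Toy data: labels and vectors are invented for illustration only.
seed = {"bank%finance": [1.0, 0.0], "bank%river": [0.0, 1.0]}
neighbors = {
    "bank%finance": ["institution"],
    "bank%river": ["slope"],
    "institution": ["bank%finance", "credit_union"],
    "credit_union": ["institution"],
    "slope": ["bank%river"],
    "deposit": ["bank%finance", "bank%river"],
    "isolated": [],
}
full = propagate(seed, neighbors)
```

In this toy graph, `credit_union` only reaches a seeded sense at the second hop, and `isolated` receives no vector at all, which is the kind of coverage gap a real procedure would have to close with a fallback.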
If this is right
- A k-NN classifier on the sense embeddings outperforms previous neural WSD systems.
- The method remains effective when part-of-speech and lemma features are ignored.
- Full-inventory disambiguation is possible without recourse to sense-frequency data.
- The resulting sense embeddings enable concept-level analyses of contextual embeddings and their source language models.
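The nearest-neighbor step behind these points can be sketched as a cosine-similarity lookup. This is an illustration under stated assumptions, not the paper's implementation: `nearest_sense`, the optional `candidates` argument (standing in for a lemma and part-of-speech filter), and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_sense(context_vec, sense_vecs, candidates=None):
    """Return the sense whose vector is most similar to `context_vec`.

    candidates: optional list of senses to restrict the search to, e.g. the
                senses of a known lemma and POS. None searches every sense.
    """
    if candidates is None:
        pool = sense_vecs
    else:
        pool = {s: sense_vecs[s] for s in candidates if s in sense_vecs}
    if not pool:
        return None
    return max(pool, key=lambda s: cosine(context_vec, pool[s]))


# Toy inventory: labels and vectors are invented for illustration only.
sense_vecs = {
    "bank%finance": [1.0, 0.1],
    "bank%river": [0.1, 1.0],
    "shore%land": [0.0, 1.0],
}
context = [0.0, 1.0]

unrestricted = nearest_sense(context, sense_vecs)
restricted = nearest_sense(
    context, sense_vecs, candidates=["bank%finance", "bank%river"]
)
```

With these toy numbers the unrestricted search picks `shore%land`, a sense of a different lemma, while the restricted search picks `bank%river`. That contrast is why dropping lemma and part-of-speech features is a meaningful robustness test.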
Where Pith is reading between the lines
- The propagation technique could be applied to other lexical knowledge bases to test whether similar sense coverage emerges.
- Comparing the sense vectors against human sense similarity judgments would reveal how faithfully the propagation preserves semantic distance.
- Downstream tasks requiring fine-grained meaning, such as semantic role labeling, could benefit from substituting these vectors for raw contextual embeddings.
Load-bearing premise
Propagating contextual embeddings through WordNet relations produces accurate sense-level vectors that preserve the distinctions needed for disambiguation without any sense-frequency information or task-specific training.
What would settle it
The central claim would be falsified if a nearest-neighbor classifier using the propagated sense embeddings failed to exceed the accuracy of prior neural WSD systems on standard benchmarks such as SemEval.
Original abstract
Contextual embeddings represent a new generation of semantic representations learned from Neural Language Modelling (NLM) that addresses the issue of meaning conflation hampering traditional word embeddings. In this work, we show that contextual embeddings can be used to achieve unprecedented gains in Word Sense Disambiguation (WSD) tasks. Our approach focuses on creating sense-level embeddings with full-coverage of WordNet, and without recourse to explicit knowledge of sense distributions or task-specific modelling. As a result, a simple Nearest Neighbors (k-NN) method using our representations is able to consistently surpass the performance of previous systems using powerful neural sequencing models. We also analyse the robustness of our approach when ignoring part-of-speech and lemma features, requiring disambiguation against the full sense inventory, and revealing shortcomings to be improved. Finally, we explore applications of our sense embeddings for concept-level analyses of contextual embeddings and their respective NLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that contextual embeddings from neural language models can be propagated through WordNet relations to produce full-coverage sense-level vectors. These vectors enable a simple, parameter-free k-NN classifier to outperform prior neural WSD systems on standard benchmarks without using sense-frequency information or task-specific training. The work also includes robustness analyses (ignoring POS/lemma features, full-inventory disambiguation) and applications to concept-level analysis of NLMs.
Significance. If the empirical results hold, the work is significant because it shows that sense distinctions can be recovered from existing contextual embeddings and a static lexical resource in a fully unsupervised manner, yielding a simpler and stronger baseline than complex sequence models. The parameter-free nature and full WordNet coverage are notable strengths; the approach also supplies a tool for inspecting what NLMs have learned at the concept level.
minor comments (3)
- §3 (method): the precise propagation procedure (which relations, number of hops, aggregation function, handling of cycles) should be stated with pseudocode or a small worked example so that the construction of the sense vectors is fully reproducible from the text alone.
- Table 2 / §4.2: report the number of senses per lemma in the evaluation sets and confirm that the k-NN lookup is performed over the entire WordNet inventory rather than a reduced candidate set; this directly affects the strength of the 'full-coverage' claim.
- §5 (analysis): the robustness experiments that drop POS and lemma features are valuable, but the paper should also report the corresponding drop in the strongest neural baselines for direct comparison.
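The statistic requested in the second comment could be computed along the following lines. This is a hedged sketch: `ambiguity_report`, its input shapes, and the toy inventory are assumptions made for illustration and do not reflect the paper's data or tooling.

```python
def ambiguity_report(eval_keys, inventory):
    """Summarise how many senses compete for each evaluated instance.

    eval_keys: list of (lemma, pos) pairs, one per evaluation instance
    inventory: dict (lemma, pos) -> list of sense identifiers
    """
    sizes = [len(inventory.get(key, [])) for key in eval_keys]
    all_senses = {s for senses in inventory.values() for s in senses}
    return {
        # Average size of the lemma- and POS-restricted candidate set.
        "mean_candidates": sum(sizes) / len(sizes) if sizes else 0.0,
        "max_candidates": max(sizes) if sizes else 0,
        # Size of the search space when no restriction is applied.
        "full_inventory": len(all_senses),
    }


# Toy inventory: lemmas and sense identifiers are invented for illustration.
inventory = {
    ("bank", "n"): ["b1", "b2"],
    ("run", "v"): ["r1", "r2", "r3", "r4"],
    ("tree", "n"): ["t1"],
}
report = ambiguity_report([("bank", "n"), ("run", "v"), ("bank", "n")], inventory)
```

Reporting the restricted and unrestricted sizes side by side would show how much harder the full-inventory setting is than the usual candidate-set setting.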
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were provided in the report.
Circularity Check
No significant circularity
The derivation relies on external pre-trained contextual embeddings (from NLMs) and the independent WordNet graph for propagation to produce sense vectors, followed by a parameter-free k-NN. No equation or step reduces by construction to a fitted input, self-definition, or self-citation chain; the performance claim is tested against external WSD benchmarks without sense-frequency data or task-specific training. The approach is self-contained against those benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Contextual embeddings from NLMs separate word senses according to local context.