pith.

arxiv: 1906.10007 · v1 · pith:ZAAS6I2Znew · submitted 2019-06-24 · 💻 cs.CL

Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation

Pith reviewed 2026-05-25 17:25 UTC · model grok-4.3

classification 💻 cs.CL
keywords: word sense disambiguation · contextual embeddings · WordNet · neural language models · nearest neighbors · sense embeddings · polysemy

The pith

Propagating contextual embeddings through WordNet produces sense-level vectors that let a simple nearest-neighbor method outperform neural sequence models on word sense disambiguation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that contextual embeddings learned by neural language models can be turned into sense-specific representations covering every entry in WordNet. Propagation along WordNet relations achieves this coverage without using sense-frequency statistics or any task-specific training. Once obtained, these sense vectors allow a basic k-NN classifier to exceed the accuracy of earlier systems built on powerful neural sequencing architectures. The approach also supports direct examination of how contextual embeddings encode conceptual distinctions at the sense level.

Core claim

Contextual embeddings from neural language models can be propagated through WordNet relations to produce sense-level embeddings with full coverage of the sense inventory. These embeddings require no explicit knowledge of sense distributions and no task-specific modelling. As a result, a simple k-NN method using them consistently surpasses the performance of previous systems that employ powerful neural sequencing models.

What carries the argument

Propagation of contextual embeddings through WordNet relations to generate sense-level vectors
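The propagation step can be sketched as a fallback chain over a WordNet-style graph. This is a minimal sketch, not the paper's exact procedure. It assumes that some senses already have vectors (for example, averages of the contextual embeddings of annotated occurrences). The fallback order (own synset, then hypernym synset, then lexicographer file), the function name `propagate`, and all toy values are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


def propagate(annotated, sense_synset, synset_hypernym, synset_lexname):
    """Give every sense in `sense_synset` a vector by falling back along WordNet-style links.

    annotated:       sense -> vector averaged from annotated contextual embeddings (assumed given)
    sense_synset:    sense -> synset id
    synset_hypernym: synset id -> hypernym synset id (absent if none)
    synset_lexname:  synset id -> lexicographer file name, e.g. "noun.object"
    """
    by_synset, by_lexname = defaultdict(list), defaultdict(list)
    for sense, vec in annotated.items():
        synset = sense_synset[sense]
        by_synset[synset].append(vec)
        by_lexname[synset_lexname[synset]].append(vec)
    # Average the annotated vectors at the synset level and at the lexname level.
    synset_vec = {s: np.mean(vs, axis=0) for s, vs in by_synset.items()}
    lexname_vec = {l: np.mean(vs, axis=0) for l, vs in by_lexname.items()}

    full = dict(annotated)
    for sense, synset in sense_synset.items():
        if sense in full:
            continue
        hypernym = synset_hypernym.get(synset)
        lexname = synset_lexname[synset]
        if synset in synset_vec:        # 1. reuse the average of its own synset
            full[sense] = synset_vec[synset]
        elif hypernym in synset_vec:    # 2. fall back to the hypernym synset
            full[sense] = synset_vec[hypernym]
        elif lexname in lexname_vec:    # 3. fall back to the lexname average
            full[sense] = lexname_vec[lexname]
    return full


# Hypothetical toy inventory (not real WordNet data).
annotated = {
    "bank_fin": np.array([1.0, 0.0]),
    "institution": np.array([0.8, 0.2]),
    "shore": np.array([0.0, 1.0]),
}
sense_synset = {"bank_fin": "S1", "depository": "S1", "institution": "S5",
                "lender": "S4", "shore": "S3", "bank_river": "S2", "hill": "S6"}
synset_hypernym = {"S1": "S5", "S4": "S5", "S2": "S3"}
synset_lexname = {"S1": "noun.group", "S5": "noun.group", "S4": "noun.person",
                  "S3": "noun.object", "S2": "noun.object", "S6": "noun.object"}

full = propagate(annotated, sense_synset, synset_hypernym, synset_lexname)
# depository <- synset S1, lender <- hypernym S5, bank_river <- hypernym S3, hill <- noun.object
```

With NLTK's WordNet interface, the hypernym and lexname tables could be filled from `Synset.hypernyms()` and `Synset.lexname()`. In this sketch, a sense that none of the three levels reaches stays uncovered.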

If this is right

  • A k-NN classifier on the sense embeddings outperforms previous neural WSD systems.
  • The method stays usable when part-of-speech and lemma features are ignored, although the robustness analysis in that setting also reveals shortcomings.
  • Full-inventory disambiguation is possible without recourse to sense-frequency data.
  • The resulting sense embeddings enable concept-level analyses of contextual embeddings and their source language models.
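The k-NN step can be sketched as a ranking of sense vectors against the contextual embedding of the target token. This minimal sketch assumes cosine similarity as the neighbor metric and a dict of sense vectors, such as the output of a propagation step. Restricting `candidates` corresponds to using lemma and POS information. Passing no candidates searches the whole inventory, which is the full-inventory setting. The function name and toy vectors are hypothetical.

```python
import numpy as np


def knn_senses(context_vec, sense_vecs, candidates=None, k=1):
    """Return the k (sense, score) pairs whose vectors are most cosine-similar to context_vec."""
    keys = list(candidates) if candidates is not None else list(sense_vecs)
    matrix = np.stack([sense_vecs[key] for key in keys])
    # Cosine similarity: dot product of L2-normalised vectors.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = context_vec / np.linalg.norm(context_vec)
    scores = matrix @ query
    order = np.argsort(-scores)[:k]
    return [(keys[i], float(scores[i])) for i in order]


# Hypothetical toy sense vectors and context embedding ("bank" in "the river bank").
sense_vecs = {
    "bank_fin": np.array([1.0, 0.1]),
    "bank_river": np.array([0.1, 1.0]),
    "shore": np.array([0.0, 1.0]),
}
context = np.array([0.2, 0.9])

# Lemma/POS known: only the senses of "bank" compete; bank_river ranks first.
restricted = knn_senses(context, sense_vecs, candidates=["bank_fin", "bank_river"])
# Full inventory: every sense competes; top-2 is bank_river, then shore (another lemma).
unrestricted = knn_senses(context, sense_vecs, k=2)
```

Searching the full inventory lets senses of other lemmas enter the neighborhood, which is why dropping lemma information makes the task harder.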

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The propagation technique could be applied to other lexical knowledge bases to test whether similar sense coverage emerges.
  • Comparing the sense vectors against human sense similarity judgments would reveal how faithfully the propagation preserves semantic distance.
  • Downstream tasks requiring fine-grained meaning, such as semantic role labeling, could benefit from substituting these vectors for raw contextual embeddings.

Load-bearing premise

Propagating contextual embeddings through WordNet relations produces accurate sense-level vectors that preserve the distinctions needed for disambiguation without any sense-frequency information or task-specific training.

What would settle it

The central claim would be falsified if a nearest-neighbor classifier using the propagated sense embeddings failed to exceed the accuracy of prior neural WSD systems on standard benchmarks such as SemEval.
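WSD evaluation commonly reports precision (correct / answered), recall (correct / total) and their F1. The following is a minimal sketch of that convention, assuming each gold instance carries a set of acceptable sense keys. The function name and data are hypothetical, and this is not any benchmark's official scorer.

```python
def wsd_scores(predictions, gold):
    """Precision, recall and F1; instances missing from `predictions` count as abstentions.

    predictions: instance id -> predicted sense key
    gold:        instance id -> set of acceptable sense keys
    """
    answered = [i for i in gold if i in predictions]
    correct = sum(predictions[i] in gold[i] for i in answered)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {"d1": {"a"}, "d2": {"b", "c"}, "d3": {"d"}, "d4": {"e"}}
scores_partial = wsd_scores({"d1": "a", "d2": "c"}, gold)                     # abstains on d3, d4
scores_full = wsd_scores({"d1": "a", "d2": "c", "d3": "x", "d4": "e"}, gold)  # answers everything
# scores_partial: precision 1.0, recall 0.5, F1 about 0.667
# scores_full: precision = recall = F1 = 0.75
```

A full-coverage system answers every instance, so its precision equals its recall and F1 reduces to plain accuracy.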

Figures

Figures reproduced from arXiv: 1906.10007 by Alipio Jorge, Daniel Loureiro.

Figure 1. Illustration of our k-NN approach for WSD, which relies on full-coverage sense embeddings represented in the same space as contextualized embeddings. For simplification, we label senses as synsets. Grey nodes belong to different lemmas (see §5.3). Our WSD approach is strictly based on k-NN (see …)
Figure 2. Performance gains with LMMS2348 when accepting additional neighbors as valid predictions.
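Accepting additional neighbors as valid predictions amounts to a top-k accuracy curve. The following is a minimal sketch, assuming each instance comes with a ranked list of neighboring senses (for example, from a k-NN search) and a set of acceptable gold senses. The function name and data are hypothetical, and this is not the authors' evaluation code.

```python
def top_k_accuracy(ranked, gold, k):
    """Share of instances whose k nearest senses include an acceptable gold sense.

    ranked: instance id -> sense keys sorted by decreasing similarity
    gold:   instance id -> set of acceptable sense keys
    """
    hits = sum(any(sense in gold[i] for sense in ranked[i][:k]) for i in gold)
    return hits / len(gold)


ranked = {"d1": ["a", "b", "c"], "d2": ["x", "b", "y"], "d3": ["p", "q", "r"]}
gold = {"d1": {"a"}, "d2": {"b"}, "d3": {"z"}}
curve = [top_k_accuracy(ranked, gold, k) for k in (1, 2, 3)]
# curve: 1/3, 2/3, 2/3 (d3's gold sense never appears among its neighbors)
```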
Figure 3. Examples of gender bias found in the sense …
Original abstract

Contextual embeddings represent a new generation of semantic representations learned from Neural Language Modelling (NLM) that addresses the issue of meaning conflation hampering traditional word embeddings. In this work, we show that contextual embeddings can be used to achieve unprecedented gains in Word Sense Disambiguation (WSD) tasks. Our approach focuses on creating sense-level embeddings with full-coverage of WordNet, and without recourse to explicit knowledge of sense distributions or task-specific modelling. As a result, a simple Nearest Neighbors (k-NN) method using our representations is able to consistently surpass the performance of previous systems using powerful neural sequencing models. We also analyse the robustness of our approach when ignoring part-of-speech and lemma features, requiring disambiguation against the full sense inventory, and revealing shortcomings to be improved. Finally, we explore applications of our sense embeddings for concept-level analyses of contextual embeddings and their respective NLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that contextual embeddings from neural language models can be propagated through WordNet relations to produce full-coverage sense-level vectors. These vectors enable a simple, parameter-free k-NN classifier to outperform prior neural WSD systems on standard benchmarks without using sense-frequency information or task-specific training. The work also includes robustness analyses (ignoring POS/lemma features, full-inventory disambiguation) and applications to concept-level analysis of NLMs.

Significance. If the empirical results hold, the work is significant because it shows that sense distinctions can be recovered from existing contextual embeddings and a static lexical resource without sense-frequency information or task-specific modelling, yielding a simpler and stronger baseline than complex sequence models. The parameter-free nature and full WordNet coverage are notable strengths; the approach also supplies a tool for inspecting what NLMs have learned at the concept level.

minor comments (3)
  1. §3 (method): the precise propagation procedure (which relations, number of hops, aggregation function, handling of cycles) should be stated with pseudocode or a small worked example so that the construction of the sense vectors is fully reproducible from the text alone.
  2. Table 2 / §4.2: report the number of senses per lemma in the evaluation sets and confirm that the k-NN lookup is performed over the entire WordNet inventory rather than a reduced candidate set; this directly affects the strength of the 'full-coverage' claim.
  3. §5 (analysis): the robustness experiments that drop POS and lemma features are valuable, but the paper should also report the corresponding drop in the strongest neural baselines for direct comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The derivation relies on external pre-trained contextual embeddings (from NLMs) and the independent WordNet graph for propagation to produce sense vectors, followed by a parameter-free k-NN. No equation or step reduces by construction to a fitted input, self-definition, or self-citation chain; the performance claim is tested against external WSD benchmarks without sense-frequency data or task-specific training. The approach is self-contained against those benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no explicit free parameters or invented entities are described, and the single axiom listed below is an implicit domain assumption rather than one the abstract states.

axioms (1)
  • domain assumption Contextual embeddings from NLMs separate word senses according to local context.
    Central premise for using NLM embeddings as the starting point for sense propagation.

pith-pipeline@v0.9.0 · 5683 in / 1041 out tokens · 23405 ms · 2026-05-25T17:25:43.088811+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 2 internal anchors

  1. [1]

    Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. https://openreview.net/forum?id=SyK00v5xx A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations (ICLR)

  2. [2]

    Pierpaolo Basile, Annalina Caputo, and Giovanni Semeraro. 2014. https://www.aclweb.org/anthology/C14-1151 An enhanced Lesk word sense disambiguation algorithm through a distributional semantic model. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1591--1600, Dublin, Ireland. Dublin...

  3. [3]

    Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. https://doi.org/10.1162/tacl_a_00051 Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135--146

  4. [4]

    Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. http://dl.acm.org/citation.cfm?id=3157382.3157584 Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pages 4356--4364, USA. Curran Asso...

  5. [5]

    Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. https://doi.org/10.1126/science.aal4230 Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183--186

  6. [6]

    Jose Camacho-Collados and Mohammad Taher Pilehvar. 2018. https://doi.org/10.1613/jair.1.11259 From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Int. Res., 63(1):743--788

  7. [7]

    Jose Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. https://doi.org/10.1016/j.artint.2016.07.005 NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence, 240:36--64

  8. [8]

    Xinxiong Chen, Zhiyuan Liu, and Maosong Sun. 2014. https://doi.org/10.3115/v1/D14-1110 A unified model for word sense representation and disambiguation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1025--1035, Doha, Qatar. Association for Computational Linguistics

  9. [9]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. http://arxiv.org/abs/1810.04805v1 BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805v1

  10. [10]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://www.aclweb.org/anthology/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (L...

  11. [11]

    Christiane Fellbaum. 1998. WordNet: An electronic lexical database. MIT Press

  12. [12]

    Ignacio Iacobacci, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. https://doi.org/10.18653/v1/P16-1085 Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 897--907, Berlin, Germany. Association for Computational Linguistics

  13. [13]

    Minh Le, Marten Postma, Jacopo Urbani, and Piek Vossen. 2018. https://www.aclweb.org/anthology/C18-1030 A deep dive into word sense disambiguation with LSTM. In Proceedings of the 27th International Conference on Computational Linguistics, pages 354--365, Santa Fe, New Mexico, USA. Association for Computational Linguistics

  14. [14]

    Doug Lenat, Mayank Prakash, and Mary Shepherd. 1986. http://dl.acm.org/citation.cfm?id=13432.13435 Cyc: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Mag., 6(4):65--85

  15. [15]

    Michael Lesk. 1986. https://doi.org/10.1145/318723.318728 Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC '86, pages 24--26, New York, NY, USA. ACM

  16. [16]

    Daniel Loureiro and Alípio Mário Jorge. 2019. LIAAD at SemDeep-5 challenge: Word-in-Context (WiC). In SemDeep-5@IJCAI 2019, page forthcoming

  17. [17]

    Fuli Luo, Tianyu Liu, Zexue He, Qiaolin Xia, Zhifang Sui, and Baobao Chang. 2018a. https://www.aclweb.org/anthology/D18-1170 Leveraging gloss knowledge in neural word sense disambiguation by hierarchical co-attention. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1402--1411, Brussels, Belgium. Associat...

  18. [18]

    Fuli Luo, Tianyu Liu, Qiaolin Xia, Baobao Chang, and Zhifang Sui. 2018b. https://www.aclweb.org/anthology/P18-1230 Incorporating glosses into neural word sense disambiguation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2473--2482, Melbourne, Australia. Association for Comput...

  19. [19]

    Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. https://doi.org/10.18653/v1/K16-1006 context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 51--61, Berlin, Germany. Association for Computational Linguistics

  20. [20]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. http://dl.acm.org/citation.cfm?id=2999792.2999959 Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13, pages 3111--3119, USA. Curran Associates Inc

  21. [21]

    George A. Miller, Martin Chodorow, Shari Landes, Claudia Leacock, and Robert G. Thomas. 1994. https://www.aclweb.org/anthology/H94-1046 Using a semantic concordance for sense identification. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

  22. [22]

    Roberto Navigli. 2009. https://doi.org/10.1145/1459352.1459355 Word sense disambiguation: A survey. ACM Computing Surveys, 41(2):10:1--10:69

  23. [23]

    Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. https://doi.org/10.18653/v1/N18-1202 Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Lon...

  24. [24]

    Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. https://blog.openai.com/language-unsupervised/ Improving language understanding by generative pre-training

  25. [25]

    Alessandro Raganato, Jose Camacho-Collados, and Roberto Navigli. 2017a. https://www.aclweb.org/anthology/E17-1010 Word sense disambiguation: A unified evaluation framework and empirical comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 99--110, Vale...

  26. [26]

    Alessandro Raganato, Claudio Delli Bovi, and Roberto Navigli. 2017b. https://doi.org/10.18653/v1/D17-1120 Neural sequence learning models for word sense disambiguation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1156--1167, Copenhagen, Denmark. Association for Computational Linguistics

  27. [27]

    Philip Resnik. 1997. https://www.aclweb.org/anthology/W97-0209 Selectional preference and sense disambiguation. In Tagging Text with Lexical Semantics: Why, What, and How?

  28. [28]

    Sascha Rothe and Hinrich Schütze. 2015. https://doi.org/10.3115/v1/P15-1173 AutoExtend: Extending word embeddings to embeddings for synsets and lexemes. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages ...

  29. [29]

    Loïc Vial, Benjamin Lecouteux, and Didier Schwab. 2018. http://arxiv.org/abs/1811.00960 Improving the coverage and the generalization ability of neural word sense disambiguation through hypernymy and hyponymy relationships. CoRR, abs/1811.00960

  30. [30]

    Dayu Yuan, Julian Richardson, Ryan Doherty, Colin Evans, and Eric Altendorf. 2016. https://www.aclweb.org/anthology/C16-1130 Semi-supervised word sense disambiguation with neural models. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1374--1385, Osaka, Japan. The COLING 2016 Organiz...

  31. [31]

    Zhi Zhong and Hwee Tou Ng. 2010. https://www.aclweb.org/anthology/P10-4014 It makes sense: A wide-coverage word sense disambiguation system for free text. In Proceedings of the ACL 2010 System Demonstrations, pages 78--83, Uppsala, Sweden. Association for Computational Linguistics
