LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)

Alipio Jorge; Daniel Loureiro

arxiv: 1906.10002 · v1 · pith:XK3LHDAWnew · submitted 2019-06-24 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-25 17:28 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords word sense disambiguation · word-in-context · contextual embeddings · sense embeddings · SemDeep challenge · natural language processing

The pith

A word sense disambiguation system using contextual embeddings adapts directly to word-in-context detection and reaches competitive results without task training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a system that placed second in the SemDeep-5 Word-in-Context challenge. It starts from a word sense disambiguation method that produces sense embeddings from contextual embeddings across a full inventory of senses. This system is then adapted, in a straightforward manner, to decide whether a target word carries the same sense in two given sentences. The resulting approach reaches competitive performance even when the challenge's training and development sets are ignored entirely.

Core claim

Our solution is based on a novel system for Word Sense Disambiguation using contextual embeddings and full-inventory sense embeddings. We adapt this WSD system, in a straightforward manner, for the present task of detecting whether the same sense occurs in a pair of sentences. Additionally, we show that our solution is able to achieve competitive performance even without using the provided training or development sets, mitigating potential concerns related to task overfitting.

What carries the argument

The novel WSD system based on contextual embeddings and full-inventory sense embeddings, adapted to decide whether a target word shares the same sense across a sentence pair.
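As a rough illustration of this machinery, the sketch below matches each contextual embedding to its nearest sense embedding by cosine similarity and then compares the two matched senses. It is a minimal toy, not the authors' implementation: the function names, the sense keys, and the three-dimensional vectors are all hypothetical stand-ins, and a real system would take its vectors from a contextual language model and a full sense inventory.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_sense(context_vec, sense_vecs):
    """1-NN step: return the key of the sense embedding closest to the context embedding."""
    return max(sense_vecs, key=lambda key: cosine(context_vec, sense_vecs[key]))


def same_sense(ctx1, ctx2, sense_vecs):
    """Binary WiC-style decision: do both contexts map to the same sense?"""
    return nearest_sense(ctx1, sense_vecs) == nearest_sense(ctx2, sense_vecs)


# Hypothetical toy data: two senses of "bank" and three contextual embeddings.
TOY_SENSES = {
    "bank%finance": [1.0, 0.0, 0.1],
    "bank%river": [0.0, 1.0, 0.1],
}
ctx_deposit = [0.9, 0.1, 0.0]  # e.g. "deposit money at the bank"
ctx_loan = [0.8, 0.2, 0.1]     # e.g. "the bank approved the loan"
ctx_shore = [0.1, 0.9, 0.0]    # e.g. "sat on the bank of the river"

print(same_sense(ctx_deposit, ctx_loan, TOY_SENSES))   # True
print(same_sense(ctx_deposit, ctx_shore, TOY_SENSES))  # False
```

Nothing in this decision rule is fitted to WiC pairs, which is the property the zero-shot claim depends on.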

If this is right

The WSD system can be repurposed for the WiC task without any task-specific training or fine-tuning.
Performance on the challenge remains competitive while avoiding reliance on the supplied training and development sets.
Concerns about overfitting to the particular WiC dataset are reduced because the core components are drawn from general WSD resources.
Sense distinctions captured by the embeddings transfer to the binary same-sense decision required by WiC.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding-based sense representations could be tested on other binary or multi-way sense comparison tasks without new labeled data.
If the approach generalizes, it would reduce the need for large task-specific annotated sets in semantic evaluation benchmarks.
Direct comparison of sense embeddings from different sentences offers a parameter-light alternative to models trained end-to-end on WiC.
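The last point can be sketched as follows, under stated assumptions: instead of requiring the two matched senses to be identical, the matched sense embeddings are compared directly and the pair is accepted when their cosine similarity clears a threshold. The threshold value of 0.8, the sense keys, and the toy vectors are hypothetical and chosen only for illustration; this is an editorial sketch, not a procedure taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matched_sense_similarity(ctx1, ctx2, sense_vecs):
    """Match each context to its nearest sense, then compare the two sense embeddings."""
    s1 = max(sense_vecs, key=lambda key: cosine(ctx1, sense_vecs[key]))
    s2 = max(sense_vecs, key=lambda key: cosine(ctx2, sense_vecs[key]))
    return cosine(sense_vecs[s1], sense_vecs[s2])


def same_sense_soft(ctx1, ctx2, sense_vecs, threshold=0.8):
    """Accept the pair when the matched senses are close, even if not identical.

    The default threshold is a hypothetical value, not one reported by the paper.
    """
    return matched_sense_similarity(ctx1, ctx2, sense_vecs) >= threshold


# Hypothetical toy inventory with two closely related senses and one distant sense.
TOY_SENSES = {
    "bank%institution": [1.0, 0.0, 0.1],
    "bank%building": [0.9, 0.1, 0.3],
    "bank%river": [0.0, 1.0, 0.1],
}
ctx_a = [1.0, 0.0, 0.0]  # nearest sense: bank%institution
ctx_b = [0.7, 0.1, 0.4]  # nearest sense: bank%building
ctx_c = [0.0, 1.0, 0.0]  # nearest sense: bank%river

print(same_sense_soft(ctx_a, ctx_b, TOY_SENSES))  # True: related fine-grained senses
print(same_sense_soft(ctx_a, ctx_c, TOY_SENSES))  # False: unrelated senses
```

The only tunable quantity here is the threshold, which is what makes this kind of comparison parameter-light relative to a model trained end-to-end on labeled pairs.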

Load-bearing premise

The novel WSD system based on contextual embeddings and full-inventory sense embeddings can be adapted in a straightforward manner to detect whether the same sense occurs in a pair of sentences.

What would settle it

Running the same adapted system on the official WiC test set and finding that its accuracy falls substantially below the top entries that do use the provided training data.

Figures

Figures reproduced from arXiv: 1906.10002 by Alipio Jorge, Daniel Loureiro.

**Figure 1.** Illustration of our k-NN approach for WSD, which relies on full-coverage sense embeddings represented in the same space as contextualized embeddings.

**Figure 2.** Components and interactions involved in our approaches. The sim…

**Figure 3.** Distribution of Prediction Probabilities

**Figure 4.** ROC curve for results of our best model on…

read the original abstract

This paper describes the LIAAD system that was ranked second place in the Word-in-Context challenge (WiC) featured in SemDeep-5. Our solution is based on a novel system for Word Sense Disambiguation (WSD) using contextual embeddings and full-inventory sense embeddings. We adapt this WSD system, in a straightforward manner, for the present task of detecting whether the same sense occurs in a pair of sentences. Additionally, we show that our solution is able to achieve competitive performance even without using the provided training or development sets, mitigating potential concerns related to task overfitting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A competition report on a zero-shot WSD adaptation that placed second in the WiC task without using the provided training data.

The main takeaway is that this system report describes a second-place entry in the WiC shared task by taking an existing WSD pipeline based on contextual embeddings and full-inventory sense embeddings and applying it directly to sentence pairs. The zero-shot angle is the clearest point of interest: they report competitive results while skipping the task's train and dev sets entirely, which directly speaks to overfitting concerns in shared tasks. That part of the work is straightforward and useful as a data point.

The paper does a clean job of documenting their submission and the ranking it achieved. It treats the adaptation as a direct mapping from sense assignment to same-sense detection, without extra task-specific layers, and the stress-test note confirms no obvious internal contradictions or leakage in that construction.

What is new is limited to the specific application and the empirical outcome on this benchmark. The underlying WSD technique is described as novel in the abstract, but the text gives no equations or side-by-side distinctions from prior embedding-based WSD, so the novelty sits mostly in the competition result rather than a methodological advance.

Soft spots are minor and expected for this genre. There is no error analysis, no ablation on the adaptation step, and no quantitative breakdown beyond the ranking claim. As a short system paper, that is proportionate; it does not pretend to be a general method paper. The citation pattern is thin because the work is recent and focused on one task.

This paper is for readers who follow SemEval-style shared tasks or need practical examples of WSD transfer. It will not reorganize the field, but it supplies a verifiable entry that others can build on or compare against. A serious editor should send it to peer review for the workshop track rather than desk-reject it; the result is concrete and the zero-shot claim is falsifiable through the competition data.

Referee Report

1 major / 0 minor

Summary. The paper describes the LIAAD system that ranked second in the SemDeep-5 Word-in-Context (WiC) challenge. The approach adapts a novel Word Sense Disambiguation (WSD) system based on contextual embeddings and full-inventory sense embeddings to detect whether the same sense occurs in a pair of sentences. The authors emphasize that competitive performance is achieved without using the provided training or development sets.

Significance. If the performance claim holds, the work is significant for showing that a pre-trained WSD pipeline can be directly adapted to WiC in a zero-shot manner. This provides a concrete example of mitigating task overfitting concerns in lexical semantics and demonstrates the practical utility of full-inventory sense embeddings.

major comments (1)

Abstract: The abstract supplies no quantitative results, error analysis, or derivation; the central performance claim cannot be verified from the given text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of our work's significance in demonstrating zero-shot adaptation of a WSD system to the WiC task. We address the single major comment below.

Referee: Abstract: The abstract supplies no quantitative results, error analysis, or derivation; the central performance claim cannot be verified from the given text.

Authors: We agree that the abstract would be strengthened by including key quantitative results to allow verification of the performance claim. The submitted abstract emphasized the zero-shot nature of the approach but omitted specific metrics. In the revised version, we will add the ranking (second place in the SemDeep-5 WiC challenge) and the corresponding test-set accuracy. Error analysis and derivations are presented in the body of the paper, consistent with typical abstract length constraints.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The manuscript is a competition system report describing a zero-shot adaptation of a pre-existing WSD pipeline to the WiC task. No equations, fitted parameters, or derivation chain appear in the provided text. The central performance claim rests on empirical submission results rather than any self-referential mapping, uniqueness theorem, or renamed empirical pattern. The adaptation is presented as direct and task-independent, with no load-bearing self-citation or construction that reduces the reported outcome to its own inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no free parameters, axioms, or invented entities; ledger left empty.

Reference graph

Works this paper leans on

[1] Alan Ansell, Felipe Bravo-Marquez, and Bernhard Pfahringer. 2019. An ELMo-inspired approach to SemDeep-5's Word-in-Context task. In SemDeep-5@IJCAI 2019, page forthcoming.

[2] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=SyK00v5xx

[3] Jose Camacho-Collados and Mohammad Taher Pilehvar. 2018. From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Int. Res., 63(1):743-788. https://doi.org/10.1613/jair.1.11259

[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (L... https://www.aclweb.org/anthology/N19-1423

[5] Christiane Fellbaum. 1998. WordNet: an electronic lexical database. MIT Press.

[6] Minh Le, Marten Postma, Jacopo Urbani, and Piek Vossen. 2018. A deep dive into word sense disambiguation with LSTM. In Proceedings of the 27th International Conference on Computational Linguistics, pages 354-365, Santa Fe, New Mexico, USA. Association for Computational Linguistics. https://www.aclweb.org/anthology/C18-1030

[7] Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC '86, pages 24-26, New York, NY, USA. ACM. https://doi.org/10.1145/318723.318728

[8] Daniel Loureiro and Alípio Jorge. 2019. Language modelling makes sense: Propagating representations through WordNet for full-coverage word sense disambiguation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, page forthcoming, Florence, Italy. Association for Computational Linguistics.

[9] Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 51-61, Berlin, Germany. Association for Computational Linguistics. https://doi.org/10.18653/v1/K16-1006

[10] George A. Miller, Martin Chodorow, Shari Landes, Claudia Leacock, and Robert G. Thomas. 1994. Using a semantic concordance for sense identification. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994. https://www.aclweb.org/anthology/H94-1046

[11] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Lon... https://doi.org/10.18653/v1/N18-1202

[12] Mohammad Taher Pilehvar and Jose Camacho-Collados. 2019. WiC: the Word-in-Context dataset for evaluating context-sensitive meaning representations. In Proceedings of NAACL, Minneapolis, United States.

[13] Aina Garí Soler, Marianna Apidianaki, and Alexandre Allauzen. 2019. LIMSI-MultiSem at the IJCAI SemDeep-5 WiC challenge: Context representations for word usage similarity estimation. In SemDeep-5@IJCAI 2019, page forthcoming.

[14] Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. CoRR, abs/1905.00537. http://arxiv.org/abs/1905.00537