Survey: Multiword Expression Processing: A Survey
4 Pith papers cite this work.
Fields: cs.CL (4)
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions
  IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.
- Supervision versus Demonstration-Based In-Context Learning for Multiword Expression Classification
  On a controlled Turkish dataset of 147 examples, few-shot prompting lets some LLMs match or beat a supervised BERT baseline for light verb construction (LVC) detection, though results are highly sensitive to prompt design.
- A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation
  Systematic experiments show that text decomposition methods and privacy budget allocation strategies produce significantly different privacy-utility trade-offs, even under comparable total epsilon budgets.
- Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models
  SemanticQA unifies prior multiword expression datasets into a benchmark that reveals substantial performance variation among language models on semantic reasoning tasks.