pith. sign in

arxiv: 2506.08354 · v2 · pith:OUT5XWFNnew · submitted 2025-06-10 · 💻 cs.CL · cs.AI· cs.IR

Position: Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning

classification 💻 cs.CL cs.AIcs.IR
keywords implicitembeddingsmeaningsemanticsembeddingresearchshouldsurface
0
0 comments X
read the original abstract

This position paper argues that text embedding research should move beyond surface meaning and embrace implicit semantics as a central modeling objective. Text embeddings are a foundational component of modern NLP, underpinning a wide range of applications and driving sustained research progress. Despite rapid progress, most embedding models remain narrowly focused on surface-level semantics, whereas linguistic theory emphasizes that much of human meaning is implicit, shaped by pragmatics, speaker intent, and sociocultural context. Current models are typically trained on datasets that lack such depth and evaluated using benchmarks that reward surface similarity. As a result, they struggle with tasks that require interpretive reasoning, stance recognition, or socially grounded understanding. Our pilot study makes this limitation explicit, showing that even state-of-the-art embeddings achieve only marginal improvements over simple lexical baselines on tasks probing implicit semantics. We therefore call for a paradigm shift: embedding research should prioritize linguistically grounded and diverse training data, develop benchmarks that probe deeper semantic understanding, and treat implicit meaning as a core modeling objective to better align embeddings with real-world language complexity. The code is available at http://github.com/dukesun99/Implicit-Embeddings.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Very Efficient Listwise Multimodal Reranking for Long Documents

    cs.IR 2026-05 unverdicted novelty 7.0

    ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.