MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.
citing papers explorer
- Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses
  LLMs struggle to infer pragmatic meaning from non-verbal responses alone, showing accuracy drops of up to 60 percentage points versus verbal responses, though in-context learning improves results.
- How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
  LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.