{"paper":{"title":"LVLMs and Humans Ground Differently in Referential Communication","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Large vision-language models cannot interactively generate and resolve referring expressions to build common ground with humans or each other.","cross_cats":["cs.AI","cs.HC"],"primary_cat":"cs.CL","authors_text":"Amie J. Paige, Dimitris Samaras, Gregory Zelinsky, Owen Rambow, Panagiotis Kaliosis, Peter Zeng, Susan E. Brennan, Weiling Li, Zhengxiang Wang","submitted_at":"2026-01-27T16:52:20Z","abstract_excerpt":"For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. We present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We show that LVLMs cannot interactively generate and resolve referring expressio"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that performance differences in this specific non-lexicalized object matching task directly reflect broader deficits in LVLMs' ability to model common ground in general communication.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LVLMs and humans build common ground differently in referential tasks, with LVLMs failing to enable smooth interactive communication in a new corpus of 356 dialogues.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Large vision-language models cannot interactively generate and resolve referring expressions to build common ground with humans or each other.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"17ce61024a9feb1fb7b7cce64016ad991f4a9a2a02f792c4c0981c6a7c26326a"},"source":{"id":"2601.19792","kind":"arxiv","version":5},"verdict":{"id":"4edcbbba-7fa2-4ed2-ae0b-3a784e34d70d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T10:31:52.892510Z","strongest_claim":"We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use.","one_line_summary":"LVLMs and humans build common ground differently in referential tasks, with LVLMs failing to enable smooth interactive communication in a new corpus of 356 dialogues.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that performance differences in this specific non-lexicalized object matching task directly reflect broader deficits in LVLMs' ability to model common ground in general communication.","pith_extraction_headline":"Large vision-language models cannot interactively generate and resolve referring expressions to build common ground with humans or each other."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2601.19792/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"e7ed72065fbea210b8fac9bfa9a23e1d5e4bebab290188f1b193d280e2156e7b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}