pith. machine review for the scientific record.

arxiv: 1603.08079 · v1 · submitted 2016-03-26 · 💻 cs.CV · cs.AI · cs.CL

Recognition: unknown

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Authors on Pith: no claims yet
classification: 💻 cs.CV · cs.AI · cs.CL
keywords: sentence, different, interpretations, ambiguities, hand, language, model, sentences
read the original abstract

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, making it possible to disambiguate sentences in a unified fashion across the different ambiguity types.
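The task above reduces to choosing, among the readings of an ambiguous sentence, the one that a video depicts. The following is a minimal Python sketch under loudly stated assumptions: `Interpretation`, `depiction_score` and `disambiguate` are hypothetical names; each reading is hand-encoded as a set of predicate tuples; and the "video" is a set of facts assumed to have been extracted already. The overlap score is a toy stand-in for the paper's vision model, not the authors' method, and the example sentence is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One reading of an ambiguous sentence (hypothetical encoding)."""
    label: str
    predicates: frozenset  # e.g. {("approach", "sam", "chair"), ...}


def depiction_score(interp: Interpretation, video_facts: frozenset) -> float:
    """Toy stand-in for a sentence-video model: the fraction of the
    reading's predicates that are observed among the video's facts."""
    if not interp.predicates:
        return 0.0
    return len(interp.predicates & video_facts) / len(interp.predicates)


def disambiguate(interpretations, video_facts: frozenset) -> Interpretation:
    """Return the reading the video depicts best (first one wins on ties)."""
    return max(interpretations, key=lambda i: depiction_score(i, video_facts))


# Illustrative PP-attachment ambiguity: who has the bag?
sentence = "Sam approached the chair with a bag."
readings = [
    Interpretation("sam-has-bag", frozenset({("approach", "sam", "chair"),
                                             ("hold", "sam", "bag")})),
    Interpretation("chair-has-bag", frozenset({("approach", "sam", "chair"),
                                               ("on", "bag", "chair")})),
]
video = frozenset({("approach", "sam", "chair"), ("on", "bag", "chair")})

print(disambiguate(readings, video).label)  # chair-has-bag
```

Because every reading is scored by the same function, the selection step does not change with the ambiguity type (syntactic, semantic or discourse); only the encoding of the readings does. At toy scale, this mirrors the abstract's claim of a unified treatment.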

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    VIDA provides 2,500 visually-dependent ambiguous MT instances and LLM-judge metrics; chain-of-thought SFT improves disambiguation accuracy over standard SFT, especially out-of-distribution.