
arxiv: 1204.2742 · v1 · pith:Z5YH2SRLnew · submitted 2012-04-12 · 💻 cs.CV · cs.AI

Video In Sentences Out

keywords phrases, those, event, modifiers, noun, objects, video, action

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
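The mapping from recognized event structure to sentence parts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Participant` and `Event` classes, their field names, the `render` function, and the fixed word order are all hypothetical, and the sketch assumes the recognition stage has already produced an inflected verb, object classes, properties, and relations as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Participant:
    """Hypothetical container for one tracked object filling a role."""
    noun: str                                            # object class -> head noun
    adjectives: List[str] = field(default_factory=list)  # object properties -> adjectives
    # Spatial relation to another participant -> prepositional phrase,
    # stored as (preposition, reference object). Assumed representation.
    spatial: Optional[Tuple[str, "Participant"]] = None

    def noun_phrase(self) -> str:
        words = ["the", *self.adjectives, self.noun]
        if self.spatial is not None:
            preposition, reference = self.spatial
            words += [preposition, reference.noun_phrase()]
        return " ".join(words)


@dataclass
class Event:
    """Hypothetical container for one recognized event."""
    verb: str                         # action class -> verb (already inflected)
    agent: Participant                # "who"
    patient: Optional[Participant] = None  # "to whom", if the action has one
    adverb: Optional[str] = None      # event characteristic -> adverbial modifier
    adjunct_pp: Optional[str] = None  # event characteristic -> PP adjunct


def render(event: Event) -> str:
    """Assemble a sentence in a fixed subject-adverb-verb-object-adjunct order."""
    words = [event.agent.noun_phrase()]
    if event.adverb:
        words.append(event.adverb)
    words.append(event.verb)
    if event.patient is not None:
        words.append(event.patient.noun_phrase())
    if event.adjunct_pp:
        words.append(event.adjunct_pp)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


chair = Participant("chair")
event = Event(
    verb="picked up",
    agent=Participant("person", ["tall"], ("to the left of", chair)),
    patient=Participant("bag", ["red"]),
    adverb="quickly",
)
sentence = render(event)
print(sentence)  # The tall person to the left of the chair quickly picked up the red bag.
```

The sketch covers only the final rendering step. Recovering the object tracks, the track-to-role assignments, and the body posture that would populate these fields is the recognition problem the abstract describes, and is not attempted here.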
