arxiv: 1812.08407 · v1 · pith:BN6CRKTLnew · submitted 2018-12-20 · 💻 cs.CL

Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog

classification 💻 cs.CL
keywords: dialog, audio, system, attention, explorations, feature, part, scene-aware

With the recent advancements in AI, Intelligent Virtual Assistants (IVAs) have become a ubiquitous part of every home. Going forward, we are witnessing a confluence of vision, speech and dialog system technologies that enables IVAs to learn audio-visual groundings of utterances and to hold conversations with users about the objects, activities and events surrounding them. As part of the 7th Dialog System Technology Challenges (DSTC7), for the Audio Visual Scene-Aware Dialog (AVSD) track, we explore incorporating the 'topics' of the dialog into the architecture as an important contextual feature, along with explorations of multimodal attention. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We present a detailed analysis of the experiments and show that some of our model variations outperform the baseline system presented for this task.
