Open-Ended Visual Question-Answering

Issey Masuda; Santiago Pascual de la Puente; Xavier Giro-i-Nieto

arxiv: 1610.02692 · v1 · pith:JCFARO3Gnew · submitted 2016-10-09 · 💻 cs.CL · cs.CV· cs.MM

Open-Ended Visual Question-Answering

Issey Masuda , Santiago Pascual de la Puente , Xavier Giro-i-Nieto This is my paper

classification 💻 cs.CL cs.CVcs.MM

keywords visualquestionquestion-answeringembeddingexploreimagenetworksaccept

0 comments

read the original abstract

This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved a 53,62% of accuracy in the test dataset. The developed software has followed the best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.

This paper has not been read by Pith yet.

Open-Ended Visual Question-Answering

discussion (0)