Hadamard Product for Low-rank Bilinear Pooling

Byoung-Tak Zhang; Jeonghee Kim; Jin-Hwa Kim; Jung-Woo Ha; Kyoung-Woon On; Woosang Lim

arxiv: 1610.04325 · v4 · pith:BWZOZQPTnew · submitted 2016-10-14 · 💻 cs.CV · cs.AI · cs.NE

Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

classification 💻 cs.CV · cs.AI · cs.NE

keywords: bilinear, pooling, representations, tasks, visual, hadamard, low-rank, models

Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property.
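The abstract does not spell out the factorization, so the following is only a minimal sketch of the general idea behind low-rank bilinear pooling with a Hadamard product, not the authors' implementation. It assumes the full weight matrix W (N x M) is factored as W = U V^T with rank d; under that assumption the bilinear form x^T W y equals the sum of the element-wise (Hadamard) product of U^T x and V^T y. All names (`U`, `V`, `P`, `low_rank_pooling`) and sizes are hypothetical, and any nonlinearities or bias terms a real model might use are omitted.

```python
import random

def matvec_t(M, v):
    """Return M^T v, where M is a list of len(v) rows, each with d columns."""
    d = len(M[0])
    return [sum(M[i][k] * v[i] for i in range(len(v))) for k in range(d)]

def full_bilinear(W, x, y):
    """Scalar bilinear form x^T W y using a full N x M weight matrix."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def low_rank_bilinear(U, V, x, y):
    """Same scalar via W = U V^T: sum over the Hadamard product."""
    xu = matvec_t(U, x)  # projection of x, length d
    yv = matvec_t(V, y)  # projection of y, length d
    return sum(a * b for a, b in zip(xu, yv))

def low_rank_pooling(U, V, P, x, y):
    """Vector output: project the Hadamard product with P (d x c)."""
    xu = matvec_t(U, x)
    yv = matvec_t(V, y)
    hadamard = [a * b for a, b in zip(xu, yv)]
    return matvec_t(P, hadamard)

# Illustrative sizes (assumptions): N=5, M=4 input dims, rank d=3, output c=2.
random.seed(0)
N, M, d, c = 5, 4, 3, 2
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(M)]
P = [[random.gauss(0, 1) for _ in range(c)] for _ in range(d)]
x = [random.gauss(0, 1) for _ in range(N)]
y = [random.gauss(0, 1) for _ in range(M)]

# Build the full matrix W = U V^T only to check the equivalence.
W = [[sum(U[i][k] * V[j][k] for k in range(d)) for j in range(M)]
     for i in range(N)]

full_value = full_bilinear(W, x, y)
low_rank_value = low_rank_bilinear(U, V, x, y)
pooled = low_rank_pooling(U, V, P, x, y)
```

In this sketch the factored form stores d(N + M) parameters per output instead of N·M, which is where the parsimony comes from when d is small relative to N and M.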

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Demystifying CLIP Data
cs.CV 2023-09 accept novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective
cs.LG 2026-04 unverdicted novelty 5.0

CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...