
arxiv: 1610.04325 · v4 · pith:BWZOZQPTnew · submitted 2016-10-14 · 💻 cs.CV · cs.AI · cs.NE

Hadamard Product for Low-rank Bilinear Pooling

classification 💻 cs.CV cs.AI cs.NE
keywords bilinear pooling representations tasks visual hadamard low-rank models

Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property.
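
The low-rank idea in the abstract can be illustrated with a short sketch. It assumes the bilinear weight matrix factorizes as W = U V^T with rank d, in which case x^T W y equals the sum of the element-wise (Hadamard) product of U^T x and V^T y. All names (U, V, x, y, N, M, d) and the toy dimensions are illustrative assumptions, not taken from this page, and the paper's full model may include further components (such as nonlinearities or an output projection) that this sketch omits.

```python
import random


def matvec_t(mat, vec):
    """Return mat^T vec, where mat is a list of len(vec) rows with d columns."""
    d = len(mat[0])
    return [sum(mat[i][k] * vec[i] for i in range(len(vec))) for k in range(d)]


def full_bilinear(W, x, y):
    """Compute x^T W y using a full N x M weight matrix."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def low_rank_bilinear(U, V, x, y):
    """Compute 1^T (U^T x o V^T y), where o is the Hadamard product."""
    px = matvec_t(U, x)  # project x into the shared d-dimensional space
    py = matvec_t(V, y)  # project y into the same space
    return sum(a * b for a, b in zip(px, py))


# Hypothetical toy sizes: input dims N and M, rank d (assumed for illustration).
rng = random.Random(0)
N, M, d = 8, 6, 2
U = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(N)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(M)]
x = [rng.gauss(0, 1) for _ in range(N)]
y = [rng.gauss(0, 1) for _ in range(M)]

# Materialize W = U V^T only to check the identity; the low-rank form never needs it.
W = [[sum(U[i][k] * V[j][k] for k in range(d)) for j in range(M)]
     for i in range(N)]

full = full_bilinear(W, x, y)
low = low_rank_bilinear(U, V, x, y)

full_params = N * M            # parameters in the full bilinear form
low_rank_params = d * (N + M)  # parameters in the factorized form
```

In this toy setting the two forms agree up to floating-point error, while the factorized version stores d(N + M) weights instead of N·M. That gap is the kind of parsimony the abstract refers to, and it widens as the input dimensions grow relative to the rank.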



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Demystifying CLIP Data

    cs.CV · 2023-09 · accept · novelty 6.0

    MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

  2. Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...