FiLM: Visual Reasoning with a General Conditioning Layer

Aaron Courville; Ethan Perez; Florian Strub; Harm de Vries; Vincent Dumoulin

arxiv: 1709.07871 · v2 · pith:XT37AZWGnew · submitted 2017-09-22 · cs.CV · cs.AI · cs.CL · stat.ML


Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

classification cs.CV · cs.AI · cs.CL · stat.ML

keywords film · reasoning · conditioning · layers · visual · feature-wise · neural · ablations


Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
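The "feature-wise affine transformation" described in the abstract can be illustrated with a small sketch. It assumes the usual reading of FiLM: each feature map c is scaled by a coefficient gamma[c] and shifted by beta[c], and both coefficients are computed from the conditioning input. Everything else here is a hypothetical illustration, not the authors' implementation: the names `linear` and `film`, the toy shapes, and the use of a single linear map as the generator of gamma and beta are all assumptions made to keep the example self-contained.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # one H x W feature map


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Plain affine map: one output per row of `weights` (hypothetical generator)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features: Sequence[FeatureMap],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[FeatureMap]:
    """Apply gamma[c] * value + beta[c] to every position of feature map c."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per feature map")
    return [
        [[g * value + b for value in row] for row in fmap]
        for fmap, g, b in zip(features, gamma, beta)
    ]


# Toy setup (assumed): 2 feature maps of size 2x2, conditioning vector of size 3.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[-1.0, 0.0], [0.5, 2.0]],
]
conditioning = [1.0, 0.0, -1.0]

# Assumed generator: two linear maps from the conditioning vector,
# one producing the scales and one producing the shifts.
gamma = linear(conditioning, weights=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], bias=[1.0, 0.0])
beta = linear(conditioning, weights=[[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]], bias=[0.0, 0.0])

modulated = film(features, gamma, beta)
print(modulated)
```

In this sketch the second feature map receives a negative scale, so the conditioning input flips its sign. This shows how a per-feature scale and shift can amplify, suppress, or invert individual features without touching the weights of the network being modulated.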


Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
cs.AI 2026-05 unverdicted novelty 7.0

DiffLNS uses a discrete diffusion initializer to produce warm-start plans that lift LNS2 success rates to 95.8% across 20 congested MAPF settings, generalizing from 96 to 312 agents.
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
cs.LG 2026-05 unverdicted novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships ...
Radio-Interferometric Image Reconstruction with Denoising Diffusion Restoration Models
astro-ph.IM 2026-01 unverdicted novelty 7.0

A diffusion model trained on real radio galaxy images reconstructs high-fidelity interferometric observations from VLA, EHT, and ALMA simulations and outperforms CLEAN on gridded visibilities.
Diffusion Models Beat GANs on Image Synthesis
cs.LG 2021-05 accept novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization
physics.optics 2026-05 unverdicted novelty 6.0

A plug-and-play differentiable model bridging ray and wave optics for hybrid systems that enables end-to-end optimization of planar and conformal diffractive elements.
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
cs.LG 2026-05 unverdicted novelty 6.0

MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
cs.LG 2026-05 unverdicted novelty 6.0

MoMo conditions contrastive representations and prediction operators on user preferences via FiLM and low-rank modulation to enable continuous modulation of plan safety while preserving inference efficiency.
Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI
cs.CV 2026-05 unverdicted novelty 6.0

SA-INR achieves self-supervised spatial super-resolution and zero-shot angular super-resolution in rotating-view dMRI from single-view acquisitions per direction, reaching 34.82 dB PSNR on trained and 33.08 dB on unse...
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation
cs.RO 2026-04 unverdicted novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
cs.LG 2026-04 conditional novelty 6.0

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...
Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning
cs.RO 2026-04 unverdicted novelty 6.0

A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.
AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
cs.LG 2026-04 unverdicted novelty 6.0

AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout ...
Forecasting implied volatility surface with generative diffusion models
q-fin.CP 2025-11 unverdicted novelty 6.0

A conditioned diffusion model with SNR-weighted arbitrage penalty generates one-day-ahead arbitrage-free implied volatility surfaces and outperforms baselines on market data.
Local Diffusion Models and Phases of Data Distributions
cs.LG 2025-08 unverdicted novelty 6.0

The paper introduces a phase framework for data distributions connected by local denoisers and demonstrates that reverse diffusion consists of trivial and data phases separated by a transition where local score functi...
Segmenting Objects in Day and Night: Edge-Conditioned CNN for Thermal Image Semantic Segmentation
cs.CV 2019-07 unverdicted novelty 6.0

EC-CNN uses a gated feature-wise transform to incorporate edge priors for thermal semantic segmentation and introduces the SODA dataset of over 7,000 labeled thermal images.
Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation
cs.CV 2026-04 unverdicted novelty 5.0

FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
cs.LG 2026-04 unverdicted novelty 5.0

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.
Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning
eess.IV 2026-02 unverdicted novelty 5.0

Proposes a multimodal model with cross-attention and missingness-aware dictionary learning for robust DICOM series classification that outperforms image-only, metadata-only, and other multimodal baselines on liver MRI...