FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville · 2017 · cs.CV · arXiv 1709.07871

27 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
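The feature-wise affine transformation described in the abstract can be written as FiLM(F | gamma, beta) = gamma * F + beta, applied per feature map, with gamma and beta predicted from the conditioning input. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the function names `film` and `film_generator`, the single linear generator, and all tensor shapes are assumptions chosen for illustration.

```python
import numpy as np


def film(features, gamma, beta):
    """Apply a feature-wise affine transformation.

    features: array of shape (N, C, H, W)
    gamma, beta: arrays of shape (N, C), one scale and shift per feature map
    """
    # Broadcast the per-channel scale and shift over the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


def film_generator(cond, weight, bias):
    """Hypothetical generator: a single linear map from conditioning to (gamma, beta).

    cond: (N, D) conditioning vectors (e.g. a question embedding)
    weight: (D, 2C), bias: (2C,)
    """
    out = cond @ weight + bias              # shape (N, 2C)
    gamma, beta = np.split(out, 2, axis=1)  # each of shape (N, C)
    return gamma, beta


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 3, 3))  # batch of 2, 4 feature maps of 3x3
cond = rng.normal(size=(2, 5))            # 5-dimensional conditioning input
weight = rng.normal(size=(5, 8))          # maps to 2 * 4 modulation parameters
bias = np.zeros(8)

gamma, beta = film_generator(cond, weight, bias)
modulated = film(features, gamma, beta)
print(modulated.shape)
```

Setting gamma to one and beta to zero leaves the features unchanged, which makes the layer easy to sanity-check in isolation.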

citation-role summary

background: 2 · method: 2

citation-polarity summary

background: 2 · use method: 2

representative citing papers

Diffusion-warm sampling of the XY model enables fast thermalization at scale

quant-ph · 2026-06-29 · unverdicted · novelty 7.0

A temperature-conditioned diffusion model trained on small XY lattices produces accurate larger-lattice samples and cuts MCMC thermalization time by roughly 10x.

DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials

physics.chem-ph · 2026-05-26 · unverdicted · novelty 7.0

DGLD applies domain-gated latent diffusion with label-quality gating and multi-task guidance to discover 12 novel energetic material leads validated by DFT, outperforming SMILES-LSTM, SELFIES-GA, and REINVENT baselines in novelty and on-target performance.

Field-level multi-tracers simulation-based inference of cosmological parameters from 3D maps

astro-ph.CO · 2026-05-25 · unverdicted · novelty 7.0

The work demonstrates that multi-tracer field-level SBI on galaxy and HI maps yields 2-7 times better constraints on Omega_m and sigma_8 than single-tracer or summary-statistic approaches, with 3D maps performing best.

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

DiffLNS uses a discrete diffusion initializer to produce warm-start plans that lift LNS2 success rates to 95.8% across 20 congested MAPF settings, generalizing from 96 to 312 agents.

Radio-Interferometric Image Reconstruction with Denoising Diffusion Restoration Models

astro-ph.IM · 2026-01-22 · unverdicted · novelty 7.0

A diffusion model trained on real radio galaxy images reconstructs high-fidelity interferometric observations from VLA, EHT, and ALMA simulations and outperforms CLEAN on gridded visibilities.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

FLORA is an octree-based deep learning framework with auxiliary data fusion that predicts forest attributes from heterogeneous LiDAR, achieving rRMSE of 12.3% for dominant height and 39% for total volume on 32k French NFI plots.

Relevance Is Not Permission: Warranted Attention for Value Contributions

cs.AI · 2026-06-29 · unverdicted · novelty 6.0 · 2 refs

Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.

Attention mechanism for scalable mesh-based neural surrogates of free-surface fluids

cs.CE · 2026-06-22 · unverdicted · novelty 6.0

Self-attention mechanisms are used to build mesh-preserving neural surrogates that approximate PFEM dynamics for free-surface flows, delivering accurate transient predictions and improved scalability on 2D and 3D benchmarks.

Conditional Graph Diffusion for Negotiation Support: Overcoming Discrete Infeasibility and Preference Elicitation Gaps

cs.GT · 2026-06-01 · unverdicted · novelty 6.0

Conditional Graph Diffusion generates continuous negotiation outcomes with high individual rationality using GATv2 encoders, cross-attention fusion, and inference-time normative guidance gradients.

A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization

physics.optics · 2026-05-14 · unverdicted · novelty 6.0

A plug-and-play differentiable model bridging ray and wave optics for hybrid systems that enables end-to-end optimization of planar and conformal diffractive elements.

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

MoMo conditions contrastive representations and prediction operators on user preferences via FiLM and low-rank modulation to enable continuous modulation of plan safety while preserving inference efficiency.

Forecasting implied volatility surface with generative diffusion models

q-fin.CP · 2025-11-10 · unverdicted · novelty 6.0

A conditioned diffusion model with SNR-weighted arbitrage penalty generates one-day-ahead arbitrage-free implied volatility surfaces and outperforms baselines on market data.

Local Diffusion Models and Phases of Data Distributions

cs.LG · 2025-08-08 · unverdicted · novelty 6.0

The paper introduces a phase framework for data distributions connected by local denoisers and demonstrates that reverse diffusion consists of trivial and data phases separated by a transition where local score functions must fail, tied to spatial Markovianity.

Segmenting Objects in Day and Night: Edge-Conditioned CNN for Thermal Image Semantic Segmentation

cs.CV · 2019-07-24 · unverdicted · novelty 6.0

EC-CNN uses a gated feature-wise transform to incorporate edge priors for thermal semantic segmentation and introduces the SODA dataset of over 7,000 labeled thermal images.

Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

SA-INR achieves self-supervised spatial super-resolution and zero-shot angular super-resolution in rotating-view dMRI from single-view acquisitions per direction, reaching 34.82 dB PSNR on trained and 33.08 dB on unseen directions in simulated data while improving DTI fitting.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

cs.LG · 2026-04-26 · conditional · novelty 6.0 · 2 refs

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.

Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.

AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

cs.SD · 2026-06-08 · unverdicted · novelty 5.0

Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.

TAM: Torque Adaptation Module for Robust Motion Transfer in Manipulation

cs.RO · 2026-06-04 · unverdicted · novelty 5.0

TAM is a policy-agnostic torque adaptation module trained in randomized simulation that improves zero-shot real-robot performance on dynamic manipulation tasks compared to system identification and RMA baselines.

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

cs.RO · 2026-06-02 · unverdicted · novelty 5.0

SPADE combines sketch-guided path planning with diffusion-augmented imitation learning to achieve better generalization and lower error with fewer parameters than prior methods.

citing papers explorer

Showing 27 of 27 citing papers.

Diffusion-warm sampling of the XY model enables fast thermalization at scale

quant-ph · 2026-06-29 · unverdicted · none · ref 29 · internal anchor

A temperature-conditioned diffusion model trained on small XY lattices produces accurate larger-lattice samples and cuts MCMC thermalization time by roughly 10x.

DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials

physics.chem-ph · 2026-05-26 · unverdicted · none · ref 67 · internal anchor

DGLD applies domain-gated latent diffusion with label-quality gating and multi-task guidance to discover 12 novel energetic material leads validated by DFT, outperforming SMILES-LSTM, SELFIES-GA, and REINVENT baselines in novelty and on-target performance.

Field-level multi-tracers simulation-based inference of cosmological parameters from 3D maps

astro-ph.CO · 2026-05-25 · unverdicted · none · ref 131 · internal anchor

The work demonstrates that multi-tracer field-level SBI on galaxy and HI maps yields 2-7 times better constraints on Omega_m and sigma_8 than single-tracer or summary-statistic approaches, with 3D maps performing best.

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

cs.AI · 2026-05-13 · unverdicted · none · ref 12 · internal anchor

DiffLNS uses a discrete diffusion initializer to produce warm-start plans that lift LNS2 success rates to 95.8% across 20 congested MAPF settings, generalizing from 96 to 312 agents.

Radio-Interferometric Image Reconstruction with Denoising Diffusion Restoration Models

astro-ph.IM · 2026-01-22 · unverdicted · none · ref 12 · internal anchor

A diffusion model trained on real radio galaxy images reconstructs high-fidelity interferometric observations from VLA, EHT, and ALMA simulations and outperforms CLEAN on gridded visibilities.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · none · ref 227

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · none · ref 48

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data

cs.CV · 2026-06-30 · unverdicted · none · ref 146 · internal anchor

FLORA is an octree-based deep learning framework with auxiliary data fusion that predicts forest attributes from heterogeneous LiDAR, achieving rRMSE of 12.3% for dominant height and 39% for total volume on 32k French NFI plots.

Relevance Is Not Permission: Warranted Attention for Value Contributions

cs.AI · 2026-06-29 · unverdicted · none · ref 8 · 2 links · internal anchor

Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.

Attention mechanism for scalable mesh-based neural surrogates of free-surface fluids

cs.CE · 2026-06-22 · unverdicted · none · ref 56 · internal anchor

Self-attention mechanisms are used to build mesh-preserving neural surrogates that approximate PFEM dynamics for free-surface flows, delivering accurate transient predictions and improved scalability on 2D and 3D benchmarks.

Conditional Graph Diffusion for Negotiation Support: Overcoming Discrete Infeasibility and Preference Elicitation Gaps

cs.GT · 2026-06-01 · unverdicted · none · ref 32 · internal anchor

Conditional Graph Diffusion generates continuous negotiation outcomes with high individual rationality using GATv2 encoders, cross-attention fusion, and inference-time normative guidance gradients.

A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization

physics.optics · 2026-05-14 · unverdicted · none · ref 123 · internal anchor

A plug-and-play differentiable model bridging ray and wave optics for hybrid systems that enables end-to-end optimization of planar and conformal diffractive elements.

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

cs.LG · 2026-05-08 · unverdicted · none · ref 48 · 2 links · internal anchor

MoMo conditions contrastive representations and prediction operators on user preferences via FiLM and low-rank modulation to enable continuous modulation of plan safety while preserving inference efficiency.

Forecasting implied volatility surface with generative diffusion models

q-fin.CP · 2025-11-10 · unverdicted · none · ref 13 · internal anchor

A conditioned diffusion model with SNR-weighted arbitrage penalty generates one-day-ahead arbitrage-free implied volatility surfaces and outperforms baselines on market data.

Local Diffusion Models and Phases of Data Distributions

cs.LG · 2025-08-08 · unverdicted · none · ref 52 · internal anchor

The paper introduces a phase framework for data distributions connected by local denoisers and demonstrates that reverse diffusion consists of trivial and data phases separated by a transition where local score functions must fail, tied to spatial Markovianity.

Segmenting Objects in Day and Night: Edge-Conditioned CNN for Thermal Image Semantic Segmentation

cs.CV · 2019-07-24 · unverdicted · none · ref 23 · internal anchor

EC-CNN uses a gated feature-wise transform to incorporate edge priors for thermal semantic segmentation and introduces the SODA dataset of over 7,000 labeled thermal images.

Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI

cs.CV · 2026-05-04 · unverdicted · none · ref 14

SA-INR achieves self-supervised spatial super-resolution and zero-shot angular super-resolution in rotating-view dMRI from single-view acquisitions per direction, reaching 34.82 dB PSNR on trained and 33.08 dB on unseen directions in simulated data while improving DTI fitting.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · none · ref 21

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

cs.LG · 2026-04-26 · conditional · none · ref 29 · 2 links

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.

Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning

cs.RO · 2026-04-08 · unverdicted · none · ref 15

A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.

AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling

cs.LG · 2026-04-07 · unverdicted · none · ref 19

AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

cs.SD · 2026-06-08 · unverdicted · none · ref 33 · internal anchor

Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.

TAM: Torque Adaptation Module for Robust Motion Transfer in Manipulation

cs.RO · 2026-06-04 · unverdicted · none · ref 21 · internal anchor

TAM is a policy-agnostic torque adaptation module trained in randomized simulation that improves zero-shot real-robot performance on dynamic manipulation tasks compared to system identification and RMA baselines.

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

cs.RO · 2026-06-02 · unverdicted · none · ref 15 · internal anchor

SPADE combines sketch-guided path planning with diffusion-augmented imitation learning to achieve better generalization and lower error with fewer parameters than prior methods.

LEIA: Learned Environment for Interactive Architected Materials

cs.LG · 2026-05-27 · unverdicted · none · ref 38 · internal anchor

LEIA is a world model for autoregressive 3D simulation of architected materials under interactive loading, benchmarked on MicroPlate and applied to surrogate-guided de novo design search with finite-element validation.

Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning

eess.IV · 2026-02-27 · unverdicted · none · ref 9 · internal anchor

Proposes a multimodal model with cross-attention and missingness-aware dictionary learning for robust DICOM series classification that outperforms image-only, metadata-only, and other multimodal baselines on liver MRI datasets.

Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation

cs.CV · 2026-04-29 · unverdicted · none · ref 24

FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.
