FiLM: Visual Reasoning with a General Conditioning Layer
27 Pith papers cite this work.
abstract
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
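The feature-wise affine transformation described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a single linear layer as the generator that maps conditioning information to a per-feature scale and shift, and the names `linear`, `film_generator`, and `film` are hypothetical. Each feature map is scaled by its own gamma and shifted by its own beta.

```python
def linear(x, weight, bias):
    """Plain linear map: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film_generator(conditioning, gamma_w, gamma_b, beta_w, beta_b):
    """Hypothetical generator: predicts one (gamma, beta) pair per feature map
    from the conditioning vector. A single linear layer is an assumption."""
    gammas = linear(conditioning, gamma_w, gamma_b)
    betas = linear(conditioning, beta_w, beta_b)
    return gammas, betas


def film(feature_maps, gammas, betas):
    """Feature-wise affine modulation: gamma * activation + beta.
    feature_maps is a list of channels, each a flat list of activations."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(feature_maps, gammas, betas)]


# Toy example: two feature maps, a two-dimensional conditioning vector.
conditioning = [1.0, 2.0]
gamma_w = [[1.0, 0.0], [0.0, 0.5]]
gamma_b = [0.0, 0.0]
beta_w = [[0.0, 0.0], [1.0, 0.0]]
beta_b = [0.5, 0.0]

gammas, betas = film_generator(conditioning, gamma_w, gamma_b, beta_w, beta_b)
feature_maps = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(feature_maps, gammas, betas)
print(modulated)  # [[1.5, 2.5], [4.0, 5.0]]
```

With gamma fixed to 1 and beta to 0 the layer reduces to the identity, so the conditioning input only changes the computation through the predicted scale and shift.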
citing papers explorer
-
Diffusion-warm sampling of the XY model enables fast thermalization at scale
A temperature-conditioned diffusion model trained on small XY lattices produces accurate larger-lattice samples and cuts MCMC thermalization time by roughly 10x.
-
DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials
DGLD applies domain-gated latent diffusion with label-quality gating and multi-task guidance to discover 12 novel energetic material leads validated by DFT, outperforming SMILES-LSTM, SELFIES-GA, and REINVENT baselines in novelty and on-target performance.
-
Field-level multi-tracers simulation-based inference of cosmological parameters from 3D maps
The work demonstrates that multi-tracer field-level SBI on galaxy and HI maps yields 2-7 times better constraints on Omega_m and sigma_8 than single-tracer or summary-statistic approaches, with 3D maps performing best.
-
Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
DiffLNS uses a discrete diffusion initializer to produce warm-start plans that lift LNS2 success rates to 95.8% across 20 congested MAPF settings, generalizing from 96 to 312 agents.
-
Radio-Interferometric Image Reconstruction with Denoising Diffusion Restoration Models
A diffusion model trained on real radio galaxy images reconstructs high-fidelity interferometric observations from VLA, EHT, and ALMA simulations and outperforms CLEAN on gridded visibilities.
-
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
-
Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
-
FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data
FLORA is an octree-based deep learning framework with auxiliary data fusion that predicts forest attributes from heterogeneous LiDAR, achieving rRMSE of 12.3% for dominant height and 39% for total volume on 32k French NFI plots.
-
Relevance Is Not Permission: Warranted Attention for Value Contributions
Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.
-
Attention mechanism for scalable mesh-based neural surrogates of free-surface fluids
Self-attention mechanisms are used to build mesh-preserving neural surrogates that approximate PFEM dynamics for free-surface flows, delivering accurate transient predictions and improved scalability on 2D and 3D benchmarks.
-
Conditional Graph Diffusion for Negotiation Support: Overcoming Discrete Infeasibility and Preference Elicitation Gaps
Conditional Graph Diffusion generates continuous negotiation outcomes with high individual rationality using GATv2 encoders, cross-attention fusion, and inference-time normative guidance gradients.
-
A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization
A plug-and-play differentiable model bridging ray and wave optics for hybrid systems that enables end-to-end optimization of planar and conformal diffractive elements.
-
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
MoMo conditions contrastive representations and prediction operators on user preferences via FiLM and low-rank modulation to enable continuous modulation of plan safety while preserving inference efficiency.
-
Forecasting implied volatility surface with generative diffusion models
A conditioned diffusion model with SNR-weighted arbitrage penalty generates one-day-ahead arbitrage-free implied volatility surfaces and outperforms baselines on market data.
-
Local Diffusion Models and Phases of Data Distributions
The paper introduces a phase framework for data distributions connected by local denoisers and demonstrates that reverse diffusion consists of trivial and data phases separated by a transition where local score functions must fail, tied to spatial Markovianity.
-
Segmenting Objects in Day and Night: Edge-Conditioned CNN for Thermal Image Semantic Segmentation
EC-CNN uses a gated feature-wise transform to incorporate edge priors for thermal semantic segmentation and introduces the SODA dataset of over 7,000 labeled thermal images.
-
Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI
SA-INR achieves self-supervised spatial super-resolution and zero-shot angular super-resolution in rotating-view dMRI from single-view acquisitions per direction, reaching 34.82 dB PSNR on trained and 33.08 dB on unseen directions in simulated data while improving DTI fitting.
-
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation
Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
-
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
-
Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning
A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.
-
AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re
-
Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models
Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.
-
TAM: Torque Adaptation Module for Robust Motion Transfer in Manipulation
TAM is a policy-agnostic torque adaptation module trained in randomized simulation that improves zero-shot real-robot performance on dynamic manipulation tasks compared to system identification and RMA baselines.
-
SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts
SPADE combines sketch-guided path planning with diffusion-augmented imitation learning to achieve better generalization and lower error with fewer parameters than prior methods.
-
LEIA: Learned Environment for Interactive Architected Materials
LEIA is a world model for autoregressive 3D simulation of architected materials under interactive loading, benchmarked on MicroPlate and applied to surrogate-guided de novo design search with finite-element validation.
-
Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning
Proposes a multimodal model with cross-attention and missingness-aware dictionary learning for robust DICOM series classification that outperforms image-only, metadata-only, and other multimodal baselines on liver MRI datasets.
-
Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation
FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.