NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.
Generative Adversarial Networks
Mixed citation behavior. Most common role is background (59%).
abstract
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
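The minimax game in the abstract can be illustrated numerically. The sketch below is not the paper's training procedure: it assumes a toy discrete sample space with hand-picked probabilities, and the helper names `optimal_discriminator` and `value` are hypothetical. It uses the standard GAN value function V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] and the closed-form best response D*(x) = p_data(x) / (p_data(x) + p_g(x)). It checks that when the generator distribution equals the data distribution, D* is 1/2 everywhere and the value reaches -log 4.

```python
import math

def optimal_discriminator(p_data, p_g):
    """Best-response discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    return sum(
        pd * math.log(dx) + pg * math.log(1.0 - dx)
        for pd, pg, dx in zip(p_data, p_g, d)
    )

# Toy distributions over three outcomes (illustrative numbers, not from the paper).
p_data = [0.5, 0.3, 0.2]
p_g_bad = [0.2, 0.3, 0.5]   # generator that mismatches the data

# Discriminator best responses for a mismatched and a perfect generator.
d_bad = optimal_discriminator(p_data, p_g_bad)
d_eq = optimal_discriminator(p_data, p_data)

v_bad = value(p_data, p_g_bad, d_bad)
v_eq = value(p_data, p_data, d_eq)

print("D* with mismatched generator:", d_bad)
print("D* with perfect generator:   ", d_eq)
print("V at mismatch:", v_bad, " V at equilibrium:", v_eq)
```

With a mismatched generator the discriminator's best response does better than chance, so the value stays above -log 4. The generator's objective is to push that value down to its minimum.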
representative citing papers
Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.
Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.
High-resolution interferometric imaging of eight post-AGB circumbinary discs reveals diverse inner-rim substructures including azimuthal brightness enhancements and arc-like features not explained by inclination alone.
Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural samplers at criticality.
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.
FlowGuard detects unsafe content during diffusion image generation via linear latent decoding and curriculum learning, outperforming prior methods by over 30% F1 while reducing GPU memory by 97% and projection time to 0.2 seconds.
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
The paper introduces the first comprehensive taxonomy and visualization of 11 categories of technologies facilitating AI-generated non-consensual intimate images, derived from synthesis of primary sources and demonstrated through case studies.
A LoRA-adapted conditional diffusion surrogate for electromagnetic calorimeter showers matches key observables within 2% RMSE and reproduces directional trends in design-utility gradients.
A U-Net GAN reconstructs CMB T and E maps from Planck-like simulations with foregrounds and systematics, achieving under 1% error outside the Galactic region and demonstrating first-time correction for non-circular beams and asymmetric scans.
PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
GlitchGAN generates class-conditioned time-domain glitches that pass Gravity Spy classification and show UMAP overlap with real samples while running at high speed.
A two-stage framework that decouples generation, selection, and refinement to improve budget use in diffusion-based dataset distillation.
CPGAN adds a lateral-acceleration collision loss and Voronoi motion features to a GAN to lower collision rates in bidirectional crowd flows while reproducing lane formation and N-t curves.
SGMD uses fake-score optimization toward the teacher with stop-gradient Fisher objective and NR/RC dual potentials to deliver ~3x training speedup and better motion dynamics in 4-step video diffusion models.
A hybrid agentic architecture integrates knowledge-based physical verification tools into LLM-driven CAD design loops, producing more complex and functionally valid designs than prior agentic baselines.
A flow-matching generative model trained on CoLBT-hydro data conditionally generates marginal final-state hadron spectra from jet-induced hydro responses in 0-10% Pb+Pb collisions at 5.02 TeV, matching training data statistics with approximately six orders of magnitude computational speedup.
DualFashion introduces a dual-diffusion Transformer with image and text branches that generates both visual items and semantic descriptions for explainable personalized fashion recommendation.
BF16 tensor cores on GPUs emulate FP32 SGEMM with superior performance, power efficiency, and numerical accuracy compared to native FP32, including a library implementation that handles denormals.
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
citing papers explorer
- Hierarchical Text-Conditional Image Generation with CLIP Latents
  A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  CogVideo is a large-scale transformer pretrained for text-to-video generation that outperforms public models in evaluations.