{"total":17,"items":[{"citing_arxiv_id":"2605.15185","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Quantitative Video World Model Evaluation for Geometric-Consistency","primary_cat":"cs.CV","submitted_at":"2026-05-14T17:59:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07860","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems","primary_cat":"cs.LG","submitted_at":"2026-05-08T15:20:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustness than VAE or DDPM alternatives.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Although effective, this formulation presents well-known challenges [46]. In practice, adversarial min-max optimization is often unstable and requires careful balancing between the generator and discriminator during training. Problems such as vanishing gradients, sensitivity to hyperparameters, and oscillatory convergence are common, especially on complex or heterogeneous data distributions [47]. Much of the progress in GAN research has therefore focused on stabilizing these dynamics. 
A key step in this direction is the Wasserstein GAN [48], which replaces the Jensen-Shannon divergence of the original GAN with the Wasserstein-1 distance. This change turns the discriminator into a critic $f_\\phi$, removes the sigmoid, and allows real-valued outputs:"},{"citing_arxiv_id":"2605.06678","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence","primary_cat":"cs.LG","submitted_at":"2026-04-22T08:30:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16621","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Physics-informed, Generative Adversarial Design of Funicular Shells","primary_cat":"cs.CE","submitted_at":"2026-04-17T18:16:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Due to the adversarial training dynamics of GANs and the non-convex nature of the loss landscape, reaching the Nash equilibrium is particularly challenging [13]. This often leads to mode collapse, a common failure mode of GANs in which the generator collapses to a single mode of the data distribution, while the discriminator fails to distinguish between real and fake samples [27, 22].
Over the years, numerous techniques have been proposed to mitigate mode collapse, see [22, 27, 29, 30]. However, in this work two specific techniques are explicitly used to mitigate that issue: feature matching and spectral normalization. 3.3.1. Feature matching Feature matching (FM) addresses training instability by introducing a regularized objective that prevents"},{"citing_arxiv_id":"2604.13432","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis","primary_cat":"cs.CV","submitted_at":"2026-04-15T03:06:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MaMe is a differentiable matrix-only token merging method that doubles ViT-B throughput with a 2% accuracy drop on pre-trained models and enables faster, higher-quality image synthesis when paired with MaRe.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09168","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ELT: Elastic Looped Transformers for Visual Generation","primary_cat":"cs.CV","submitted_at":"2026-04-10T09:53:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"repeatedly. This architectural efficiency parallels the biological visual systems [37, 38], where recurrent processing, rather than strictly feedforward pathways, is essential for resolving complex visual inputs.
While looping of transformers was popularized by Universal Transformers [9] and has recently empowered language models with stronger reasoning capabilities [61, 75], its potential for high-fidelity visual generation remains largely untapped. From a practical standpoint, compared to the traditional models, Looped Transformers (a) are extremely parameter efficient and can perform significantly more compute (FLOPs) per parameter, (b) can have higher throughput by minimizing the \"memory wall\" bottleneck. They use a compact set of shared parameters and maintain its major"},{"citing_arxiv_id":"2604.05256","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Protecting and Preserving Protest Dynamics for Responsible Analysis","primary_cat":"cs.CV","submitted_at":"2026-04-06T23:46:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A responsible computing framework substitutes real protest imagery with labeled synthetic reproductions from conditional image synthesis to enable privacy-aware analysis of collective action patterns.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"uses Inception V3 to generate high level feature distributions of the data to compute the maximum mean discrepancy (MMD) as follows: $\\mathrm{KID} = \\mathrm{MMD}^2(f_{\\mathrm{real}}, f_{\\mathrm{synth}}) = \\lVert \\mu_{\\mathcal{F}}(f_{\\mathrm{real}}) - \\mu_{\\mathcal{F}}(f_{\\mathrm{synth}}) \\rVert_{\\mathcal{F}}^2$, where $\\mathcal{F}$ represents the reproducing kernel Hilbert space (RKHS) induced by a polynomial kernel. Manuscript submitted to ACM 12 Cohen Archbold, Usman Hassan, Nazmus Sakib, Sen-ching Cheung, and Abdullah-Al-Zubaer Imran In addition, we used the Inception Score (IS) [54] to measure entropy and diversity of extracted features. Given synthetic image $x$, we extracted the conditional probability distribution $P(y \\mid x)$ and marginal class distribution $P(y)$ using the features of Inception V3.
The IS can then be calculated as follows: $\\mathrm{IS} = \\exp\\left(\\mathbb{E}_x\\left[D_{\\mathrm{KL}}\\left(P(y \\mid x) \\,\\|\\, P(y)\\right)\\right]\\right)$, where $D_{\\mathrm{KL}}$ is the Kullback-Leibler divergence. We report IS as a supplementary measure of diversity and classifier confidence,"},{"citing_arxiv_id":"2603.14186","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models","primary_cat":"cs.CV","submitted_at":"2026-03-15T02:22:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.02731","ref_index":47,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Omni2Sound: Towards Unified Video-Text-to-Audio Generation","primary_cat":"cs.SD","submitted_at":"2026-01-06T05:49:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A single DiT-based diffusion model unifies video-to-audio, text-to-audio, and joint video-text-to-audio generation, supported by a new 470k-pair dataset and three-stage progressive training that resolves task competition.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.26258","ref_index":48,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"EnScale: Temporally-consistent multivariate generative downscaling via proper scoring
rules","primary_cat":"physics.ao-ph","submitted_at":"2025-09-30T13:46:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EnScale emulates high-resolution regional climate model outputs from global circulation models for multiple variables using a two-step generative process with sparse local stochastic layers and energy score optimization, including a temporally consistent variant.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2307.01952","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","primary_cat":"cs.CV","submitted_at":"2023-07-04T23:04:57+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2105.05233","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Diffusion Models Beat GANs on Image Synthesis","primary_cat":"cs.LG","submitted_at":"2021-05-11T17:50:24+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"capable of producing realistic images and sound, there is still much room for improvement beyond the current 
state-of-the-art, and better generative models could have wide-ranging impacts on graphic design, games, music production, and countless other fields. GANs [19] currently hold the state-of-the-art on most image generation tasks [5, 68, 28] as measured by sample quality metrics such as FID [23], Inception Score [54] and Precision [32]. However, some of these metrics do not fully capture diversity, and it has been shown that GANs capture less diversity than state-of-the-art likelihood-based models [51, 43, 42]. Furthermore, GANs are often difficult to train, collapsing without carefully selected hyperparameters and regularizers [5, 41, 4]. While GANs hold the state-of-the-art, their drawbacks make them difficult to scale and apply to"},{"citing_arxiv_id":"2104.10157","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VideoGPT: Video Generation using VQ-VAE and Transformers","primary_cat":"cs.CV","submitted_at":"2021-04-20T17:58:03+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.09543","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Spatial sensitivity analysis for urban land use prediction with physics-constrained conditional generative adversarial networks","primary_cat":"cs.LG","submitted_at":"2019-07-22T19:32:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A physics-constrained cGAN is trained as an image-to-image translator on remote-sensing layers to recover spatial sensitivities of urban land-use change to macroeconomic indicators via backpropagation
gradients.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06291","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Measuring the Transferability of Adversarial Examples","primary_cat":"cs.LG","submitted_at":"2019-07-14T22:20:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Empirical measurement of adversarial example transferability between VGG and Inception model classes with methodological refinements to attack strength selection, perturbation clipping, and evaluation via SSIM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.11080","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AGAN: Towards Automated Design of Generative Adversarial Networks","primary_cat":"cs.LG","submitted_at":"2019-06-25T10:12:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1801.01401","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Demystifying MMD GANs","primary_cat":"stat.ML","submitted_at":"2018-01-04T15:25:26+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate 
adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}