Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Mixed citation behavior. Most common role is background (67%).
abstract
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
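The abstract mentions "certain architectural constraints" without listing them. In the body of the DCGAN paper they are: strided convolutions instead of pooling in the discriminator and fractionally-strided convolutions in the generator, batch normalization in both networks (except the generator output layer and the discriminator input layer), no fully connected hidden layers, ReLU in the generator with Tanh at the output, and LeakyReLU in the discriminator. The sketch below only traces tensor shapes through a generator of that form. The 4x4x1024 projection and the 64x64x3 output follow the paper's generator diagram; kernel size 4, stride 2 and padding 1 are an assumption borrowed from common reimplementations, and the helper names are hypothetical.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed (fractionally-strided) convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def dcgan_generator_shapes(start=4, channels=(1024, 512, 256, 128, 3)):
    """Trace (H, W, C) from the projected z through four stride-2 up-convolutions."""
    size = start
    shapes = [(size, size, channels[0])]  # output of "project and reshape" applied to z
    for c in channels[1:]:
        size = conv_transpose_out(size)
        shapes.append((size, size, c))
    return shapes


# Per the DCGAN guidelines: ReLU after the projection and every hidden layer
# (each preceded by batch normalization), Tanh with no batch norm at the output.
activations = ["relu"] * 4 + ["tanh"]
for shape, act in zip(dcgan_generator_shapes(), activations):
    print(shape, act)
```

With these settings each layer exactly doubles the spatial size (4, 8, 16, 32, 64), which is why no pooling or unpooling layers are needed anywhere in the generator.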
Citing papers
-
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
ciwGAN and fiwGAN models trained on isolated words spontaneously generate concatenated multi-word outputs and display early compositionality precursors.
-
Toy Models of Superposition
Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.
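The toy model in that paper compresses n sparse features into m < n hidden dimensions and reads them back out as x' = ReLU(W^T W x + b). The NumPy sketch below hand-sets W for 5 features in 2 dimensions, placing the feature directions at the vertices of a regular pentagon (one of the geometries the paper describes), and picks a negative bias of -0.4 so that the ReLU filters out the interference between directions. Both the weights and the bias value are hypothetical illustrations, not trained parameters.

```python
import numpy as np

n_features, n_hidden = 5, 2
# Hypothetical hand-set weights: the 5 feature directions sit at the vertices of a
# regular pentagon in the 2-D hidden space (each column of W is a unit vector).
angles = 2 * np.pi * np.arange(n_features) / n_features
W = np.stack([np.cos(angles), np.sin(angles)])   # shape (n_hidden, n_features)
b = np.full(n_features, -0.4)                    # hypothetical bias that suppresses interference


def toy_model(x):
    """ReLU output model: x' = ReLU(W^T W x + b)."""
    return np.maximum(W.T @ (W @ x) + b, 0.0)


for i in range(n_features):
    x = np.zeros(n_features)
    x[i] = 1.0                                   # a single active (sparse) feature
    print(i, np.round(toy_model(x), 3))
```

Each single active feature is recovered on its own (with magnitude 0.6 here), which is why superposition works when features are sparse. Activating two non-adjacent features at once, for example features 0 and 2, instead produces a spurious output on feature 1.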
-
Generative Language Modeling for Automated Theorem Proving
GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library, the first such contribution from a deep-learning system.
-
Density estimation using Real NVP
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
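A minimal NumPy sketch of a single affine coupling layer, to show why the transformation is invertible and has a cheap Jacobian determinant. The scale and translation functions s and t are hypothetical fixed random maps standing in for the neural networks Real NVP would learn; masking patterns, stacking of layers, and training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # feature dimension, split into two halves of size D // 2
# Hypothetical stand-ins for the learned scale network s(.) and translation network t(.).
Ws = rng.normal(size=(D // 2, D // 2))
Wt = rng.normal(size=(D // 2, D // 2))


def s(x1):
    return np.tanh(x1 @ Ws)


def t(x1):
    return x1 @ Wt


def coupling_forward(x):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); also returns log|det J| per sample."""
    x1, x2 = x[:, : D // 2], x[:, D // 2 :]
    scale = s(x1)
    y2 = x2 * np.exp(scale) + t(x1)
    log_det = scale.sum(axis=1)  # J is triangular with diagonal exp(s), so log|det| = sum(s)
    return np.concatenate([x1, y2], axis=1), log_det


def coupling_inverse(y):
    """x1 = y1, x2 = (y2 - t(y1)) * exp(-s(y1)); s and t never need to be inverted."""
    y1, y2 = y[:, : D // 2], y[:, D // 2 :]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2], axis=1)


x = rng.normal(size=(3, D))
y, log_det = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x))  # True
```

Because s and t are only ever evaluated on the untouched half (x1 = y1), they can be arbitrarily complex networks without affecting exact invertibility or the cost of the log-determinant.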
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.
-
Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction
A relative projection error metric in foundation-model embedding space predicts the downstream utility of synthetic positive samples for binary classifiers.
-
Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Recursive generative retraining with pluralistic preferences converges to a stable diverse distribution that satisfies a weighted Nash bargaining solution.
-
Active Learning for Conditional Generative Compressed Sensing
Prompts can be split into separate roles for sampling design and recovery modeling in generative compressed sensing, with stable recovery bounds for matched prompts and an explicit penalty for mismatch, validated on Stable Diffusion.
-
Physics-informed, Generative Adversarial Design of Funicular Shells
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
-
SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation
SurFITR is a new collection of 137k+ surveillance-style forged images on which existing detectors degrade, and training on it yields substantial gains in both in-domain and cross-domain settings.
-
Toward Generative Quantum Utility via Correlation-Complexity Map
A pre-training diagnostic map based on spectral correlation resemblance to IQP circuits and excess structural complexity identifies suitable datasets like turbulence data for quantum generative models, yielding competitive low-resource performance.
-
ASTRA: Let Arbitrary Subjects Transform in Video Editing
ASTRA is a plug-and-play training-free method for precise multi-subject video editing that uses prompt-guided multimodal alignment and prior-based mask retargeting to avoid attention dilution and boundary issues.
-
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Progressive growing stabilizes GAN training to produce high-resolution images of unprecedented quality and achieves a record unsupervised inception score of 8.80 on CIFAR10.
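When a new resolution is added, progressive growing fades the new layers in smoothly rather than switching them on at once: the upsampled output of the previous stage is blended with the output of the new block, using a weight alpha that increases linearly from 0 to 1. The NumPy sketch below shows only that blending step; the arrays are hypothetical stand-ins for the two RGB outputs, and nearest-neighbour upsampling is used on the skip path.

```python
import numpy as np


def upsample_nearest(img):
    """2x nearest-neighbour upsampling of an (H, W, C) array."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def fade_in(old_rgb, new_rgb, alpha):
    """Blend the upsampled low-resolution output with the new high-resolution branch.

    alpha = 0 uses only the old (upsampled) path; alpha = 1 uses only the new layers.
    """
    return (1.0 - alpha) * upsample_nearest(old_rgb) + alpha * new_rgb


old = np.ones((4, 4, 3))   # hypothetical 4x4 output of the already-trained stage
new = np.zeros((8, 8, 3))  # hypothetical 8x8 output of the newly added block
for alpha in (0.0, 0.5, 1.0):
    print(alpha, fade_in(old, new, alpha).mean())
```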
-
Mixed Precision Training
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
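Two of the three ingredients can be illustrated with NumPy's float16 type alone: a small gradient underflows to zero in FP16 unless it is scaled up first, and a small weight update is rounded away when added to an FP16 weight but survives in an FP32 master copy. The gradient value, update size and the static scale 2**10 are hypothetical choices for illustration; in a real training loop the scale is applied to the loss before backpropagation, and the scaled gradients are divided by it again before the FP32 weight update.

```python
import numpy as np

# 1) Loss scaling: values below FP16's smallest subnormal (about 6e-8) flush to zero.
grad = 1e-8        # hypothetical small gradient value
scale = 2.0 ** 10  # hypothetical static loss scale
unscaled_fp16 = np.float16(grad)                         # underflows to 0.0
scaled_fp16 = np.float16(grad * scale)                   # about 1.02e-5, representable in FP16
recovered = np.float32(scaled_fp16) / np.float32(scale)  # unscale in FP32

# 2) FP32 master weights: an update well below FP16's spacing at 1.0 (about 9.8e-4) is lost.
update = 1e-4                                 # hypothetical learning-rate-times-gradient step
w16 = np.float16(1.0) + np.float16(update)    # rounds back to 1.0 in FP16
w32 = np.float32(1.0) + np.float32(update)    # 1.0001 is kept in FP32

print(unscaled_fp16, recovered)
print(w16, w32)
```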
-
Vision Foundation Models as Generalist Tokenizers for Image Generation
VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.
-
Neural Fields for NV-Center Inverse Sensing
NeTMY neural fields with annealed encoding, multiscale optimization, and spectrum-fidelity losses achieve superior localization and distributional accuracy in NV-center inverse sensing by using a tensor power-summed dipolar operator that exposes and mitigates center-collapse failures.
-
Enabling Federated Inference via Unsupervised Consensus Embedding
CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.
-
A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities
A new framework evaluates the utility of synthetic mobility trajectories, while a membership inference attack reveals privacy vulnerabilities in generative models thought to be safe.
-
Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
Embedding Arithmetic performs vector operations in the embedding space of T2I models to mitigate bias at inference time, outperforming baselines on diversity while preserving coherence via a new Concept Coherence Score.
-
FatigueFusion: Latent Space Fusion for Fatigue-Driven Motion Synthesis
FatigueFusion fuses fatigue features in latent space using algorithmic, data-driven, and PINN modules to synthesize novel fatigued motions from non-fatigued joint sequences in an end-to-end pipeline.
-
gen2seg: Generative Models Enable Generalizable Instance Segmentation
Finetuning generative models on limited instance segmentation data produces zero-shot generalization to unseen object categories and styles, matching or exceeding supervised baselines like SAM on ambiguous boundaries.
-
"Noisier" Noise Contrastive Estimation is (Almost) Maximum Likelihood
Scaling noise magnitude in NCE aligns gradients with MLE, enabling a practical approximation that improves performance on CIFAR-10 and ImageNet image modeling with fewer training steps.
-
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
-
VideoGPT: Video Generation using VQ-VAE and Transformers
VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
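The discrete latents come from VQ-VAE-style vector quantization: each encoder output vector is replaced by its nearest entry in a learned codebook, and the resulting integer indices are the tokens the transformer then models autoregressively. The NumPy sketch below shows only that lookup, with a tiny hypothetical codebook and hypothetical encoder outputs; how the codebook is learned, the straight-through gradient, and the encoder itself are omitted.

```python
import numpy as np


def vector_quantize(z, codebook):
    """Map each encoder output vector to its nearest codebook entry (L2 distance).

    z: (N, D) encoder outputs; codebook: (K, D) embedding vectors.
    Returns integer codes of shape (N,) and quantized vectors of shape (N, D).
    """
    # Squared distances ||z_i - e_k||^2 for every pair, via broadcasting: shape (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = d2.argmin(axis=1)
    return codes, codebook[codes]


codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # hypothetical K=3, D=2
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.6]])
codes, zq = vector_quantize(z, codebook)
print(codes)  # [0 1 2 1]
```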
-
Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning
DASCN uses a unified primal-dual GAN architecture to generate semantics-consistent visual features for generalized zero-shot learning, claiming state-of-the-art gains.
-
Dual Adversarial Learning with Attention Mechanism for Fine-grained Medical Image Synthesis
Dual-discriminator GAN with adversarial attention improves fine-grained medical image synthesis, especially in hard-to-generate tumor regions, and outperforms prior methods on brain tumor and CT-to-MRI tasks.
-
RED: A ReRAM-based Deconvolution Accelerator
RED introduces pixel-wise mapping and zero-skipping dataflow for ReRAM deconvolution acceleration, reporting 1.15x-3.69x speedup and 8%-88.36% energy reduction versus prior ReRAM accelerators.
-
Demystifying MMD GANs
MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.
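The Kernel Inception Distance is the squared MMD between Inception features of real and generated images, estimated without bias using the cubic polynomial kernel k(x, y) = (x·y / d + 1)^3, where d is the feature dimension. The sketch below implements that estimator in NumPy on random Gaussian vectors that stand in for Inception features; extracting real features, and averaging the estimate over several random subsets as is commonly done, are left out.

```python
import numpy as np


def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 for all row pairs."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3


def kid(x, y):
    """Unbiased estimate of squared MMD with the KID kernel."""
    m, n = len(x), len(y)
    kxx = polynomial_kernel(x, x)
    kyy = polynomial_kernel(y, y)
    kxy = polynomial_kernel(x, y)
    # Drop the diagonal (i == j) terms so the within-set averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))              # stand-in for features of real images
same = rng.normal(size=(500, 16))              # samples from the same distribution
shifted = rng.normal(loc=0.5, size=(500, 16))  # samples from a mean-shifted distribution
print(kid(real, same), kid(real, shifted))
```

Because the estimator is unbiased, it can come out slightly negative when both sample sets share a distribution; only the mean-shifted set gives a clearly positive value.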
-
A Geometric Algebra-Informed 3D Gaussian Splatting Framework for Wireless Scene Representation
GAI-GS couples 3D Gaussian splatting with geometric algebra attention to encode spatial-electromagnetic relations and model multipath, attenuation, and reflections in wireless environments.
-
Are Candidate Models Really Needed for Active Learning?
Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.
-
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.
-
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
-
Improving Diversity in Black-box Few-shot Knowledge Distillation
An adaptive high-confidence image selection scheme during GAN training expands diversity in the distillation set for black-box few-shot KD and yields SOTA student accuracy on seven image datasets.
-
A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction
GAI-NeRF combines geometric algebra attention and an adaptive ray tracing module inside a NeRF model to deliver more accurate and generalizable wireless channel predictions across varied indoor environments.
-
Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
T-BiGAN integrates window-attention Transformers in a BiGAN to achieve ROC-AUC 0.95 and average precision 0.996 for unsupervised spatiotemporal anomaly detection in PMU data.
-
Quantum generative modeling for financial time series with temporal correlations
QGANs with quantum generators and classical discriminators generate financial time series matching target distributions and desired temporal correlations, with quality varying by circuit depth, bond dimension, and simulation method.
-
CCNETS: A Modular Causal Learning Framework for Pattern Recognition in Imbalanced Datasets
CCNETS is a new modular causal framework using three cooperative modules and a Zoint mechanism to align synthetic data generation with classifier needs on imbalanced pattern recognition tasks.
-
Synthetic Augmentation and Feature-based Filtering for Improved Cervical Histopathology Image Classification
cGAN data augmentation with feature-based filtering improves ResNet18 CIN grading accuracy from 66.3% to 71.7% on segmented epithelium patches.
-
Affine Disentangled GAN for Interpretable and Robust AV Perception
ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST for rotations of up to 30 degrees and over 90% under FGSM and PGD attacks, while also generating rotation and scaling factors.
-
Generative Counterfactual Introspection for Explainable Deep Learning
A generative-model-driven introspection method produces counterfactual image edits to explain deep neural network predictions on MNIST and CelebA.
-
Enhancing the accuracy of under-resolved numerical simulations of atmospheric flows with super resolution
A multi-scale CNN super-resolution model outperforms baseline CNN, attention CNN, and diffusion-based approaches in reconstructing fine-scale features from under-resolved atmospheric flow simulations on standard benchmarks.
-
Improving conditional generative adversarial networks for inverse design of plasmonic structures
Adding label projection and a novel embedding network to cGANs cuts mean absolute error by up to an order of magnitude and makes training converge over three times faster for plasmonic inverse design.
-
Diving Deeper into Underwater Image Enhancement: A Survey
A comprehensive survey of deep learning-based underwater image enhancement with systematic experimental comparison of algorithms on multiple datasets.
-
Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
Mean spectral normalization (MSN) is proposed as a reparameterization that addresses mean drift in spectral normalization (SN), with claimed ~16% faster inference than batch normalization (BN) and fewer parameters on CNNs and GANs.
-
Synthetic data in cryptocurrencies using generative models
CGANs with an LSTM generator can produce synthetic crypto price series that reproduce temporal patterns and preserve market trends and dynamics.
-
Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation
Balanced synthetic image augmentation via GANs and diffusion models raises average AUC from 0.9206 to 0.9362 for FedAvg and 0.9429 to 0.9574 for FedProx in federated breast ultrasound classification.