AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
hub
Imagenet: A large-scale hierarchical image database
17 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
dataset 3polarities
use dataset 3representative citing papers
DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.
sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.
DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.
HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.
A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.
SSMamba uses a two-stage self-supervised pretraining and fine-tuning pipeline with Mamba-based components to outperform prior pathological foundation models on ROI and WSI classification tasks.
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
citing papers explorer
-
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
-
Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning
DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.
-
LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
-
Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution
IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.
-
Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers
sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.
-
DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models
DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.
-
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
-
Intermediate Representations are Strong AI-Generated Image Detectors
Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.
-
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.
-
HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet
HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
-
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
-
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
-
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset
MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.
-
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition
A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.
-
SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification
SSMamba uses a two-stage self-supervised pretraining and fine-tuning pipeline with Mamba-based components to outperform prior pathological foundation models on ROI and WSI classification tasks.
-
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
- From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers