AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
hub
Imagenet: A large-scale hierarchical image database
17 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
dataset 3polarities
use dataset 3representative citing papers
DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.
sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.
DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.
HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.
A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.
SSMamba uses a two-stage self-supervised pretraining and fine-tuning pipeline with Mamba-based components to outperform prior pathological foundation models on ROI and WSI classification tasks.
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
citing papers explorer
-
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.