ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei · 2009

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 3

citation-polarity summary

use dataset: 3

representative citing papers

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

DRAPE generates query-image conditioned prompts on the fly for multimodal continual instruction tuning and reports SOTA results on MCIT benchmarks.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

cs.CV · 2026-03-22 · conditional · novelty 7.0

LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution

cs.CV · 2025-12-29 · unverdicted · novelty 7.0

IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.

Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

cs.CV · 2026-05-14 · conditional · novelty 6.0

InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.

Intermediate Representations are Strong AI-Generated Image Detectors

cs.CV · 2026-05-05 · unverdicted · novelty 6.0

Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.

HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.

MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.

SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification

cs.CV · 2026-04-17 · unverdicted · novelty 5.0

SSMamba uses a two-stage self-supervised pretraining and fine-tuning pipeline with Mamba-based components to outperform prior pathological foundation models on ROI and WSI classification tasks.

Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

cs.CV · 2025-05-26 · unverdicted · novelty 5.0

Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.

From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers

cs.CV · 2025-11-19

citing papers explorer

Showing 17 of 17 citing papers.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models · cs.CV · 2026-05-13 · unverdicted · none · ref 5
Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning · cs.CV · 2026-05-11 · unverdicted · none · ref 5
LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction · cs.CV · 2026-03-22 · conditional · none · ref 56
Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution · cs.CV · 2025-12-29 · unverdicted · none · ref 7
Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers · cs.CV · 2026-05-16 · unverdicted · none · ref 4
DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models · cs.CR · 2026-05-15 · unverdicted · none · ref 11
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation · cs.CV · 2026-05-14 · conditional · none · ref 10
Intermediate Representations are Strong AI-Generated Image Detectors · cs.CV · 2026-05-05 · unverdicted · none · ref 11
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists · cs.AI · 2026-04-30 · unverdicted · none · ref 9
HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet · cs.CV · 2026-04-16 · unverdicted · none · ref 5
ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors · cs.RO · 2026-03-16 · conditional · none · ref 1
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration · cs.CV · 2026-05-20 · unverdicted · none · ref 10
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset · cs.CV · 2026-05-20 · unverdicted · none · ref 16
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition · cs.CV · 2026-05-11 · unverdicted · none · ref 6
SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification · cs.CV · 2026-04-17 · unverdicted · none · ref 8
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift · cs.CV · 2025-05-26 · unverdicted · none · ref 58
From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers · cs.CV · 2025-11-19 · unreviewed · ref 44
