The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, Oliver Wang · 2018

26 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 3 · background 1

citation-polarity summary

use method 3 · background 1

representative citing papers

GazePrior: Zero-Shot AR/VR Eye Tracking via Learned 3D Gaze Reconstruction

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

GazePrior learns a 3D prior over eyes to synthesize realistic ground-truth data for training eye trackers on new devices without new real data collection.

3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation

cs.GR · 2026-05-14 · unverdicted · novelty 7.0

3DEditSafe adds generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, and mask-aware preservation to reduce unsafe semantic alignment in 3D editing while noting a safety-quality tradeoff.

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

cs.CV · 2026-03-22 · conditional · novelty 7.0

LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).

Improved Baselines with Representation Autoencoders

cs.CV · 2026-05-18 · conditional · novelty 6.0

RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

cs.CV · 2026-05-14 · conditional · novelty 6.0

InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.

FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.

Motion-Aware Caching for Efficient Autoregressive Video Generation

cs.CV · 2026-05-03 · conditional · novelty 6.0 · 2 refs

MotionCache accelerates autoregressive video generation up to 6.28x by motion-weighted cache reuse based on inter-frame differences, with negligible quality loss on SkyReels-V2 and MAGI-1.

Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

Sphere clouds neutralize density attacks on private 3D maps for visual localization while depth guidance from ToF sensors restores translation scale for accurate pose estimation.

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.

PEPS: Positional Encoding Projected Sampling -- Extended

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

PEPS decomposes positional encodings into projected points with unique frequency-dependent motions to support more efficient learned grid-based encodings in INRs, outperforming prior methods on image, texture, and SDF tasks with often 25% fewer parameters.

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

cs.RO · 2026-04-21 · unverdicted · novelty 6.0

UniT creates a unified physical language via visual anchoring and tri-branch reconstruction to enable scalable human-to-humanoid transfer for policy learning and world modeling.

Self-Supervised Super-Resolution for Sentinel-5P Hyperspectral Images

cs.CV · 2026-04-19 · conditional · novelty 6.0

A self-supervised framework using SURE and equivariant constraints produces super-resolved Sentinel-5P images comparable to supervised baselines without HR references and with physically plausible structures validated against EMIT data.

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.

RefTon: Reference person shot assist virtual Try-on

cs.CV · 2025-11-02 · unverdicted · novelty 6.0

RefTon is a flux-based virtual try-on method that uses unpaired reference images of the target garment on different people to guide texture and detail preservation in a streamlined person-to-person pipeline without body parsing or masks.

Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

Edit-GRPO decouples editing and preservation objectives via region-specific signals in a policy optimization framework to improve locality in image editing tasks.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

eess.IV · 2026-05-10 · unverdicted · novelty 5.0

ML-CLIPSim aggregates multi-layer patch and global similarities from frozen CLIP to approximate machine utility for images and outperforms standard IQA metrics on machine-preference tasks while staying competitive on human data.

Video Generation with Predictive Latents

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

PV-VAE improves video latent spaces for generation by unifying reconstruction with future-frame prediction, reporting 52% faster convergence and 34.42 FVD gain over Wan2.2 VAE on UCF101.

citing papers explorer

Showing 26 of 26 citing papers.

GazePrior: Zero-Shot AR/VR Eye Tracking via Learned 3D Gaze Reconstruction

cs.CV · 2026-05-21 · unverdicted · none · ref 93

GazePrior learns a 3D prior over eyes to synthesize realistic ground-truth data for training eye trackers on new devices without new real data collection.

3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation

cs.GR · 2026-05-14 · unverdicted · none · ref 39

3DEditSafe adds generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, and mask-aware preservation to reduce unsafe semantic alignment in 3D editing while noting a safety-quality tradeoff.

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

cs.CV · 2026-05-14 · unverdicted · none · ref 49

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

cs.CV · 2026-05-13 · unverdicted · none · ref 40

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

cs.CV · 2026-03-22 · conditional · none · ref 49

LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · none · ref 22

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

cs.CV · 2026-05-22 · unverdicted · none · ref 67

SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.

Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

cs.CV · 2026-05-20 · unverdicted · none · ref 41

Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

cs.CV · 2026-05-20 · unverdicted · none · ref 53

AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).

Improved Baselines with Representation Autoencoders

cs.CV · 2026-05-18 · conditional · none · ref 66

RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

cs.CV · 2026-05-14 · conditional · none · ref 56

InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.

FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation

cs.CV · 2026-05-07 · unverdicted · none · ref 34

FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.

Motion-Aware Caching for Efficient Autoregressive Video Generation

cs.CV · 2026-05-03 · conditional · none · ref 44 · 2 links

MotionCache accelerates autoregressive video generation up to 6.28x by motion-weighted cache reuse based on inter-frame differences, with negligible quality loss on SkyReels-V2 and MAGI-1.

Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds

cs.CV · 2026-05-01 · unverdicted · none · ref 42

Sphere clouds neutralize density attacks on private 3D maps for visual localization while depth guidance from ToF sensors restores translation scale for accurate pose estimation.

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

cs.CV · 2026-05-01 · unverdicted · none · ref 51

An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.

PEPS: Positional Encoding Projected Sampling -- Extended

cs.CV · 2026-04-27 · unverdicted · none · ref 46

PEPS decomposes positional encodings into projected points with unique frequency-dependent motions to support more efficient learned grid-based encodings in INRs, outperforming prior methods on image, texture, and SDF tasks with often 25% fewer parameters.

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

cs.RO · 2026-04-21 · unverdicted · none · ref 43

UniT creates a unified physical language via visual anchoring and tri-branch reconstruction to enable scalable human-to-humanoid transfer for policy learning and world modeling.

Self-Supervised Super-Resolution for Sentinel-5P Hyperspectral Images

cs.CV · 2026-04-19 · conditional · none · ref 67

A self-supervised framework using SURE and equivariant constraints produces super-resolved Sentinel-5P images comparable to supervised baselines without HR references and with physically plausible structures validated against EMIT data.

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

cs.CV · 2026-04-07 · unverdicted · none · ref 54

SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.

RefTon: Reference person shot assist virtual Try-on

cs.CV · 2025-11-02 · unverdicted · none · ref 56

RefTon is a flux-based virtual try-on method that uses unpaired reference images of the target garment on different people to guide texture and detail preservation in a streamlined person-to-person pipeline without body parsing or masks.

Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing

cs.CV · 2026-05-16 · unverdicted · none · ref 47

Edit-GRPO decouples editing and preservation objectives via region-specific signals in a policy optimization framework to improve locality in image editing tasks.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · none · ref 97

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

eess.IV · 2026-05-10 · unverdicted · none · ref 29

ML-CLIPSim aggregates multi-layer patch and global similarities from frozen CLIP to approximate machine utility for images and outperforms standard IQA metrics on machine-preference tasks while staying competitive on human data.

Video Generation with Predictive Latents

cs.CV · 2026-05-04 · unverdicted · none · ref 64

PV-VAE improves video latent spaces for generation by unifying reconstruction with future-frame prediction, reporting 52% faster convergence and 34.42 FVD gain over Wan2.2 VAE on UCF101.

Seedance 1.0: Exploring the Boundaries of Video Generation Models

cs.CV · 2025-06-10 · unverdicted · none · ref 34

Seedance 1.0 generates 5-second 1080p videos in about 41 seconds with claimed superior motion quality, prompt adherence, and multi-shot consistency compared to prior models.

Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

cs.CV · 2026-05-01 · unreviewed · ref 78
