hub Tool reference

The unrea- sonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, Oliver Wang · 2018

Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

16 Pith papers citing it

Method reference 80% of classified citations

browse 16 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

method 8 background 2

citation-polarity summary

use method 8 background 2

representative citing papers

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

cs.CV · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

DRoRAE adaptively fuses multi-layer features from vision encoders via energy-constrained routing to enrich visual tokens, cutting rFID from 0.57 to 0.29 and generation FID from 1.74 to 1.65 on ImageNet-256 while revealing a log-linear scaling law with fusion capacity.

Discrete Langevin-Inspired Posterior Sampling

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.

What Cohort INRs Encode and Where to Freeze Them

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · novelty 6.0

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

DiRotQ uses PCA-based rotation-aware activation quantization combined with GPTQ to achieve better FID and PSNR in 4-bit diffusion transformers than prior methods like SVDQuant.

TransmissiveGS: Residual-Guided Disentangled Gaussian Splatting for Transmissive Scene Reconstruction and Rendering

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

TransmissiveGS disentangles reflections from transmissions in Gaussian Splatting via dual-Gaussian modeling, residual cues, and a reflection light field for improved transmissive scene reconstruction.

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

SocialDirector uses spatiotemporal actor masking and directional reweighting on cross-attention maps to reduce actor-action mismatches and improve target-directed interactions in generated multi-person videos.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

Toward Better Geometric Representations for Molecule Generative Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

LENSEs improves representation-conditioned molecule generation by jointly training a multi-level representation head, perceptual loss, and REPA alignment on pretrained encoders, yielding 97.28% validity and 98.51% stability on GEOM-DRUG.

UniISP: A Unified ISP Framework for Both Human and Machine Vision

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM

cs.CV · 2025-05-23 · unverdicted · novelty 6.0

Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.

MMGS: 10$\times$ Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

MMGS compresses 3DGS models to 10% primitives with preserved rendering quality and 10x training speedup by combining multi-view geometric ranking with global OT aggregation and densification.

Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

Sword improves world model simulators for VLA policies by disentangling visual style from dynamics and bootstrapping latents for better consistency, outperforming baselines on LIBERO in generalization and RL post-training success.

VRAG: Learning World Models for Interactive Video Generation

cs.CV · 2025-05-28

citing papers explorer

Showing 16 of 16 citing papers.

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization cs.CV · 2026-05-11 · unverdicted · none · ref 31 · 2 links
DRoRAE adaptively fuses multi-layer features from vision encoders via energy-constrained routing to enrich visual tokens, cutting rFID from 0.57 to 0.29 and generation FID from 1.74 to 1.65 on ImageNet-256 while revealing a log-linear scaling law with fusion capacity.
Discrete Langevin-Inspired Posterior Sampling cs.LG · 2026-05-10 · unverdicted · none · ref 48
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
What Cohort INRs Encode and Where to Freeze Them cs.LG · 2026-05-08 · unverdicted · none · ref 72
Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 48
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Hierarchical Variational Policies for Reward-Guided Diffusion cs.LG · 2026-05-20 · conditional · none · ref 56
A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.
DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers cs.CV · 2026-05-16 · unverdicted · none · ref 84
DiRotQ uses PCA-based rotation-aware activation quantization combined with GPTQ to achieve better FID and PSNR in 4-bit diffusion transformers than prior methods like SVDQuant.
TransmissiveGS: Residual-Guided Disentangled Gaussian Splatting for Transmissive Scene Reconstruction and Rendering cs.CV · 2026-05-11 · unverdicted · none · ref 44
TransmissiveGS disentangles reflections from transmissions in Gaussian Splatting via dual-Gaussian modeling, residual cues, and a reflection light field for improved transmissive scene reconstruction.
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors cs.CV · 2026-05-11 · unverdicted · none · ref 29
The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.
SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation cs.CV · 2026-05-11 · unverdicted · none · ref 78
SocialDirector uses spatiotemporal actor masking and directional reweighting on cross-attention maps to reduce actor-action mismatches and improve target-directed interactions in generated multi-person videos.
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion cs.CV · 2026-05-08 · unverdicted · none · ref 104
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
Toward Better Geometric Representations for Molecule Generative Models cs.LG · 2026-05-08 · unverdicted · none · ref 45
LENSEs improves representation-conditioned molecule generation by jointly training a multi-level representation head, perceptual loss, and REPA alignment on pretrained encoders, yielding 97.28% validity and 98.51% stability on GEOM-DRUG.
UniISP: A Unified ISP Framework for Both Human and Machine Vision cs.CV · 2026-05-08 · unverdicted · none · ref 55
UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM cs.CV · 2025-05-23 · unverdicted · none · ref 82
Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.
MMGS: 10$\times$ Compressed 3DGS through Optimal Transport Aggregation based on Multi-view Ranking cs.CV · 2026-05-19 · unverdicted · none · ref 39
MMGS compresses 3DGS models to 10% primitives with preserved rendering quality and 10x training speedup by combining multi-view geometric ranking with global OT aggregation and densification.
Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training cs.CV · 2026-05-08 · unverdicted · none · ref 35
Sword improves world model simulators for VLA policies by disentangling visual style from dynamics and bootstrapping latents for better consistency, outperforming baselines on LIBERO in generalization and RL post-training success.
VRAG: Learning World Models for Interactive Video Generation cs.CV · 2025-05-28 · unreviewed · ref 64

The unrea- sonable effectiveness of deep features as a perceptual metric

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer