In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

The unreasonable effectiveness of deep features as a perceptual metric

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

cs.CR · 2026-05-15 · unverdicted · novelty 7.0

CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.

Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

cs.CV · 2026-04-03 · conditional · novelty 7.0

SCOPE accelerates autoregressive video diffusion up to 4.73x by using a tri-modal cache-predict-recompute scheduler with Taylor extrapolation and selective active-frame computation while preserving output quality.

When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models

cs.CV · 2026-03-29 · unverdicted · novelty 7.0

A wrinkle-field perturbation method creates photorealistic non-rigid image changes that degrade state-of-the-art VLMs on image captioning and VQA more effectively than prior baselines.

Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering

cs.CV · 2026-02-06 · unverdicted · novelty 7.0

U-4DGS reformulates occluded dynamic human rendering as MAP estimation under heteroscedastic noise, using a Probabilistic Deformation Network and uncertainty-modulated joint rasterization plus confidence-aware regularizations to deliver SOTA fidelity and robustness on ZJU-MoCap and OcMotion.

SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes

cs.GR · 2026-04-30 · unverdicted · novelty 6.0

SandSim reconstructs temporally coherent sand painting processes from single images using curve-guided Gaussian splatting, subtractive compositing for accumulation, and semantic-guided stroke planning.

EAD-Net: Emotion-Aware Talking Head Generation with Spatial Refinement and Temporal Coherence

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

EAD-Net uses a diffusion model with new spatio-temporal attention, graph-based temporal reasoning, and LLM-derived semantic descriptions to generate emotionally expressive talking head videos with improved lip-sync and coherence over prior methods.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.

VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.

Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps

cs.SE · 2026-04-08 · conditional · novelty 6.0

LLM-powered monitoring of UI similarity allows random testing tools to escape tarpits, yielding 45-55% higher coverage and more unique bugs across 12 apps.

Rethinking Exposure Correction for Spatially Non-uniform Degradation

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

Introduces spatially adaptive modulation with a signal encoder and uncertainty-inspired loss for correcting non-uniform exposure degradations in images.

TIQA: Human-Aligned Perceptual Text Quality Assessment in Generated Images

cs.CV · 2026-03-07 · unverdicted · novelty 6.0

TIQA introduces datasets and a model that predict human perceptual quality of rendered text in AI images, achieving PLCC 0.942 on crops and improving selected image text quality by 0.36 MOS.

Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

TrioMan is a tri-module data augmentation framework using a Generator for pose/camera perturbations, a Refiner with one-step diffusion, and an Examiner with dual-branch attention to improve 3D avatar learning from monocular videos, claiming better results than prior methods on two benchmarks.

SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.

Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations?

cs.CV · 2026-04-26 · conditional · novelty 5.0

Pixel-level protective perturbations for portrait privacy are ineffective against common image transformations, and a low-cost purification framework can strip them out.

Discrete Preference Learning for Personalized Multimodal Generation

cs.IR · 2026-04-22 · unverdicted · novelty 5.0

DPPMG learns discrete modal-specific preferences via a dedicated GNN from multimodal user data, quantizes them into tokens, and feeds them into generators with a consistency reward to produce personalized text and images.

Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation

cs.CV · 2026-04-14 · unverdicted · novelty 5.0

PASA uses curvature-aware dynamic budgeting, grouped approximations, and stochastic attention routing to accelerate video diffusion transformers while eliminating temporal flickering from sparse patterns.

citing papers explorer

Showing 20 of 20 citing papers.

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation cs.CR · 2026-05-15 · unverdicted · none · ref 79
CrossMPI steers both visual and textual interpretations in LVLMs through image-only perturbations by optimizing in hidden-state space at selected middle layers with distance-based budget allocation.
What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers cs.CV · 2026-05-11 · unverdicted · none · ref 51
A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes cs.CV · 2026-05-06 · unverdicted · none · ref 61
Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.
IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation cs.CV · 2026-04-14 · unverdicted · none · ref 41
IAD-Unify unifies industrial anomaly segmentation, region-grounded language understanding, and mask-guided generation in one framework using DINOv2 token injection into Qwen3.5, supported by the new Anomaly-56K dataset of 59,916 images.
Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation cs.CV · 2026-04-03 · conditional · none · ref 61
SCOPE accelerates autoregressive video diffusion up to 4.73x by using a tri-modal cache-predict-recompute scheduler with Taylor extrapolation and selective active-frame computation while preserving output quality.
When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models cs.CV · 2026-03-29 · unverdicted · none · ref 40
A wrinkle-field perturbation method creates photorealistic non-rigid image changes that degrade state-of-the-art VLMs on image captioning and VQA more effectively than prior baselines.
Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering cs.CV · 2026-02-06 · unverdicted · none · ref 53
U-4DGS reformulates occluded dynamic human rendering as MAP estimation under heteroscedastic noise, using a Probabilistic Deformation Network and uncertainty-modulated joint rasterization plus confidence-aware regularizations to deliver SOTA fidelity and robustness on ZJU-MoCap and OcMotion.
SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes cs.GR · 2026-04-30 · unverdicted · none · ref 60
SandSim reconstructs temporally coherent sand painting processes from single images using curve-guided Gaussian splatting, subtractive compositing for accumulation, and semantic-guided stroke planning.
EAD-Net: Emotion-Aware Talking Head Generation with Spatial Refinement and Temporal Coherence cs.CV · 2026-04-25 · unverdicted · none · ref 54
EAD-Net uses a diffusion model with new spatio-temporal attention, graph-based temporal reasoning, and LLM-derived semantic descriptions to generate emotionally expressive talking head videos with improved lip-sync and coherence over prior methods.
Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing cs.CV · 2026-04-22 · unverdicted · none · ref 43
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.
Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing cs.LG · 2026-04-17 · unverdicted · none · ref 62
RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis cs.CV · 2026-04-08 · unverdicted · none · ref 53
VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.
Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps cs.SE · 2026-04-08 · conditional · none · ref 84
LLM-powered monitoring of UI similarity allows random testing tools to escape tarpits, yielding 45-55% higher coverage and more unique bugs across 12 apps.
Rethinking Exposure Correction for Spatially Non-uniform Degradation cs.CV · 2026-04-05 · unverdicted · none · ref 52
Introduces spatially adaptive modulation with a signal encoder and uncertainty-inspired loss for correcting non-uniform exposure degradations in images.
TIQA: Human-Aligned Perceptual Text Quality Assessment in Generated Images cs.CV · 2026-03-07 · unverdicted · none · ref 68
TIQA introduces datasets and a model that predict human perceptual quality of rendered text in AI images, achieving PLCC 0.942 on crops and improving selected image text quality by 0.36 MOS.
Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos cs.CV · 2026-05-22 · unverdicted · none · ref 85
TrioMan is a tri-module data augmentation framework using a Generator for pose/camera perturbations, a Refiner with one-step diffusion, and an Examiner with dual-branch attention to improve 3D avatar learning from monocular videos, claiming better results than prior methods on two benchmarks.
SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression cs.CV · 2026-05-06 · unverdicted · none · ref 47
SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.
Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations? cs.CV · 2026-04-26 · conditional · none · ref 47
Pixel-level protective perturbations for portrait privacy are ineffective against common image transformations, and a low-cost purification framework can strip them out.
Discrete Preference Learning for Personalized Multimodal Generation cs.IR · 2026-04-22 · unverdicted · none · ref 65
DPPMG learns discrete modal-specific preferences via a dedicated GNN from multimodal user data, quantizes them into tokens, and feeds them into generators with a consistency reward to produce personalized text and images.
Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation cs.CV · 2026-04-14 · unverdicted · none · ref 32
PASA uses curvature-aware dynamic budgeting, grouped approximations, and stochastic attention routing to accelerate video diffusion transformers while eliminating temporal flickering from sparse patterns.
