In: Proceedings of the Computer Vision and Pattern Recognition Conference

Wang, J · 2025

Canonical reference. 12 Pith papers cite this work; 71% of the classified citations use it as background.

citation-role summary

background 5 · baseline 1 · method 1

citation-polarity summary

background 5 · baseline 1 · use method 1

representative citing papers

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

cs.CV · 2026-03-25 · unverdicted · novelty 7.0

EndoVGGT uses a dynamic DeGAT graph attention module to improve depth estimation and non-rigid 3D reconstruction in surgery, reporting 24.6% PSNR and 9.1% SSIM gains on SCARED with zero-shot generalization to new domains.

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · novelty 7.0

3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.

Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

Layer analysis of DINOv3 shows non-uniform 3D geometric knowledge concentrated in deeper layers, enabling a last-layer-centric recombination module that improves monocular depth estimation accuracy to state-of-the-art levels.

SS3D: End2End Self-Supervised 3D from Web Videos

cs.CV · 2026-04-24 · unverdicted · novelty 6.0 · 3 refs

SS3D pretrains an end-to-end feed-forward 3D estimator on filtered YouTube-8M videos via SfM self-supervision, MVS filtering, and expert distillation, delivering stronger zero-shot transfer and fine-tuning than prior self-supervised baselines.

LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

cs.CV · 2026-04-06 · conditional · novelty 6.0

LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

cs.CV · 2026-03-18 · unverdicted · novelty 6.0

Motion-MLLM integrates IMU egomotion data into MLLMs using cascaded filtering and asymmetric fusion to ground visual content in physical trajectories for scale-aware 3D understanding, achieving competitive accuracy at higher speed.

X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

X-Imitator is a bidirectional action-pose interaction framework for spatial-aware imitation learning that outperforms vanilla policies and explicit pose guidance on 24 simulated and 3 real-world robotic tasks.

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

Pose-Aware Diffusion for 3D Generation

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

PAD synthesizes 3D geometry in observation space, using depth unprojection as an anchor to eliminate pose ambiguity in image-to-3D generation.

citing papers explorer

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · unverdicted · none · ref 70 · 2 links

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · none · ref 47

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · none · ref 40

EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

cs.CV · 2026-03-25 · unverdicted · none · ref 24

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · none · ref 83

Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation

cs.CV · 2026-04-29 · unverdicted · none · ref 39

SS3D: End2End Self-Supervised 3D from Web Videos

cs.CV · 2026-04-24 · unverdicted · none · ref 53 · 3 links

LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

cs.CV · 2026-04-06 · conditional · none · ref 61

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

cs.CV · 2026-03-18 · unverdicted · none · ref 53

X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

cs.RO · 2026-05-12 · unverdicted · none · ref 60

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · none · ref 50

Pose-Aware Diffusion for 3D Generation

cs.CV · 2026-05-01 · unverdicted · none · ref 45
