Improved denoising diffusion probabilistic models
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: method (1)
Citation polarities: use method (1)
citing papers explorer
- How to Spin an Object: First, Get the Shape Right
Camera-Relative Object Coordinates (CROCS) as an intermediate geometry representation in two-stage image-to-3D models yields superior novel-view quality, geometric accuracy, and multiview consistency over depth maps, visual features, and other pointmap alternatives.
- DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation
DiffClean applies text-guided diffusion to erase makeup from faces, boosting age estimation and verification accuracy over makeup-affected images.
- Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models
BadRDM is a backdoor attack on retrieval-augmented diffusion models that poisons the retrieval database with toxicity surrogates and uses multimodal contrastive learning to force toxic generations from text triggers while preserving benign performance.
- CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
CogACT is a new VLA model that uses a conditioned diffusion action transformer to achieve over 35% higher average success rates than OpenVLA in simulation and 55% in real-robot experiments while generalizing to new robots and objects.