Transactions on Machine Learning Research
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes
ViPS learns a universal, controllable pose space for auto-rigged meshes by transferring motion priors from video diffusion models, matching SOTA performance on plausibility and diversity while enabling zero-shot generalization.
- ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation
ROAR-3D adds a token-wise view router and dual-stream attention to pretrained single-view 3D generators so they can use arbitrary unposed images for higher-fidelity output.
- AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
AdaVFM integrates neural architecture search into vision foundation model backbones and uses a cloud multimodal LLM agent to enable runtime-adaptive lightweight subnet execution, delivering up to 7.9% higher accuracy and 77.9% lower FLOPs than fixed-size baselines on edge devices.