ArtiGrasp: Physically plausible synthesis of bi-manual dexterous grasping and articulation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 6
roles: background 2
polarities: background 2
citing papers explorer
- Streaming Real-Time Rendered Scenes as 3D Gaussians
  Server constructs and streams evolving 3D Gaussian Splatting models of rendered scenes to clients for local viewpoint rendering and better multi-user amortization than video streaming.
- The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
  KITScenes Multimodal presents a new multimodal autonomous driving dataset with complete HD maps and four benchmarks for spatial learning tasks including online map construction and end-to-end driving.
- AFUN: Towards an Affordance Foundation Model for Functionality Understanding
  AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.
- Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization
  Vector Scaffolding uses Interior Gradient Aggregation, Progressive Stratification, and Rapid Inflation Scheduling to achieve 2.5x faster optimization and up to 1.4 dB higher PSNR in differentiable image vectorization.
- ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
  ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses the representation for imitation learning that yields ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.
- VFM-SDM: A vision foundation model-based framework for training-free, marker-free, and calibration-free structural displacement measurement
  VFM-SDM enables accurate multi-directional structural displacement measurement from video using pre-trained vision models for camera estimation and point tracking, combined with geometry constraints, without task-specific training or preparation.