
SAM 3D Body: Robust Full-Body Human Mesh Recovery

Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, Kris Kitani · 2026 · arXiv 2602.15989

14 Pith papers cite this work. Polarity classification is still being indexed.




citation-role summary: method 2, background 1

citation-polarity summary: use method 2, background 1

representative citing papers

H-Flow: Self-supervised Human Scene Flow via Physics-inspired Joint Multi-modal Learning

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

H-Flow learns dense human scene flow from monocular video via joint pose and depth prediction in a multi-head transformer, using physics-inspired geometric and biomechanical priors for self-supervision, and introduces the DynAct4D synthetic benchmark.

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

LEXIS-Flow uses VQ-VAE-learned interaction signatures to guide diffusion-based reconstruction of 3D human-object meshes and dense proximity fields from single RGB images, outperforming state-of-the-art methods on benchmarks.

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DanceCrafter generates high-fidelity, text-controlled dance sequences using a new Choreographic Syntax framework and a large fine-grained motion dataset.

Can Vision Language Models Judge Action Quality? An Empirical Evaluation

cs.CV · 2026-04-09 · conditional · novelty 7.0

Vision-language models perform only marginally above random on action quality assessment and retain systematic biases even after targeted prompting and contrastive reformulation.

SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

SUGAR turns diverse human videos into deployable humanoid loco-manipulation policies via automated prior extraction, physics refinement, and hierarchical distillation, showing scaling with data volume and zero-shot real-world transfer on six tasks.

EgoExo-WM: Unlocking Exo Video for Ego World Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Converting exocentric video to egocentric format via body-pose extraction and a kinematics prior enables training of action-conditioned egocentric world models that improve prediction quality and goal-directed planning.

Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

HA-HOI produces physically plausible 4D human-object interaction (HOI) animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.

SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound

cs.CV · 2026-04-28 · unverdicted · novelty 6.0 · 2 refs

SAMe grounds patient complaints to organs, builds a lightweight patient anatomy model from a single body image, and outputs probe initialization poses, outperforming keypoint baselines in real-robot liver and kidney trials.

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

CoInteract adds a human-aware mixture-of-experts and spatially-structured co-generation to a diffusion transformer to synthesize videos with stable structures and physically plausible human-object contacts.

Pi-HOC: Pairwise 3D Human-Object Contact Estimation

cs.CV · 2026-04-14 · unverdicted · novelty 6.0 · 2 refs

Pi-HOC predicts dense 3D semantic contacts for all human-object pairs in an image via instance-aware tokens and an InteractionFormer, achieving higher accuracy and 20x the throughput of prior methods.

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.

One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

A hierarchical multi-agent framework converts a single sentence into a short drama using debate-based scripting, 3D-grounded first frames for spatial consistency, and multi-stage reviewer loops.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

cs.CV · 2026-04-01 · unverdicted · novelty 5.0

A minimal Gaussian splatting avatar pipeline using the Momentum Human Rig achieves the highest reported PSNR on PeopleSnapshot and ZJU-MoCap without learned deformations.

citing papers explorer

Showing 14 of 14 citing papers.

H-Flow: Self-supervised Human Scene Flow via Physics-inspired Joint Multi-modal Learning cs.CV · 2026-05-21 · unverdicted · none · ref 8
LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image cs.CV · 2026-04-22 · unverdicted · none · ref 87
DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax cs.CV · 2026-04-20 · unverdicted · none · ref 47
Can Vision Language Models Judge Action Quality? An Empirical Evaluation cs.CV · 2026-04-09 · conditional · none · ref 38
SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework cs.RO · 2026-05-19 · unverdicted · none · ref 46
EgoExo-WM: Unlocking Exo Video for Ego World Models cs.CV · 2026-05-14 · unverdicted · none · ref 75
Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos cs.CV · 2026-05-14 · unverdicted · none · ref 11
SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound cs.CV · 2026-04-28 · unverdicted · none · ref 60 · 2 links
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation cs.CV · 2026-04-21 · unverdicted · none · ref 55
Pi-HOC: Pairwise 3D Human-Object Contact Estimation cs.CV · 2026-04-14 · unverdicted · none · ref 32 · 2 links
RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild cs.RO · 2026-04-08 · unverdicted · none · ref 23
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems cs.CV · 2026-05-21 · unverdicted · none · ref 49
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation cs.CV · 2026-04-20 · unverdicted · none · ref 27
Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars cs.CV · 2026-04-01 · unverdicted · none · ref 21
