SAM 3D Body: Robust Full-Body Human Mesh Recovery

Yang, X · 2026 · arXiv 2602.15989

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 2 · background: 1

citation-polarity summary

use method: 2 · background: 1

representative citing papers

Scene and Human in One World: Reconstruction in a Feedforward Pass

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

SHOW is a mask-promptable framework coupling feed-forward scene reconstruction with human mesh recovery in a unified metric space to resolve scale ambiguity and improve human-scene alignment from monocular video.

H-Flow: Self-supervised Human Scene Flow via Physics-inspired Joint Multi-modal Learning

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

H-Flow learns dense human scene flow from monocular video via joint pose and depth prediction in a multi-head transformer, using physics-inspired geometric and biomechanical priors for self-supervision, and introduces the DynAct4D synthetic benchmark.

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

LEXIS-Flow uses VQ-VAE-learned interaction signatures to guide diffusion-based reconstruction of 3D human-object meshes and dense proximity fields from single RGB images, outperforming SOTA on benchmarks.

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DanceCrafter generates high-fidelity, text-controlled dance sequences using a new Choreographic Syntax framework and a large fine-grained motion dataset.

Can Vision Language Models Judge Action Quality? An Empirical Evaluation

cs.CV · 2026-04-09 · conditional · novelty 7.0

Vision-language models perform only marginally above random on action quality assessment and retain systematic biases even after targeted prompting and contrastive reformulation.

PRIMA: Boosting Animal Mesh Recovery with Biological Priors and Test-Time Adaptation

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

PRIMA boosts 3D quadruped mesh recovery by injecting BioCLIP biological priors and using test-time adaptation with 2D constraints to build the Quadruped3D pseudo-3D dataset and reach SOTA on imbalanced animal benchmarks.

Mesh-Aware Epipolar Matching for Multi-View Multi-Person 3D Pose Estimation in Basketball

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

MAEM is a training-free framework that combines monocular 3D mesh recovery with a two-stage epipolar matching strategy using disjoint-set-union clustering and per-joint triangulation for multi-view multi-person 3D pose estimation in basketball.

SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

SUGAR turns diverse human videos into deployable humanoid loco-manipulation policies via automated prior extraction, physics refinement, and hierarchical distillation, showing scaling with data volume and zero-shot real-world transfer on six tasks.

EgoExo-WM: Unlocking Exo Video for Ego World Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

EgoExo-WM converts exocentric videos to an egocentric format via body-pose extraction and kinematics to improve egocentric world-model prediction and planning.

Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.

SAMe: A Semantic Anatomy Mapping Engine for Robotic Ultrasound

cs.CV · 2026-04-28 · unverdicted · novelty 6.0 · 2 refs

SAMe grounds complaints to organs, builds a lightweight patient anatomy model from one body image, and outputs probe initialization poses, outperforming keypoint baselines in real-robot liver and kidney trials.

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

CoInteract adds a human-aware mixture-of-experts and spatially-structured co-generation to a diffusion transformer to synthesize videos with stable structures and physically plausible human-object contacts.

Pi-HOC: Pairwise 3D Human-Object Contact Estimation

cs.CV · 2026-04-14 · unverdicted · novelty 6.0 · 2 refs

Pi-HOC predicts dense 3D semantic contacts for all human-object pairs in an image via instance-aware tokens and an InteractionFormer, achieving higher accuracy and 20x throughput than prior methods.

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.

World Narrative Model for Highly Controllable Video Generation: A Paradigm Shift from Pixel Sampling to Physical World Orchestration

cs.CV · 2026-06-30 · unverdicted · novelty 5.0

WNM introduces a 4D world narrative representation orchestrated by agents to drive video foundation models for high controllability.

One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

A hierarchical multi-agent framework converts a single sentence into a short drama using debate-based scripting, 3D-grounded first frames for spatial consistency, and multi-stage reviewer loops.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

cs.CV · 2026-04-01 · unverdicted · novelty 5.0

A minimal Gaussian splatting avatar pipeline using the Momentum Human Rig achieves the highest reported PSNR on PeopleSnapshot and ZJU-MoCap without learned deformations.

LUNA: Learning Universal 3D Human Animation Beyond Skinning

cs.CV · 2026-06-30 · unverdicted · novelty 4.0

LUNA is an LBS-free neural animation model that maps 2D controls to 3D Gaussian deformations via a transformer motion regressor and hybrid supervision for realistic motion and zero-shot generalization.

SMART: SMPLest-X Mesh Adaptation and RAFT Tracking for Soccer Pose Estimation

cs.CV · 2026-05-29 · unverdicted · novelty 2.0

SMART adapts SMPLest-X via stratified finetuning and RAFT tracking to achieve 0.647 validation and 0.593 test scores on the FIFA 2026 skeletal tracking challenge, versus baseline 1.053.

citing papers explorer

Showing 2 of 2 citing papers after filters.

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

cs.CV · 2026-04-21 · unverdicted · none · ref 55

CoInteract adds a human-aware mixture-of-experts and spatially-structured co-generation to a diffusion transformer to synthesize videos with stable structures and physically plausible human-object contacts.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · none · ref 27

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.
