MimicMotion: High-quality human motion video generation with confidence-aware pose guidance

Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou · 2024 · arXiv 2406.19680

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.

HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.

CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

cs.CV · 2026-01-15 · unverdicted · novelty 7.0

CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SignVerse-2M provides a 2-million-clip multilingual pose-native dataset for sign language derived from public videos via DWPose preprocessing to enable robust modeling in real-world conditions.

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

SteadyDancer is an I2V framework using condition reconciliation, synergistic pose modulation, and staged training to achieve robust first-frame preservation and coherent motion control in human image animation.

Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.

citing papers explorer

Showing 10 of 10 citing papers.

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

cs.CV · 2026-04-21 · unverdicted · none · ref 57

ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.

HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation

cs.CV · 2026-04-07 · unverdicted · none · ref 82

HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.

CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

cs.CV · 2026-01-15 · unverdicted · none · ref 112

CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

cs.CV · 2025-11-28 · unverdicted · none · ref 62

One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.

Bridging the Embodiment Gap: Disentangled Cross-Embodiment Video Editing

cs.RO · 2026-05-05 · unverdicted · none · ref 30

A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.

SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages

cs.CV · 2026-05-03 · unverdicted · none · ref 22

SignVerse-2M provides a 2-million-clip multilingual pose-native dataset for sign language derived from public videos via DWPose preprocessing to enable robust modeling in real-world conditions.

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

cs.CV · 2026-04-23 · unverdicted · none · ref 48

Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · none · ref 82

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

cs.CV · 2025-11-24 · unverdicted · none · ref 41

SteadyDancer is an I2V framework using condition reconciliation, synergistic pose modulation, and staged training to achieve robust first-frame preservation and coherent motion control in human image animation.

Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models

cs.CV · 2026-05-07 · unverdicted · none · ref 40

FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.
