BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion

· 2026 · cs.CV · arXiv 2604.04395

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

3D conducting motion generation aims to synthesize fine-grained conductor motions from music, with broad potential in music education, virtual performance, digital human animation, and human-AI co-creation. However, this task remains underexplored due to two major challenges: (1) the lack of large-scale fine-grained 3D conducting datasets and (2) the absence of effective methods that can jointly support long-sequence generation with high quality and efficiency. To address the data limitation, we develop a quality-oriented 3D conducting motion collection pipeline and construct CM-Data, a fine-grained SMPL-X dataset with about 10 hours of conducting motion data. To the best of our knowledge, CM-Data is the first and largest public dataset for 3D conducting motion generation. To address the methodological limitation, we propose BiTDiff, a novel framework for 3D conducting motion generation, built upon a BiMamba-Transformer hybrid model architecture for efficient long-sequence modeling and a Diffusion-based generative strategy with human-kinematic decomposition for high-quality motion synthesis. Specifically, BiTDiff introduces auxiliary physical-consistency losses and a hand-/body-specific forward-kinematics design for better fine-grained motion modeling, while leveraging BiMamba for memory-efficient long-sequence temporal modeling and Transformer for cross-modal semantic alignment. In addition, BiTDiff supports training-free joint-level motion editing, enabling downstream human-AI interaction design. Extensive quantitative and qualitative experiments demonstrate that BiTDiff achieves state-of-the-art (SOTA) performance for 3D conducting motion generation on the CM-Data dataset. Code will be available upon acceptance.

representative citing papers

Interactive Multi-Turn Retrieval for Health Videos

cs.IR · 2026-05-02 · unverdicted · novelty 6.0

DATR combines coarse CLIP-based retrieval with multi-turn query fusion and cross-encoder re-ranking to improve health video retrieval, supported by the new MHVRC corpus.

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

cs.MM · 2026-05-01 · unverdicted · novelty 6.0

CustomDancer achieves state-of-the-art text-to-dance retrieval with 10.23% Recall@1 on the new TD-Data dataset by aligning text, music, and motion features through a CLIP-based framework.

MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation

cs.SD · 2026-05-02 · unverdicted · novelty 5.0

TransConductor generates 3D conducting gestures from music via a Trans-Temporal Music Encoder and Gesture Decoder, outperforming baselines on retrieval-based alignment metrics with a new ConductorMotion dataset.

citing papers explorer

Showing 3 of 3 citing papers.

Interactive Multi-Turn Retrieval for Health Videos cs.IR · 2026-05-02 · unverdicted · none · ref 14 · internal anchor
DATR combines coarse CLIP-based retrieval with multi-turn query fusion and cross-encoder re-ranking to improve health video retrieval, supported by the new MHVRC corpus.
CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval cs.MM · 2026-05-01 · unverdicted · none · ref 32 · internal anchor
CustomDancer achieves state-of-the-art text-to-dance retrieval with 10.23% Recall@1 on the new TD-Data dataset by aligning text, music, and motion features through a CLIP-based framework.
MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation cs.SD · 2026-05-02 · unverdicted · none · ref 10 · internal anchor
TransConductor generates 3D conducting gestures from music via a Trans-Temporal Music Encoder and Gesture Decoder, outperforming baselines on retrieval-based alignment metrics with a new ConductorMotion dataset.

BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion

fields

years

verdicts

representative citing papers

citing papers explorer