Convofusion: Multi-modal conversational diffusion for co-speech gesture synthesis
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Towards Reliable Human Evaluations in Gesture Generation: Insights from a Community-Driven State-of-the-Art Benchmark
Introduces a standardized human evaluation protocol for speech-driven gesture generation on BEAT2 and benchmarks six models, revealing saturated motion realism and unreliable prior alignment scores.