Elo uncovered: Robustness and best practices in language model evaluation. Advances in Neural Information Processing Systems, 37:106135–106161, 2024

Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Towards Reliable Human Evaluations in Gesture Generation: Insights from a Community-Driven State-of-the-Art Benchmark

cs.CV · 2025-11-03 · unverdicted · novelty 7.0

Introduces a standardized human evaluation protocol for speech-driven gesture generation on BEAT2 and benchmarks six models, revealing saturated motion realism and unreliable prior alignment scores.
