Automatic translation metrics show lower agreement with humans on unseen technical domains than humans show with each other, and their robustness claims weaken when benchmarked against inter-annotator agreement instead of raw scores.
Ties matter: Meta-evaluating modern metrics with pairwise accuracy and tie calibration
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
verdicts
UNVERDICTED 2roles
method 1polarities
use method 1representative citing papers
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.
citing papers explorer
-
Improving Video Generation with Human Feedback
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.