The ComplexityMT benchmark finds that higher CEFR levels increase translation difficulty, and that MT systems often shift the CEFR level of the target text relative to the source in most of the six languages tested.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Introduces the DOSEBENCH benchmark and shows that four LLMs often fail at rolling 24-hour dose calculations and constraint adherence in OTC dosing decisions, despite appearing confident.
Citing papers explorer
- ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation
  The ComplexityMT benchmark finds that higher CEFR levels increase translation difficulty, and that MT systems often shift the CEFR level of the target text relative to the source in most of the six languages tested.
- Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA
  Introduces the DOSEBENCH benchmark and shows that four LLMs often fail at rolling 24-hour dose calculations and constraint adherence in OTC dosing decisions, despite appearing confident.