ForMaT is a new parallel corpus of 3,956 PDFs across 15 language pairs that preserves original layout metadata as a benchmark for visually-grounded multilingual translation.
In Chiruzzo, Luis, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 761–778, Albuquerque, New Mexico, April
ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation