An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)

Burchell, Laurie, de Gibert, Ona, Arefyev, Nikolay, Aulamo, Mikko, Ba · 2025 · DOI 10.18653/v1/2025.acl-long.854

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

CHALIS: A Challenge Dataset for Language Identification in Difficult Scenarios

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Introduces CHALIS, a benchmark dataset that tests language identification on mutually intelligible cousin language pairs and on orthographically noisy inputs; the evaluation shows that existing systems struggle substantially.

CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation

cs.CL · 2026-06-19 · unverdicted · novelty 3.0

Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

CHALIS: A Challenge Dataset for Language Identification in Difficult Scenarios · cs.CL · 2026-06-04 · unverdicted · none · ref 61
CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation · cs.CL · 2026-06-19 · unverdicted · none · ref 5
