SPLIT benchmark finds Gemini-2.5-Flash and LLaMA-3.3-70B degrade in Ukrainian while DeepSeek-V3 stays stable, with weak human-AI agreement on cultural grounding.
Improbable bigrams expose vulnerabilities of incomplete to- kens in byte-level tokenizers.arXiv preprint arXiv:2410.23684, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SPLIT: Cross-Lingual Empathy and Cultural Grounding in English and Ukrainian LLM Responses
SPLIT benchmark finds Gemini-2.5-Flash and LLaMA-3.3-70B degrade in Ukrainian while DeepSeek-V3 stays stable, with weak human-AI agreement on cultural grounding.