Chen Liu, Fajri Koto, Timothy Baldwin, and Iryna Gurevych

Liu, Chen Cecilia; Koto, Fajri; Baldwin, Timothy; Gurevych, Iryna · 2024 · DOI 10.18653/v1/2024.naacl-long.112

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

MIDI is a new multilingual idiom dataset with sentence and conversational contexts; benchmarking reveals worse performance in low-resource languages and on literal vs. figurative uses.

MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages

cs.CL · 2026-07-01 · unverdicted · novelty 6.0

MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.

Understanding and Mitigating Bias Inheritance in LLM-based Data Augmentation on Downstream Tasks

cs.LG · 2025-02-06 · unverdicted · novelty 6.0

Empirical study across 10 tasks showing bias inheritance from LLM-augmented data harms related downstream performance, with three misalignment factors and three mitigation strategies identified.

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model

cs.AI · 2026-04-13 · unverdicted · novelty 5.0

Anthropogenic Regional Adaptation with GG-EZ improves cultural relevance in multimodal vision-language models for Southeast Asia by 5-15% while retaining over 98% of global performance.

citing papers explorer


Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages · cs.CL · 2026-06-01 · unverdicted · none · ref 17
MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages · cs.CL · 2026-07-01 · unverdicted · none · ref 72
Understanding and Mitigating Bias Inheritance in LLM-based Data Augmentation on Downstream Tasks · cs.LG · 2025-02-06 · unverdicted · none · ref 24
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model · cs.AI · 2026-04-13 · unverdicted · none · ref 18
