Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

Caswell, Isaac, Breiner, Theresa, van Esch, Daan, Bapna, Ankur · 2020 · DOI 10.18653/v1/2020.coling-main.579

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

cs.CL · 2026-06-17 · unverdicted · novelty 6.0

Introduces Dango, a 1.8B strictly L1-only LLM using corpus filtering and lesson fine-tuning to simulate Japanese-to-English SLA and produce human-like L2 output patterns.

Are you speaking my languages? On spoken language adherence in multimodal LLMs

cs.CL · 2026-06-15 · unverdicted · novelty 4.0

Defines language adherence failures in multimodal ASR LLMs and compares soft prompting, SFT, and CoT strategies for reducing violations across languages.

citing papers explorer

Showing 2 of 2 citing papers.

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

cs.CL · 2026-06-17 · unverdicted · none · ref 36

Introduces Dango, a 1.8B strictly L1-only LLM using corpus filtering and lesson fine-tuning to simulate Japanese-to-English SLA and produce human-like L2 output patterns.

Are you speaking my languages? On spoken language adherence in multimodal LLMs

cs.CL · 2026-06-15 · unverdicted · none · ref 8

Defines language adherence failures in multimodal ASR LLMs and compares soft prompting, SFT, and CoT strategies for reducing violations across languages.
