Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

Guntur Budi Herwanto; Joanito Agili Lopo; Yunita Sari

arxiv: 2606.11786 · v1 · pith:IYGP75IMnew · submitted 2026-06-10 · 💻 cs.CL

Pith reviewed 2026-06-27 09:49 UTC · model grok-4.3

classification 💻 cs.CL

keywords low-resource translation · continual instruction tuning · Kupang Malay · instruction tuning · machine translation · bilingual dictionary · large language models

The pith

Continual instruction tuning with dictionary-derived features improves Kupang Malay translation without large parallel datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for adapting large language models to Kupang Malay translation. It first extracts lexical and semantic details from a bilingual dictionary and uses them to shape instructions, then applies continual instruction tuning, an iterative training process that repeatedly exposes the model to these instructions. The resulting Lius model is reported to gain 4-6 points over standard instruction-tuned baselines and 10-13 points over neural machine translation systems and multilingual LLMs on several evaluation metrics. The work suggests that instruction design can partly substitute for volume of parallel text in low-resource settings.

Core claim

By constructing instructions that embed explicit lexical and semantic features extracted from a bilingual dictionary and training via Continual Instruction Tuning, the Lius model delivers measurable gains in Kupang Malay translation accuracy over standard instruction-tuned, neural machine translation, and multilingual LLM baselines while avoiding dependence on large-scale parallel data.

What carries the argument

Continual Instruction Tuning (CIT), an iterative training loop that repeatedly applies dictionary-derived instructions to adapt an LLM for a target low-resource language pair.
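
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the dictionary schema, the template wording, the number or order of stages, or the base model. The `DictEntry` fields, both template functions, the two stage names, and the `record_stage` stand-in trainer are all hypothetical, and Indonesian is assumed as the source language purely for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DictEntry:
    # Hypothetical entry layout; the paper's dictionary schema is not given.
    source_word: str  # headword in the assumed source language (Indonesian)
    target_word: str  # Kupang Malay equivalent
    gloss: str        # short semantic note for the headword


def lexical_instruction(entry: DictEntry) -> dict:
    # Hypothetical template: ask for a direct word-level translation.
    return {
        "instruction": f"Translate the word '{entry.source_word}' into Kupang Malay.",
        "output": entry.target_word,
    }


def semantic_instruction(entry: DictEntry) -> dict:
    # Hypothetical template: ask for the word that matches a meaning.
    return {
        "instruction": f"Which Kupang Malay word means: {entry.gloss}?",
        "output": entry.target_word,
    }


def build_stages(entries: list) -> list:
    # Group instruction records into ordered stages (an assumed two-stage split).
    return [
        ("lexical", [lexical_instruction(e) for e in entries]),
        ("semantic", [semantic_instruction(e) for e in entries]),
    ]


def continual_tune(state, stages: list, train_stage: Callable):
    # Apply the stages one after another, carrying the model state forward.
    for name, examples in stages:
        state = train_stage(state, name, examples)
    return state


def record_stage(state: list, name: str, examples: list) -> list:
    # Stand-in for a real fine-tuning step: it only records what it was given.
    return state + [(name, len(examples))]


# Illustrative entries only; not taken from the paper's dictionary.
ENTRIES = [
    DictEntry("saya", "beta", "first-person singular pronoun"),
    DictEntry("tidak", "sonde", "negation, 'not'"),
]

history = continual_tune([], build_stages(ENTRIES), record_stage)
```

In a real run, `record_stage` would be replaced by a function that fine-tunes the model on `examples` and returns the updated weights. The point of the sketch is only that each stage starts from the state left by the previous one.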

If this is right

Low-resource translation can proceed with far smaller parallel corpora when instructions encode dictionary features.
Instruction-tuned models gain measurable accuracy from iterative rather than one-shot training on language-specific instructions.
Performance advantages over both dedicated NMT systems and general multilingual LLMs become attainable through the same dictionary-plus-CIT pipeline.
The approach supplies a concrete route to reduce data-collection costs for additional low-resource language pairs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dictionary-to-instruction pipeline could be tested on other Austronesian or creole languages that possess modest bilingual resources.
Automating the extraction of lexical and semantic features might allow the method to scale without manual dictionary curation.
Combining CIT with existing multilingual models could narrow the gap between general-purpose LLMs and language-specific systems at lower compute cost.

Load-bearing premise

Explicit lexical and semantic features taken from a bilingual dictionary are enough to produce instructions that let continual tuning succeed where large parallel corpora are unavailable.

What would settle it

Evaluate the Lius model on another low-resource language that has no high-quality bilingual dictionary and check whether the 4-6 and 10-13 point gains over the same baselines disappear.
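
One common way to judge whether a gain has really "disappeared" is paired bootstrap resampling over per-sentence scores. The sketch below is a simplified version under stated assumptions: it compares means of sentence-level scores, whereas a corpus-level metric would normally be recomputed on each resample, and the function name and defaults are illustrative rather than anything the paper specifies.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A's mean score beats system B's.

    scores_a and scores_b are per-sentence scores for the same test
    sentences, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    if not scores_a:
        raise ValueError("need at least one sentence")

    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw sentence indices with replacement; use the same indices for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

A value close to 1.0 indicates that A beats B on nearly every resample, while a value near 0.5 suggests the observed difference could be noise.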

Original abstract

Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions by leveraging explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results demonstrate that our model, named Lius, yields notable improvements over standard instruction-tuned models by outperforming 4-6 points, and surpassing both Neural Machine Translation (NMT) and Multilingual LLM models by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to mitigate the reliance on large-scale parallel data in low-resource language translation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a standard application of instruction tuning plus continual learning to Kupang Malay translation, with dictionary features added to the prompts. The abstract names no metrics, test sets, base models, or ablations, so the claimed gains cannot be checked.

The paper takes existing instruction-tuning methods, pulls lexical and semantic entries from a bilingual dictionary to build prompt templates, and runs continual instruction tuning on a low-resource language pair. That is the core of what they do.

It targets a practical gap: translation for Kupang Malay where large parallel corpora are scarce. The idea of turning dictionary content into instructions is straightforward and could be useful if it works.

The problem is the evidence. The abstract states 4-6 point gains over standard instruction-tuned models and 10-13 points over NMT and multilingual LLMs, yet it does not say which metric (BLEU, chrF, COMET, or another) those points refer to. It also gives no test-set size or domain, no base model name, no count of training examples, and no ablation that isolates the dictionary component. Without those, the deltas cannot be attributed to the proposed mechanism.
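
To make concrete why the metric behind a "point" gain matters, the sketch below computes a simplified character n-gram F-score in the spirit of chrF. It is an assumption-laden illustration, not the paper's evaluation code and not equivalent to any reference implementation: it strips spaces, averages precision and recall over n-gram orders 1 to 6, and combines them with beta = 2. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Count character n-grams after removing spaces (a simplification).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta on a 0-100 scale; beta > 1 weights recall more than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A character-level score like this and a word-level score such as BLEU can move by very different amounts on the same outputs, so a delta quoted in "points" is hard to interpret until the metric is named.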

The central assumption, that dictionary-derived instructions alone make continual tuning effective without large parallel data, remains untested in the description we have. The stress-test note is right on this point.

This work is mainly for researchers already focused on Indonesian or Malay varieties who want to see one more data point on low-resource fine-tuning. It does not introduce new methods or reproducible results that would change how the field thinks about the problem.

I would not bring it to a reading group or cite it. It does not yet deserve peer review; the experimental section needs to be written and the controls shown before a referee should spend time on it.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a model named Lius for low-resource machine translation involving Kupang Malay. It designs instructions by extracting explicit lexical and semantic features from a bilingual dictionary and introduces Continual Instruction Tuning (CIT) as an iterative fine-tuning paradigm for LLMs. The central claim is that Lius yields 4-6 point gains over standard instruction-tuned models and 10-13 point gains over NMT and multilingual LLM baselines on several evaluation metrics, thereby reducing reliance on large-scale parallel data.

Significance. If the experimental claims hold under proper controls, the work could offer a practical route for low-resource translation by showing how dictionary-derived instructions enable effective CIT. This addresses a genuine need in the field for methods that operate with limited parallel corpora.

major comments (2)

[Abstract] The headline result (4-6 pt gains over instruction-tuned models; 10-13 pt over NMT/multilingual LLMs) is stated without any information on the evaluation metrics, test sets, statistical significance, baseline implementations, or experimental controls. This absence prevents verification that the data support the claims.
[§4 (Experiments)] The central claim that bilingual-dictionary-derived instructions suffice for CIT gains rests on an untested assumption; the section supplies no details on dictionary size/coverage, the mapping from entries to instruction templates, the base model, volume of any parallel data still used, number of CIT stages, or ablations isolating the dictionary component. Without these, the reported deltas cannot be attributed to the proposed mechanism.

minor comments (1)

[Title] 'Lingustic' is a typographical error and should read 'Linguistic'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight areas where additional detail will improve verifiability of the claims. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] The headline result (4-6 pt gains over instruction-tuned models; 10-13 pt over NMT/multilingual LLMs) is stated without any information on the evaluation metrics, test sets, statistical significance, baseline implementations, or experimental controls. This absence prevents verification that the data support the claims.

Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will expand the abstract to name the primary metrics (BLEU and chrF), identify the test sets, note that reported gains include statistical significance testing, and briefly characterize the baseline implementations and controls. These additions will make the headline claims directly verifiable from the abstract.

Revision: yes

Referee: [§4 (Experiments)] The central claim that bilingual-dictionary-derived instructions suffice for CIT gains rests on an untested assumption; the section supplies no details on dictionary size/coverage, the mapping from entries to instruction templates, the base model, volume of any parallel data still used, number of CIT stages, or ablations isolating the dictionary component. Without these, the reported deltas cannot be attributed to the proposed mechanism.

Authors: We accept that §4 currently omits several implementation details required to attribute gains to the dictionary-derived instructions and CIT procedure. We will revise the section to report dictionary size and coverage statistics, the exact template-mapping procedure, the base LLM, the quantity of parallel data retained, the number of CIT stages, and ablation experiments that isolate the dictionary component. These additions will allow readers to evaluate the contribution of the proposed mechanism.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposal with no derivation chain

full rationale

The paper describes an empirical NLP approach: designing instructions from bilingual dictionary lexical/semantic features, then applying Continual Instruction Tuning (CIT) to fine-tune an LLM for Kupang Malay translation. No equations, parameters, or mathematical derivations are present in the provided abstract or described claims. Experimental deltas (4-6 pts over instruction tuning, 10-13 over NMT/LLMs) are reported as outcomes, not as quantities forced by construction from fitted inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in the given text. This matches the reader's assessment of score 1.0; absence of detail on dictionary size or ablations is a reproducibility concern, not circularity. The derivation chain is empty, so no reduction to inputs occurs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, axioms, or invented entities; the method relies on standard LLM fine-tuning practices.

Reference graph

Works this paper leans on

87 extracted references · 49 canonical work pages

[1]

Sequence to Sequence Learning with Neural Networks , url =

Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V , booktitle =. Sequence to Sequence Learning with Neural Networks , url =
[2]

Learning phrase representations using RNN encoder ⚶decoder for statistical machine translation

Cho, Kyunghyun and van Merri. Learning Phrase Representations using RNN Encoder -- Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ). 2014. doi:10.3115/v1/D14-1179

work page doi:10.3115/v1/d14-1179 2014
[3]

Attention is All you Need , url =

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, ukasz and Polosukhin, Illia , booktitle =. Attention is All you Need , url =
[4]

A Convolutional Encoder Model for Neural Machine Translation

Gehring, Jonas and Auli, Michael and Grangier, David and Dauphin, Yann. A Convolutional Encoder Model for Neural Machine Translation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1012

work page doi:10.18653/v1/p17-1012 2017
[5]

Survey of Low-Resource Machine Translation

Haddow, Barry and Bawden, Rachel and Miceli Barone, Antonio Valerio and Helcl, Jind r ich and Birch, Alexandra. Survey of Low-Resource Machine Translation. Computational Linguistics. 2022. doi:10.1162/coli_a_00446

work page doi:10.1162/coli_a_00446 2022
[6]

In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ( NAACL-HLT )

Hedderich, Michael A. and Lange, Lukas and Adel, Heike and Str. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.201

work page doi:10.18653/v1/2021.naacl-main.201 2021
[7]

N usa W rites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

Cahyawijaya, Samuel and Lovenia, Holy and Koto, Fajri and Adhista, Dea and Dave, Emmanuel and Oktavianti, Sarah and Akbar, Salsabil and Lee, Jhonson and Shadieq, Nuur and Cenggoro, Tjeng Wawan and Linuwih, Hanung and Wilie, Bryan and Muridan, Galih and Winata, Genta and Moeljadi, David and Aji, Alham Fikri and Purwarianti, Ayu and Fung, Pascale. N usa W r...

work page doi:10.18653/v1/2023.ijcnlp-main.60 2023
[8]

N usa X : Multilingual Parallel Sentiment Dataset for 10 I ndonesian Local Languages

Winata, Genta Indra and Aji, Alham Fikri and Cahyawijaya, Samuel and Mahendra, Rahmad and Koto, Fajri and Romadhony, Ade and Kurniawan, Kemal and Moeljadi, David and Prasojo, Radityo Eko and Fung, Pascale and Baldwin, Timothy and Lau, Jey Han and Sennrich, Rico and Ruder, Sebastian. N usa X : Multilingual Parallel Sentiment Dataset for 10 I ndonesian Loca...

work page doi:10.18653/v1/2023.eacl-main.57 2023
[9]

Parallel Data, Tools and Interfaces in OPUS

Tiedemann, J. Parallel Data, Tools and Interfaces in OPUS. Proceedings of the Eighth International Conference on Language Resources and Evaluation ( LREC `12). 2012

2012
[10]

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in I ndonesia

Aji, Alham Fikri and Winata, Genta Indra and Koto, Fajri and Cahyawijaya, Samuel and Romadhony, Ade and Mahendra, Rahmad and Kurniawan, Kemal and Moeljadi, David and Prasojo, Radityo Eko and Baldwin, Timothy and Lau, Jey Han and Ruder, Sebastian. One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in I ndonesia. Proceed...

work page doi:10.18653/v1/2022.acl-long.500 2022
[11]

Cross-Lingual Machine Speech Chain for J avanese, S undanese, B alinese, and B ataks Speech Recognition and Synthesis

Novitasari, Sashi and Tjandra, Andros and Sakti, Sakriani and Nakamura, Satoshi. Cross-Lingual Machine Speech Chain for J avanese, S undanese, B alinese, and B ataks Speech Recognition and Synthesis. Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resource...

2020
[12]

Language Model Prior for Low-Resource Neural Machine Translation

Baziotis, Christos and Haddow, Barry and Birch, Alexandra. Language Model Prior for Low-Resource Neural Machine Translation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.615

work page doi:10.18653/v1/2020.emnlp-main.615 2020
[13]

Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation

Chronopoulou, Alexandra and Stojanovski, Dario and Fraser, Alexander. Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.16

work page doi:10.18653/v1/2021.naacl-main.16 2021
[14]

Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences

Duan, Xiangyu and Ji, Baijun and Jia, Hao and Tan, Min and Zhang, Min and Chen, Boxing and Luo, Weihua and Zhang, Yue. Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.143

work page doi:10.18653/v1/2020.acl-main.143 2020
[15]

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

Pourdamghani, Nima and Aldarrab, Nada and Ghazvininejad, Marjan and Knight, Kevin and May, Jonathan. Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1293

work page doi:10.18653/v1/p19-1293 2019
[16]

ACM Comput

Dabre, Raj and Chu, Chenhui and Kunchukuttan, Anoop , title =. ACM Comput. Surv. , month = sep, articleno =. 2020 , issue_date =. doi:10.1145/3406095 , abstract =

work page doi:10.1145/3406095 2020
[17]

Unsupervised Pivot Translation for Distant Languages

Leng, Yichong and Tan, Xu and Qin, Tao and Li, Xiang-Yang and Liu, Tie-Yan. Unsupervised Pivot Translation for Distant Languages. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1017

work page doi:10.18653/v1/p19-1017 2019
[18]

Transfer Learning for Low-Resource Neural Machine Translation

Zoph, Barret and Yuret, Deniz and May, Jonathan and Knight, Kevin. Transfer Learning for Low-Resource Neural Machine Translation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. doi:10.18653/v1/D16-1163

work page doi:10.18653/v1/d16-1163 2016
[19]

Language Models are Few-Shot Learners , url =

Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...
[20]

2024 , eprint=

GPT-4 Technical Report , author=. 2024 , eprint=

2024
[21]

and Wang, Longyue

Lyu, Chenyang and Du, Zefeng and Xu, Jitao and Duan, Yitao and Wu, Minghao and Lynn, Teresa and Aji, Alham Fikri and Wong, Derek F. and Wang, Longyue. A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (L...

2024
[22]

and Xu, Yan and Fung, Pascale

Bang, Yejin and Cahyawijaya, Samuel and Lee, Nayeon and Dai, Wenliang and Su, Dan and Wilie, Bryan and Lovenia, Holy and Ji, Ziwei and Yu, Tiezheng and Chung, Willy and Do, Quyet V. and Xu, Yan and Fung, Pascale. A Multitask, Multilingual, Multimodal Evaluation of C hat GPT on Reasoning, Hallucination, and Interactivity. Proceedings of the 13th Internatio...

work page doi:10.18653/v1/2023.ijcnlp-main.45 2023
[23]

2023 , eprint=

Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine , author=. 2023 , eprint=

2023
[24]

and Neubig, Graham

Robinson, Nathaniel and Ogayo, Perez and Mortensen, David R. and Neubig, Graham. C hat GPT MT : Competitive for High- (but Not Low-) Resource Languages. Proceedings of the Eighth Conference on Machine Translation. 2023. doi:10.18653/v1/2023.wmt-1.40

work page doi:10.18653/v1/2023.wmt-1.40 2023
[25]

Exploring Very Low-Resource Translation with LLM s: The U niversity of E dinburgh`s Submission to A mericas NLP 2024 Translation Task

Iyer, Vivek and Malik, Bhavitvya and Zhu, Wenhao and Stepachev, Pavel and Chen, Pinzhen and Haddow, Barry and Birch, Alexandra. Exploring Very Low-Resource Translation with LLM s: The U niversity of E dinburgh`s Submission to A mericas NLP 2024 Translation Task. Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the...

work page doi:10.18653/v1/2024.americasnlp-1.25 2024
[26]

2023 , eprint=

Llama 2: Open Foundation and Fine-Tuned Chat Models , author=. 2023 , eprint=

2023
[27]

2024 , eprint=

MaLA-500: Massive Language Adaptation of Large Language Models , author=. 2024 , eprint=

2024
[28]

2023 , eprint=

Mistral 7B , author=. 2023 , eprint=

2023
[29]

I nstruct A lign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

Cahyawijaya, Samuel and Lovenia, Holy and Yu, Tiezheng and Chung, Willy and Fung, Pascale. I nstruct A lign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning. Proceedings of the First Workshop in South East Asian Language Processing. 2023. doi:10.18653/v1/2023.sealp-1.5

work page doi:10.18653/v1/2023.sealp-1.5 2023
[30]

Crosslingual Generalization through Multitask Finetuning

Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Le Scao, Teven and Bari, M Saiful and Shen, Sheng and Yong, Zheng Xin and Schoelkopf, Hailey and Tang, Xiangru and Radev, Dragomir and Aji, Alham Fikri and Almubarak, Khalid and Albanie, Samuel and Alyafeai, Zaid and Webson, Albert and Raff, Edward and Ra...

work page doi:10.18653/v1/2023.acl-long.891 2023
[31]

Tuning LLM s with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

Mao, Zhuoyuan and Yu, Yen. Tuning LLM s with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages. Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024). 2024. doi:10.18653/v1/2024.loresmt-1.1

work page doi:10.18653/v1/2024.loresmt-1.1 2024
[32]

, editor =

Dyer, Chris and Chahuneau, Victor and Smith, Noah A. , editor =. A. Proceedings of the 2013. 2013 , pages =

2013
[33]

Transactions of the Association for Computational Linguistics , author =

Eliciting the. Transactions of the Association for Computational Linguistics , author =. 2024 , note =. doi:10.1162/tacl_a_00655 , abstract =

work page doi:10.1162/tacl_a_00655 2024
[34]

2022 , eprint=

Few-shot Learning with Multilingual Language Models , author=. 2022 , eprint=

2022
[35]

Teaching Large Language Models to Translate on Low-resource Languages with Textbook Prompting

Guo, Ping and Ren, Yubing and Hu, Yue and Li, Yunpeng and Zhang, Jiarui and Zhang, Xingsheng and Huang, Heyan. Teaching Large Language Models to Translate on Low-resource Languages with Textbook Prompting. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024

2024
[36]

Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the M ambai Language

Merx, Rapha. Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the M ambai Language. Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024. 2024

2024
[37]

Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

Heffernan, Kevin and C elebi, Onur and Schwenk, Holger. Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages. Findings of the Association for Computational Linguistics: EMNLP 2022. 2022. doi:10.18653/v1/2022.findings-emnlp.154

work page doi:10.18653/v1/2022.findings-emnlp.154 2022
[38]

Towards Making the Most of C hat GPT for Machine Translation

Peng, Keqin and Ding, Liang and Zhong, Qihuang and Shen, Li and Liu, Xuebo and Zhang, Min and Ouyang, Yuanxin and Tao, Dacheng. Towards Making the Most of C hat GPT for Machine Translation. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.373

work page doi:10.18653/v1/2023.findings-emnlp.373 2023
[39]

Hire a Linguist!: Learning Endangered Languages in LLM s with In-Context Linguistic Descriptions

Zhang, Kexun and Choi, Yee and Song, Zhenqiao and He, Taiqi and Wang, William Yang and Li, Lei. Hire a Linguist!: Learning Endangered Languages in LLM s with In-Context Linguistic Descriptions. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.925

work page doi:10.18653/v1/2024.findings-acl.925 2024
[40]

Not All Languages Are Created Equal in LLM s: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Huang, Haoyang and Tang, Tianyi and Zhang, Dongdong and Zhao, Xin and Song, Ting and Xia, Yan and Wei, Furu. Not All Languages Are Created Equal in LLM s: Improving Multilingual Capability by Cross-Lingual-Thought Prompting. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.826

work page doi:10.18653/v1/2023.findings-emnlp.826 2023
[41]

Exploring Human-Like Translation Strategy with Large Language Models

He, Zhiwei and Liang, Tian and Jiao, Wenxiang and Zhang, Zhuosheng and Yang, Yujiu and Wang, Rui and Tu, Zhaopeng and Shi, Shuming and Wang, Xing. Exploring Human-Like Translation Strategy with Large Language Models. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00642

work page doi:10.1162/tacl_a_00642 2024
[42]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , articleno =

Rajbhandari, Samyam and Rasley, Jeff and Ruwase, Olatunji and He, Yuxiong , title =. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , articleno =. 2020 , isbn =

2020
[43]

1980--2014 , howpublished =

Jakarta Field Station , title =. 1980--2014 , howpublished =

1980
[44]

2005 , note =

Yohanes Manhitu , title =. 2005 , note =

2005
[45]

Transactions of the Association for Computational Linguistics , author =

Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics. 2017. doi:10.1162/tacl_a_00051

work page doi:10.1162/tacl_a_00051 2017
[46]

Cendol: Open Instruction-tuned Generative Large Language Models for I ndonesian Languages

Cahyawijaya, Samuel and Lovenia, Holy and Koto, Fajri and Putri, Rifki and Cenggoro, Wawan and Lee, Jhonson and Akbar, Salsabil and Dave, Emmanuel and Nuurshadieq, Nuurshadieq and Mahendra, Muhammad and Putri, Rr and Wilie, Bryan and Winata, Genta and Aji, Alham and Purwarianti, Ayu and Fung, Pascale. Cendol: Open Instruction-tuned Generative Large Langua...

work page doi:10.18653/v1/2024.acl-long.796 2024
[47]

Sailor: Open Language Models for South- E ast A sia

Dou, Longxu and Liu, Qian and Zeng, Guangtao and Guo, Jia and Zhou, Jiahui and Mao, Xin and Jin, Ziqi and Lu, Wei and Lin, Min. Sailor: Open Language Models for South- E ast A sia. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2024. doi:10.18653/v1/2024.emnlp-demo.45

work page doi:10.18653/v1/2024.emnlp-demo.45 2024
[48]

2024 , eprint=

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier , author=. 2024 , eprint=

2024
[49]

2024 , eprint=

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages , author=. 2024 , eprint=

2024
[50]

2023 , eprint=

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset , author=. 2023 , eprint=

2023
[51]

2020 , eprint=

Scaling Laws for Neural Language Models , author=. 2020 , eprint=

2020
[52]

W iki M atrix: Mining 135 M Parallel Sentences in 1620 Language Pairs from W ikipedia

Schwenk, Holger and Chaudhary, Vishrav and Sun, Shuo and Gong, Hongyu and Guzm \'a n, Francisco. W iki M atrix: Mining 135 M Parallel Sentences in 1620 Language Pairs from W ikipedia. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. doi:10.18653/v1/2021.eacl-main.115

work page doi:10.18653/v1/2021.eacl-main.115 2021
[53]

CCM atrix: Mining Billions of High-Quality Parallel Sentences on the Web

Schwenk, Holger and Wenzek, Guillaume and Edunov, Sergey and Grave, Edouard and Joulin, Armand and Fan, Angela. CCM atrix: Mining Billions of High-Quality Parallel Sentences on the Web. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume ...

work page doi:10.18653/v1/2021.acl-long.507 2021
[54]

2024 , eprint=

Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages , author=. 2024 , eprint=

2024
[55]

P an L ex: Building a Resource for Panlingual Lexical Translation

Kamholz, David and Pool, Jonathan and Colowick, Susan. P an L ex: Building a Resource for Panlingual Lexical Translation. Proceedings of the Ninth International Conference on Language Resources and Evaluation ( LREC `14). 2014

2014
[56]

2024 , eprint=

Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages , author=. 2024 , eprint=

2024
[57]

INTERFERENSI FONOLOGIS PENUTUR BAHASA MELAYU KUPANG KE DALAM BAHASA INDONESIA DI KOTA KUPANG , volume =

Rafael, Agnes Maria Diana , year =. INTERFERENSI FONOLOGIS PENUTUR BAHASA MELAYU KUPANG KE DALAM BAHASA INDONESIA DI KOTA KUPANG , volume =. Jurnal Penelitian Humaniora , publisher =. doi:10.23917/humaniora.v20i1.7225 , number =

work page doi:10.23917/humaniora.v20i1.7225
[58]

2003 , publisher=

Kamus pengantar bahasa Kupang , author=. 2003 , publisher=

2003
[59]

2023 , eprint=

PolyLM: An Open Source Polyglot Large Language Model , author=. 2023 , eprint=

2023
[60]

Kontektualisasi Direct Instruction Dalam Pembelajaran Sains , volume =

Zahriani, Zahriani , year =. Kontektualisasi Direct Instruction Dalam Pembelajaran Sains , volume =. Lantanida Journal , publisher =. doi:10.22373/lj.v2i1.667 , number =

work page doi:10.22373/lj.v2i1.667
[61]

and Morris, Jared R

Hughes, Charles A. and Morris, Jared R. and Therrien, William J. and Benson, Sarah K. , year =. Explicit Instruction: Historical and Contemporary Contexts , volume =. Learning Disabilities Research &; Practice , publisher =. doi:10.1111/ldrp.12142 , number =

work page doi:10.1111/ldrp.12142
[62]

Pauzan, Pauzan. Theory in Second Language Acquisition (Recognition of Concepts Toward Krashen’s Second Language Acquisition Theory for Five Main Hypotheses). Journal on Education. doi:10.31004/joe.v6i4.6210
[63]

Deanna L. Nelson. A Context-Based Strategy for Teaching Vocabulary.
[64]

Graves, Michael F. Vocabulary Learning and Instruction. doi:10.2307/1167219
[65]

Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation. 2023
[66]

Atkinson, Richard C. and Raugh, Michael R. An application of the mnemonic keyword method to the acquisition of a Russian vocabulary. Journal of Experimental Psychology: Human Learning and Memory. doi:10.1037/0278-7393.1.2.126
[67]

Martha D. Rekrut. Effective Vocabulary Instruction.
[68]

Liu, Xuebo and Wang, Longyue and Wong, Derek F. and Ding, Liang and Chao, Lidia S. and Shi, Shuming and Tu, Zhaopeng. On the Copying Behaviors of Pre-Training for Neural Machine Translation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.373
[69]

J. Richard Landis and Gary G. Koch. An Application of Hierarchical Kappa-type Statistics in the Assessment of Majority Agreement among Multiple Observers.
[70]

Grootendorst, Maarten. doi:10.5281/zenodo.4461265
[71]

Prafull Sharma and Yingbo Li. doi:10.20944/preprints201908.0073.v1
[72]

Wilie, Bryan and Vincentio, Karissa and Winata, Genta Indra and Cahyawijaya, Samuel and Li, Xiaohong and Lim, Zhi Yuan and Soleman, Sidik and Mahendra, Rahmad and Fung, Pascale and Bahar, Syafri and Purwarianti, Ayu. IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding. Proceedings of the 1st Conference of the Asia... 2020. doi:10.18653/v1/2020.aacl-main.85
[73]

Rose, Stuart and Engel, Dave and Cramer, Nick and Cowley, Wendy. Automatic. Text. doi:10.1002/9780470689646.ch1
[74]

Ricardo Campos and Vítor Mangaravite and Arian Pasquali and Alípio Jorge and Célia Nunes and Adam Jatowt. YAKE! Keyword extraction from single documents using multiple local features. 2020. doi:10.1016/j.ins.2019.09.013
[75]

Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas. Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017
[76]

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing. Bleu: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002. doi:10.3115/1073083.1073135
[77]

Snover, Matthew and Dorr, Bonnie and Schwartz, Rich and Micciulla, Linnea and Makhoul, John. A Study of Translation Edit Rate with Targeted Human Annotation. Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers. 2006
[78]

Popović, Maja. chrF: character n-gram F-score for automatic MT evaluation. Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015. doi:10.18653/v1/W15-3049
[79]

Post, Matt. A Call for Clarity in Reporting BLEU Scores. Proceedings of the Third Conference on Machine Translation: Research Papers. 2018. doi:10.18653/v1/W18-6319
[80]

Lin, Chin-Yew. ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004
