pith.

arxiv: 2606.11786 · v1 · pith:IYGP75IM · submitted 2026-06-10 · 💻 cs.CL

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

Pith reviewed 2026-06-27 09:49 UTC · model grok-4.3

classification 💻 cs.CL
keywords: low-resource translation · continual instruction tuning · Kupang Malay · instruction tuning · machine translation · bilingual dictionary · large language models

The pith

Continual instruction tuning with dictionary-derived features improves Kupang Malay translation without large parallel datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to adapt large language models for translation tasks involving Kupang Malay by first pulling lexical and semantic details from a bilingual dictionary to shape instructions. It then applies continual instruction tuning, an iterative training process that repeatedly exposes the model to these instructions. The resulting Lius model records gains of 4-6 points against ordinary instruction-tuned baselines and 10-13 points against neural machine translation systems and multilingual models on several evaluation metrics. The work suggests that careful instruction design can partly offset a shortage of parallel text in low-resource settings.
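To make the dictionary-to-instruction step concrete, here is a minimal Python sketch. The abstract does not publish the paper's templates, so the entry fields (`headword`, `translation`, `pos`, `sense_note`), the prompt wording, and the toy entry (Kupang Malay *beta*, "I", rendered as Indonesian *saya*) are illustrative assumptions. The sketch only shows how a lexical feature (part of speech) and a semantic feature (a sense note) can be folded into one instruction/response pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DictEntry:
    """One bilingual dictionary entry (field names are hypothetical)."""
    headword: str                     # Kupang Malay headword
    translation: str                  # target-language equivalent
    pos: Optional[str] = None         # lexical feature: part of speech
    sense_note: Optional[str] = None  # semantic feature: short sense gloss


def entry_to_instruction(entry: DictEntry) -> Dict[str, str]:
    """Fold an entry's features into one instruction/response pair.

    The template wording is an illustrative assumption, not the paper's.
    """
    features = []
    if entry.pos:
        features.append(f"part of speech: {entry.pos}")
    if entry.sense_note:
        features.append(f"meaning: {entry.sense_note}")
    feature_text = "; ".join(features) if features else "no extra features"
    return {
        "instruction": (
            f"Translate the Kupang Malay word '{entry.headword}' "
            f"({feature_text})."
        ),
        "response": entry.translation,
    }


def build_instructions(entries: List[DictEntry]) -> List[Dict[str, str]]:
    """Convert every dictionary entry into an instruction-tuning example."""
    return [entry_to_instruction(e) for e in entries]


if __name__ == "__main__":
    toy = DictEntry("beta", "saya", pos="pronoun", sense_note="first-person singular")
    print(build_instructions([toy])[0]["instruction"])
    # Translate the Kupang Malay word 'beta' (part of speech: pronoun; meaning: first-person singular).
```

Entries without features still yield a usable example, so sparse dictionary records are not silently dropped.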

Core claim

By constructing instructions that embed explicit lexical and semantic features extracted from a bilingual dictionary and training via Continual Instruction Tuning, the Lius model delivers measurable gains in Kupang Malay translation accuracy over standard instruction-tuned, neural machine translation, and multilingual LLM baselines while avoiding dependence on large-scale parallel data.

What carries the argument

Continual Instruction Tuning (CIT), an iterative training loop that repeatedly applies dictionary-derived instructions to adapt an LLM for a target low-resource language pair.
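A continual schedule can be pictured as a sequence of training stages, each adding new instructions. The sketch below assumes two things the pith does not state: that stages are ordered (for example, word-level entries before sentence-level translation) and that each stage replays a small sample of earlier data to limit forgetting. The `fine_tune` call is hypothetical and left as a comment so the schedule itself runs on its own.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cit_schedule(stages: Sequence[Sequence[T]], replay_ratio: float = 0.2,
                 seed: int = 0) -> List[List[T]]:
    """Return the training set for each stage of a continual tuning run.

    Stage k contains all of its new examples plus a random sample of
    examples from stages 1..k-1, sized at `replay_ratio` times the new
    examples (capped by what has been seen). Replay and the 0.2 ratio
    are assumptions for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    seen: List[T] = []
    schedule: List[List[T]] = []
    for new_items in stages:
        n_replay = min(len(seen), round(replay_ratio * len(new_items)))
        batch = list(new_items) + rng.sample(seen, n_replay)
        rng.shuffle(batch)
        schedule.append(batch)
        seen.extend(new_items)
    return schedule


if __name__ == "__main__":
    stages = [
        [f"word-{i}" for i in range(10)],      # e.g. dictionary-entry instructions
        [f"sentence-{i}" for i in range(10)],  # e.g. sentence translation instructions
    ]
    for k, data in enumerate(cit_schedule(stages), start=1):
        # model = fine_tune(model, data)  # hypothetical trainer call
        print(f"stage {k}: {len(data)} examples")
    # stage 1: 10 examples
    # stage 2: 12 examples
```

Setting `replay_ratio=0` turns this into plain sequential fine-tuning, which is a natural ablation for the "iterative rather than one-shot" question raised below.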

If this is right

  • Low-resource translation can proceed with far smaller parallel corpora when instructions encode dictionary features.
  • Instruction-tuned models gain measurable accuracy from iterative rather than one-shot training on language-specific instructions.
  • Performance advantages over both dedicated NMT systems and general multilingual LLMs become attainable through the same dictionary-plus-CIT pipeline.
  • The approach supplies a concrete route to reduce data-collection costs for additional low-resource language pairs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dictionary-to-instruction pipeline could be tested on other Austronesian or creole languages that possess modest bilingual resources.
  • Automating the extraction of lexical and semantic features might allow the method to scale without manual dictionary curation.
  • Combining CIT with existing multilingual models could narrow the gap between general-purpose LLMs and language-specific systems at lower compute cost.

Load-bearing premise

Explicit lexical and semantic features taken from a bilingual dictionary are enough to produce instructions that let continual tuning succeed where large parallel corpora are unavailable.
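One way to probe this premise is to measure how much of an evaluation set the dictionary actually covers, a statistic the referee report also asks for. A minimal sketch, assuming lower-cased whitespace tokenisation and a plain list of headwords; the toy tokens are placeholders, not Kupang Malay data.

```python
from collections import Counter
from typing import Dict, Iterable


def dictionary_coverage(sentences: Iterable[str],
                        lexicon: Iterable[str]) -> Dict[str, float]:
    """Fraction of word types and tokens in `sentences` that the lexicon lists.

    Tokenisation is lower-cased whitespace splitting, a simplifying
    assumption; real text would also need punctuation handling.
    """
    vocab = {w.lower() for w in lexicon}
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    if not counts:
        return {"type_coverage": 0.0, "token_coverage": 0.0}
    covered = {w: c for w, c in counts.items() if w in vocab}
    return {
        "type_coverage": len(covered) / len(counts),
        "token_coverage": sum(covered.values()) / sum(counts.values()),
    }


if __name__ == "__main__":
    # Placeholder tokens, not real Kupang Malay text.
    print(dictionary_coverage(["w1 w2 w1", "w3"], ["W1"]))
    # {'type_coverage': 0.3333333333333333, 'token_coverage': 0.5}
```

Low token coverage on held-out sentences would weaken the claim that dictionary features alone can carry the instructions.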

What would settle it

Rebuild the Lius pipeline for another low-resource language that lacks a high-quality bilingual dictionary, and check whether the 4-13 point gains over the same baselines disappear.
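The check itself is simple arithmetic: subtract each baseline's score from the system's score per metric and ask whether every difference still clears a threshold. In this sketch the 4-point default mirrors the low end of the reported range, all metrics are assumed to be higher-is-better (a metric such as TER would need its sign flipped), and the scores are made-up numbers, not results from the paper.

```python
from typing import Dict, Mapping


def gains(system: Mapping[str, float],
          baseline: Mapping[str, float]) -> Dict[str, float]:
    """Per-metric difference system - baseline over the metrics both report."""
    return {m: system[m] - baseline[m] for m in system if m in baseline}


def gains_persist(system: Mapping[str, float],
                  baselines: Mapping[str, Mapping[str, float]],
                  min_gain: float = 4.0) -> bool:
    """True if the system beats every baseline by >= min_gain on every shared metric.

    Assumes higher-is-better metrics; vacuously True if no metrics overlap.
    """
    return all(
        diff >= min_gain
        for baseline in baselines.values()
        for diff in gains(system, baseline).values()
    )


if __name__ == "__main__":
    # Made-up scores for illustration only.
    system = {"bleu": 30.0, "chrf": 55.0}
    baselines = {
        "instruction-tuned": {"bleu": 25.0, "chrf": 50.0},
        "nmt": {"bleu": 20.0, "chrf": 44.0},
    }
    print(gains(system, baselines["instruction-tuned"]))  # {'bleu': 5.0, 'chrf': 5.0}
    print(gains_persist(system, baselines))               # True
```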

Original abstract

Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions by leveraging explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results demonstrate that our model, named Lius, yields notable improvements, outperforming standard instruction-tuned models by 4-6 points and surpassing both Neural Machine Translation (NMT) and Multilingual LLM models by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to mitigate the reliance on large-scale parallel data in low-resource language translation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a model named Lius for low-resource machine translation involving Kupang Malay. It designs instructions by extracting explicit lexical and semantic features from a bilingual dictionary and introduces Continual Instruction Tuning (CIT) as an iterative fine-tuning paradigm for LLMs. The central claim is that Lius yields 4-6 point gains over standard instruction-tuned models and 10-13 point gains over NMT and multilingual LLM baselines on several evaluation metrics, thereby reducing reliance on large-scale parallel data.

Significance. If the experimental claims hold under proper controls, the work could offer a practical route for low-resource translation by showing how dictionary-derived instructions enable effective CIT. This addresses a genuine need in the field for methods that operate with limited parallel corpora.

major comments (2)
  1. [Abstract] The headline result (4-6 pt gains over instruction-tuned models; 10-13 pt over NMT/multilingual LLMs) is stated without any information on the evaluation metrics, test sets, statistical significance, baseline implementations, or experimental controls. This absence prevents verification that the data support the claims.
  2. [§4 Experiments] The central claim that bilingual-dictionary-derived instructions suffice for CIT gains rests on an untested assumption; the section supplies no details on dictionary size/coverage, the mapping from entries to instruction templates, the base model, volume of any parallel data still used, number of CIT stages, or ablations isolating the dictionary component. Without these, the reported deltas cannot be attributed to the proposed mechanism.
minor comments (1)
  1. [Title] 'Lingustic' is a typographical error and should read 'Linguistic'.
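The first major comment flags missing significance testing. A common choice in MT evaluation is paired bootstrap resampling over test sentences; the sketch below is a simplified version that averages per-sentence scores within each resample. That averaging is an assumption: for corpus-level metrics such as BLEU, the corpus score should be recomputed on every resample instead.

```python
import random
from typing import Sequence


def paired_bootstrap(scores_a: Sequence[float], scores_b: Sequence[float],
                     n_samples: int = 1000, seed: int = 0) -> float:
    """Approximate p-value for 'system A is better than system B'.

    Each iteration resamples test-sentence indices with replacement, applies
    the same indices to both systems (the pairing), and counts how often
    A's mean score fails to exceed B's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    failures = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            failures += 1
    return failures / n_samples


if __name__ == "__main__":
    a = [0.6, 0.7, 0.4, 0.8, 0.5]  # made-up per-sentence scores, system A
    b = [0.5, 0.6, 0.5, 0.6, 0.4]  # made-up per-sentence scores, system B
    print(paired_bootstrap(a, b))
```

A value below 0.05 is the usual threshold for calling A's gain over B significant on that test set.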

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight areas where additional detail will improve verifiability of the claims. We address each point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The headline result (4-6 pt gains over instruction-tuned models; 10-13 pt over NMT/multilingual LLMs) is stated without any information on the evaluation metrics, test sets, statistical significance, baseline implementations, or experimental controls. This absence prevents verification that the data support the claims.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will expand the abstract to name the primary metrics (BLEU and chrF), identify the test sets, note that reported gains include statistical significance testing, and briefly characterize the baseline implementations and controls. These additions will make the headline claims directly verifiable from the abstract. revision: yes

  2. Referee: [§4 Experiments] The central claim that bilingual-dictionary-derived instructions suffice for CIT gains rests on an untested assumption; the section supplies no details on dictionary size/coverage, the mapping from entries to instruction templates, the base model, volume of any parallel data still used, number of CIT stages, or ablations isolating the dictionary component. Without these, the reported deltas cannot be attributed to the proposed mechanism.

    Authors: We accept that §4 currently omits several implementation details required to attribute gains to the dictionary-derived instructions and CIT procedure. We will revise the section to report dictionary size and coverage statistics, the exact template-mapping procedure, the base LLM, the quantity of parallel data retained, the number of CIT stages, and ablation experiments that isolate the dictionary component. These additions will allow readers to evaluate the contribution of the proposed mechanism. revision: yes
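The rebuttal names BLEU and chrF as the primary metrics. For readers unfamiliar with chrF, a compact sentence-level version following Popović's definition (character n-grams up to order 6, precision and recall averaged over orders, combined as an F-score with beta = 2) is sketched here. Whitespace removal and skipping orders longer than the sentence are simplifications; published scores should come from a standard implementation such as sacreBLEU so that settings are comparable across papers.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Counts of character n-grams, with all whitespace removed first."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def sentence_chrf(hypothesis: str, reference: str,
                  max_n: int = 6, beta: float = 2.0) -> float:
    """Sentence-level chrF on a 0-100 scale.

    Precision and recall are averaged over n-gram orders 1..max_n, then
    combined as an F-beta score in which recall counts beta times as much
    as precision. Orders longer than either string are skipped.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return 100.0 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(sentence_chrf("abc", "abc"))  # 100.0
```

Character n-grams suit Kupang Malay evaluation because spelling variation and affixation penalise word-level BLEU more harshly than character overlap.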

Circularity Check

0 steps flagged

No circularity: empirical method proposal with no derivation chain

Full rationale

The paper describes an empirical NLP approach: designing instructions from bilingual dictionary lexical/semantic features, then applying Continual Instruction Tuning (CIT) to fine-tune an LLM for Kupang Malay translation. No equations, parameters, or mathematical derivations are present in the provided abstract or described claims. Experimental deltas (4-6 pts over instruction tuning, 10-13 over NMT/LLMs) are reported as outcomes, not as quantities forced by construction from fitted inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in the given text. This matches the reader's assessment of score 1.0; absence of detail on dictionary size or ablations is a reproducibility concern, not circularity. The derivation chain is empty, so no reduction to inputs occurs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, axioms, or invented entities; the method relies on standard LLM fine-tuning practices.

pith-pipeline@v0.9.1-grok · 5678 in / 1230 out tokens · 22399 ms · 2026-06-27T09:49:53.529762+00:00
