Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

Fariz Ikhwantri

arxiv: 1907.11158 · v1 · submitted 2019-07-25 · 💻 cs.CL

Pith reviewed 2026-05-24 16:08 UTC · model grok-4.3

classification 💻 cs.CL

keywords cross-lingual transfer · named entity recognition · low-resource languages · Indonesian · pre-trained language models · distant supervision · fine-tuning · bi-directional language model

The pith

Fine-tuning pre-trained language models from high-resource languages improves named entity recognition for low-resource Indonesian in both gold and silver data scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that fine-tuning pre-trained language models from high-resource languages boosts named entity recognition performance for Indonesian when data is limited to either small gold-standard annotations or large but noisy distantly supervised silver data. This matters because many languages lack enough labeled examples or parallel task data to train effective models from scratch. The experiments show clear gains on the small gold sets and results that match supervised cross-lingual baselines on the silver sets. The work compares this transfer method against mono-lingual language models and part-of-speech tagging, using character-level bi-directional language model inputs for the downstream task.

Core claim

Fine-tuning pre-trained language models from high-resource languages in cross-lingual transfer scenarios yields significant improvement for small gold corpus and competitive results in large silver corpus compared to supervised cross-lingual transfer, enabling better performance without parallel annotation in the same task.

What carries the argument

Cross-lingual fine-tuning of pre-trained bi-directional language models using character-level input applied to the named entity recognition downstream task.
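As a rough illustration of what "character-level input" can mean here, the sketch below turns each token into a fixed-length list of character ids that a language model could consume. It is only a sketch under stated assumptions: the byte-based vocabulary, the reserved ids `PAD`, `BOW`, `EOW`, and the `max_chars` limit are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List

# Hypothetical reserved ids; real character vocabularies differ between models.
PAD, BOW, EOW = 0, 1, 2
OFFSET = 3  # shift raw byte values past the reserved ids


def encode_word(word: str, max_chars: int = 12) -> List[int]:
    """Map one token to a fixed-length list of character (byte) ids."""
    # Truncate so that the begin/end markers always fit. Truncation is
    # byte-based, so a multi-byte character may be cut in this toy version.
    raw = word.encode("utf-8")[: max_chars - 2]
    ids = [BOW] + [b + OFFSET for b in raw] + [EOW]
    return ids + [PAD] * (max_chars - len(ids))


def encode_sentence(tokens: List[str], max_chars: int = 12) -> List[List[int]]:
    """Encode every token of a sentence independently of any word vocabulary."""
    return [encode_word(t, max_chars) for t in tokens]
```

Because the ids depend only on characters, an English-trained model and an Indonesian fine-tuning corpus share the same input space without any word-level vocabulary alignment, which is one plausible reason a character-level model is convenient for this kind of transfer.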

If this is right

Significant accuracy gains on small gold Indonesian NER datasets via cross-lingual transfer from high-resource models.
Competitive performance on large distantly supervised silver datasets relative to supervised cross-lingual methods.
Effective results without needing parallel task-specific annotations between source and target languages.
Advantage demonstrated over mono-lingual pre-trained models and part-of-speech tagging as alternative transfer sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fine-tuning pattern could extend to other sequence labeling tasks such as part-of-speech tagging or dependency parsing in low-resource settings.
Languages with similar script and morphological traits to Indonesian might see comparable benefits from the identical high-resource model sources.
Combining this transfer step with additional distant supervision signals could further lower the amount of manual annotation required for new languages.

Load-bearing premise

Pre-trained language models from high-resource languages contain transferable knowledge that fine-tuning can adapt to Indonesian named entity recognition without parallel annotations for the same task in the source language.

What would settle it

A held-out Indonesian NER test set where fine-tuned models show no F1 improvement or lower scores than mono-lingual baselines or training from scratch on the same gold or silver data.
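The F1 comparison described above is normally made at the entity level rather than per token. The following is a minimal sketch, assuming BIO-style tags and exact span matching; this page does not state which scorer the paper uses, and `bio_to_spans` and `entity_f1` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (lenient on stray I- tags)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Scoring the fine-tuned model, the mono-lingual baseline, and a from-scratch model with the same function on the same held-out sentences is what makes the comparison in this section meaningful.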

Figures

Figures reproduced from arXiv: 1907.11158 by Fariz Ikhwantri.

**Figure 2.** Left image, Baseline scenario for supervised cross-lingual transfer learn...

**Figure 3.** Word-tag overlap rate breakdown between mono-lingual and cross...

Original abstract

Manually annotated corpora for low-resource languages are usually small in quantity (gold), or large but distantly supervised (silver). Inspired by recent progress of injecting pre-trained language model (LM) on many Natural Language Processing (NLP) task, we proposed to fine-tune pre-trained language model from high-resources languages to low-resources languages to improve the performance of both scenarios. Our empirical experiment demonstrates significant improvement when fine-tuning pre-trained language model in cross-lingual transfer scenarios for small gold corpus and competitive results in large silver compare to supervised cross-lingual transfer, which will be useful when there is no parallel annotation in the same task to begin. We compare our proposed method of cross-lingual transfer using pre-trained LM to different sources of transfer such as mono-lingual LM and Part-of-Speech tagging (POS) in the downstream task of both large silver and small gold NER dataset by exploiting character-level input of bi-directional language model task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies cross-lingual fine-tuning of a character-level biLM to Indonesian NER on both small gold and large silver data, with direct comparisons to monolingual LM and POS transfer, but the abstract gives no numbers so the size of any gains is unclear.

The core claim is that fine-tuning a pre-trained character-level bidirectional LM from high-resource languages improves NER on small gold Indonesian data and stays competitive on large distantly supervised silver data, without needing parallel task annotations. They run explicit head-to-head tests against a monolingual LM baseline and a POS-transfer baseline on both data regimes. That setup is straightforward and addresses a real practical constraint for low-resource languages. The comparisons are the part that stands out; they make the cross-lingual LM route look like one usable option rather than an untested assumption. The work stays empirical and does not claim a new architecture or theoretical result.

On the downside, the abstract states “significant improvement” and “competitive results” without any deltas, standard deviations, or even dataset sizes, so it is impossible to judge whether the gains are large enough to matter or just within noise. The full paper presumably contains the tables, but the visible evidence is thin. There is also no mention of statistical testing or ablation on the character-level choice versus other input representations.

This is incremental work that extends existing cross-lingual LM transfer ideas to one more language and data type. It will mainly interest people who already work on Indonesian or similar Austronesian NER and need a concrete recipe for mixing gold and silver data. For a general NLP audience the contribution is modest. I would still send it to peer review because the experimental design is clear and the baselines are reasonable; the numbers can be checked and the paper revised if the gains are small. It does not look like a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes fine-tuning a character-level bi-directional language model pre-trained on high-resource languages for cross-lingual transfer to Indonesian NER. It evaluates the approach on small gold-standard and large distantly supervised (silver) datasets, claiming significant gains over monolingual LM and POS baselines for the gold setting and competitive performance versus supervised cross-lingual transfer for the silver setting, without requiring parallel task-specific annotations.

Significance. If the empirical results hold with the reported comparisons, the work would demonstrate a practical route for improving low-resource NER via high-resource pre-trained LMs, especially valuable when no parallel annotations exist. The explicit multi-source transfer comparisons (monolingual LM, POS) add evaluative clarity to the contribution.

major comments (2)

[Abstract] The central claim of 'significant improvement' for small gold and 'competitive results' for large silver is asserted without any quantitative F1 scores, standard deviations, or statistical tests; the experimental section must supply these numbers and tests to substantiate the load-bearing empirical claim.
[§4 (Experiments)] Experimental setup: the description of the fine-tuning procedure, choice of pre-trained LM, and exact baseline implementations (including how supervised cross-lingual transfer is constructed) lacks sufficient detail on hyperparameters and data splits, preventing assessment of whether the reported gains are robust.
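One common way to supply the kind of statistical test requested above is a paired bootstrap over test sentences. The sketch below is an illustration only, not the paper's procedure: it assumes each system has already been scored per sentence as a hypothetical `(true positives, predicted entities, gold entities)` triple, and the function names are invented for this example.

```python
import random
from typing import List, Tuple

# Per-sentence counts: (true positives, predicted entities, gold entities).
Counts = Tuple[int, int, int]


def f1_from_counts(counts: List[Counts]) -> float:
    """Micro-averaged F1 from summed per-sentence counts."""
    tp = sum(c[0] for c in counts)
    n_pred = sum(c[1] for c in counts)
    n_gold = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def paired_bootstrap(a: List[Counts], b: List[Counts],
                     n_resamples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which system A does NOT beat system B."""
    assert len(a) == len(b), "both systems must be scored on the same sentences"
    rng = random.Random(seed)
    n = len(a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample sentence indices with replacement, using the same indices
        # for both systems so that the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        if f1_from_counts([a[i] for i in idx]) <= f1_from_counts([b[i] for i in idx]):
            not_better += 1
    return not_better / n_resamples
```

A small returned fraction suggests the gain of A over B is unlikely to be a sampling artifact of the test set; it says nothing about variance across random seeds, which would need separate repeated runs.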

minor comments (2)

The paper should clarify the languages used for pre-training the biLM and discuss their typological distance from Indonesian.
Notation for the silver vs. gold datasets and the character-level input should be made consistent between the method and results sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the presentation of empirical results and experimental details. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central claim of 'significant improvement' for small gold and 'competitive results' for large silver is asserted without any quantitative F1 scores, standard deviations, or statistical tests; the experimental section must supply these numbers and tests to substantiate the load-bearing empirical claim.

Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version we will incorporate the key F1 scores (with standard deviations where reported) and reference the statistical tests from the experimental results to make the central claims explicit.

revision: yes

Referee: [§4 (Experiments)] Experimental setup: the description of the fine-tuning procedure, choice of pre-trained LM, and exact baseline implementations (including how supervised cross-lingual transfer is constructed) lacks sufficient detail on hyperparameters and data splits, preventing assessment of whether the reported gains are robust.

Authors: We acknowledge that additional detail is needed for reproducibility. We will expand Section 4 to specify the fine-tuning hyperparameters, the exact pre-trained LM and its source, the data splits used, and the precise construction of the supervised cross-lingual transfer baselines.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical study with explicit baselines

The paper presents an empirical method for cross-lingual transfer via fine-tuning a character-level biLM, with direct comparisons to monolingual LM and POS transfer baselines on gold and silver Indonesian NER data. No equations, derivations, uniqueness theorems, or fitted parameters are invoked as load-bearing steps; the central claim rests on reported performance numbers rather than any reduction to self-defined quantities or self-citations. The weakest assumption (transferable knowledge without parallel data) is the hypothesis under test via those comparisons, not a hidden premise. This matches the default case of a self-contained empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is an empirical NLP transfer study; it rests on the standard assumption that pre-trained LMs encode transferable linguistic knowledge and on the domain assumption that distant supervision produces usable silver labels for NER.

axioms (2)

domain assumption: Pre-trained language models from high-resource languages encode knowledge transferable to low-resource languages via fine-tuning. Invoked in the abstract when proposing cross-lingual transfer without parallel data.
domain assumption: Distant supervision produces silver labels of sufficient quality for NER training. Required for the large silver corpus scenario described in the abstract.
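To make the second assumption concrete, the sketch below shows one simple form of distant supervision: labeling tokens by longest-match lookup in an entity dictionary. It is a toy illustration under stated assumptions; the gazetteer contents, the matching rule, and the name `distant_label` are hypothetical and do not describe how the paper's silver corpus was built.

```python
from typing import Dict, List, Tuple


def distant_label(tokens: List[str],
                  gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in an entity dictionary."""
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = gazetteer.get(tuple(tokens[i:i + n]))
            if etype is not None:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags
```

Any name missing from the dictionary is silently tagged `O`, and any ambiguous string is tagged with whatever type the dictionary lists. Those two failure modes are the kind of noise that makes silver labels weaker than gold ones, which is why this assumption needs stating.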


[1] [1]

2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS) pp

Alﬁna, I., Manurung, R., Fanany, M.I.: Dbpedia entities expansion in automatically building dataset for indonesian ner. 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS) pp. 335–340 (2016)

work page 2016

[2] [2]

2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS) pp

Alﬁna, I., Savitri, S., Fanany, M.I.: Modiﬁed dbpedia entities expansion for tagging automatically ner dataset. 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS) pp. 216–221 (2017)

work page 2017

[3] [3]

In: Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

Balasuriya, D., Ringland, N., Nothman, J., Murphy, T., Curran, J.R.: Named entity recognition in wikipedia. In: Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources. pp. 10–18. People’s Web ’09, Association for Computational Linguistics, Stroudsburg, PA, USA (2009)

work page 2009

[4] [4]

In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Blevins, T., Levy, O., Zettlemoyer, L.: Deep rnns encode soft hierarchical syntax. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 14–19. Association for Computational Linguistics (2018), http://aclweb.org/anthology/P18-2003

work page 2018

[5] [5]

In: EMNLP (2015)

Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)

work page 2015

[6] [6]

Chelba, C., Mikolov, T., Schuster, M., Ge, Q., Brants, T., Koehn, P., Robinson, T.: One billion word benchmark for measuring progress in statistical language modeling (2013)

work page 2013

[7] [7]

In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Cotterell, R., Duh, K.: Low-resource named entity recognition with cross-lingual, character-level neural conditional random ﬁelds. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). pp. 91–96. Asian Federation of Natural Language Processing (2017),http: //aclweb.org/anthology/I17-2016

work page 2017

[8] [8]

In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05). pp. 363–370. Association for Computational Linguistics (2005), http://www.aclweb. org/anthology/P05-1045

work page 2005

[9] [9]

Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N.F., Peters, M., Schmitz, M., Zettlemoyer, L.S.: Allennlp: A deep semantic natural language processing platform. vol. arXiv:1803.07640 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Howard, J., Ruder, S.: Universal language model ﬁne-tuning for text classiﬁcation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 328–339. Association for Computational Linguistics (2018), http://aclweb.org/anthology/P18-1031

work page 2018

[11] [11]

Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling (2016), https://arxiv.org/pdf/1602.02410.pdf

work page internal anchor Pith review Pith/arXiv arXiv 2016

[12] [12]

In: Proceedings of the Thirtieth AAAI Conference on Artiﬁcial Intelligence

Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Proceedings of the Thirtieth AAAI Conference on Artiﬁcial Intelligence. pp. 2741–2749. AAAI’16, AAAI Press (2016)

work page 2016

[13] [13]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[14] [14]

In: Proceedings of the 4th International Conference on Neural Information Processing Systems

Krogh, A., Hertz, J.A.: A simple weight decay can improve generalization. In: Proceedings of the 4th International Conference on Neural Information Processing Systems. pp. 950–957. NIPS’91, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1991), http://dl.acm.org/citation.cfm?id=2986916.2987033

work page arXiv 1991

[15] [15]

Kurniawan, K., Aji, A.F.: Toward a standardized and more accurate indonesian part-of-speech tagging (2018)

work page 2018

[16] [16]

In: Proceed- ings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User- generated Text

Kurniawan, K., Louvan, S.: Empirical evaluation of character-based model on neural named-entity recognition in indonesian conversational texts. In: Proceed- ings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User- generated Text. pp. 85–92. Association for Computational Linguistics (2018), http://aclweb.org/anthology/W18-6112

work page 2018

[17] [17]

Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts

Kurniawan, K., Louvan, S.: Empirical evaluation of character-based model on neural named-entity recognition in indonesian conversational texts. CoRR abs/1805.12291 (2018), http://arxiv.org/abs/1805.12291

work page internal anchor Pith review Pith/arXiv arXiv 2018

[18] [18]

In: Proceedings of the 54th Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers)

Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional lstm-cnns- crf. In: Proceedings of the 54th Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers). pp. 1064–1074. Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/P16-1101, http: //aclweb.org/anthology/P16-1101

work page doi:10.18653/v1/p16-1101 2016

[19] [19]

In: Proceed- ings of the 55th Annual Meeting of the Association for Computational Lin- guistics (Volume 1: Long Papers)

Ni, J., Dinu, G., Florian, R.: Weakly supervised cross-lingual named entity re- cognition via eﬀective annotation and representation projection. In: Proceed- ings of the 55th Annual Meeting of the Association for Computational Lin- guistics (Volume 1: Long Papers). pp. 1470–1480. Association for Computational Linguistics (2017). https://doi.org/10.18653/v...

work page doi:10.18653/v1/p17-1135 2017

[20] Nothman, J., Curran, J.R., Murphy, T.: Transforming Wikipedia into named entity training data. In: Proceedings of the Australasian Language Technology Association Workshop 2008. pp. 124–132 (2008), http://www.aclweb.org/anthology/U08-1016

[21] Palmer, M., Kingsbury, P., Gildea, D.: The proposition bank: An annotated corpus of semantic roles. Computational Linguistics 31, 71–106 (2005)

[22] Pan, X., Zhang, B., May, J., Nothman, J., Knight, K., Ji, H.: Cross-lingual name tagging and linking for 282 languages. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1946–1958. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1178, http://aclweb.org...

[23] Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/D14-1162, http://www.aclweb.org/anthology/D14-1162

[24] Perone, C.S., Silveira, R., Paula, T.S.: Evaluation of sentence embeddings in downstream and linguistic probing tasks. CoRR abs/1806.06259 (2018)

[25] Peters, M., Ammar, W., Bhagavatula, C., Power, R.: Semi-supervised sequence tagging with bidirectional language models. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1756–1765. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1161, http://aclweb.o...

[26] Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proc. of NAACL (2018)

[27] Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 2383–2392. Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/D16-1264, http://www.aclweb.org/anthology/D16-1264

[28] Rashel, F., Luthfi, A., Dinakaramani, A., Manurung, R.: Building an Indonesian rule-based part-of-speech tagger. 2014 International Conference on Asian Language Processing (IALP) pp. 70–73 (2014)

[29] Rei, M.: Semi-supervised multitask learning for sequence labeling. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 2121–2130. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1194, http://www.aclweb.org/anthology/P17-1194

[30] Robins, A.V.: Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci. 7, 123–146 (1995)

[31] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. pp. 1631–1642. Association for Computational Linguistics (2013), http://www.aclweb.org/anthology/D13-1170

[32] Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015)

[33] Tala, F.Z.: A study of stemming effects on information retrieval in Bahasa Indonesia. Institute for Logic, Language and Computation, Universiteit van Amsterdam, The Netherlands (2003)

[34] Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4. pp. 142–147. CONLL ’03, Association for Computational Linguistics, Stroudsburg, PA, USA (2003)

[35] Xie, J., Yang, Z., Neubig, G., Smith, N.A., Carbonell, J.: Neural cross-lingual named entity recognition with minimal resources. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 369–379. Association for Computational Linguistics (2018), http://aclweb.org/anthology/D18-1034

[36] Yang, Z., Salakhutdinov, R., Cohen, W.W.: Transfer learning for sequence tagging with hierarchical recurrent networks. CoRR abs/1703.06345 (2016)