On the Definition of Japanese Word

Yugo Murawaki

arxiv: 1906.09719 · v1 · pith:4ACZ4CXInew · submitted 2019-06-24 · 💻 cs.CL

Pith reviewed 2026-05-25 17:58 UTC · model grok-4.3

classification 💻 cs.CL

keywords Japanese word definition, syntactic words, Universal Dependencies, dependency annotation, Short Unit Words, bunsetsu

The pith

Short Unit Words used in UD Japanese treebanks do not qualify as syntactic words under the annotation guidelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the proper definition of syntactic words for Japanese in the context of Universal Dependencies annotation. It concludes that the Short Unit Words adopted in existing Japanese UD treebanks do not match the syntactic word concept outlined in the guidelines. This choice deviates from the traditional use of bunsetsu units in Japanese dependency parsing. The author points out that while some linguistic definitions of Japanese words exist, they have not been applied to corpus annotation. Using these unfamiliar criteria would involve weighing costs against benefits in annotation practice.

Core claim

The annotation guidelines for Universal Dependencies require syntactic words as basic units, but Short Unit Words in Japanese UD treebanks are not syntactic words as specified by those guidelines.

What carries the argument

The UD guidelines' definition of syntactic words, applied to evaluate whether Short Unit Words qualify in Japanese.

If this is right

Dependency parsing models trained on current Japanese UD data would use units that do not align with the intended syntactic words.
Annotation consistency across languages in UD could be compromised if Japanese uses non-qualifying units.
Future revisions might need to adopt different word units to comply with the guidelines.
Non-mainstream linguistic definitions of Japanese words could be considered for annotation despite their unfamiliarity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Other languages with ambiguous word boundaries might face similar challenges in applying UD guidelines.
Adopting linguistic word definitions could improve cross-lingual comparability in dependency annotations.
Testing the application of word definitions on sample sentences could reveal practical annotation issues.
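
The last suggestion above can be tried on a toy scale. The sketch below is illustrative only: the sentence, its segmentation into short units, and the B/I boundary labels are hand-written assumptions, not data from any UD treebank, and `group_bunsetsu` and `summarize` are hypothetical helper names. It shows only how the same sentence yields a different number of units at the short-unit and bunsetsu levels; it does not decide which level counts as a syntactic word.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-labelled example (not taken from any treebank).
# Each pair is (short-unit token, label): "B" opens a bunsetsu, "I" continues it.
SENTENCE: List[Tuple[str, str]] = [
    ("太郎", "B"), ("が", "I"),
    ("本", "B"), ("を", "I"),
    ("読ん", "B"), ("だ", "I"),
]


def group_bunsetsu(tokens: List[Tuple[str, str]]) -> List[List[str]]:
    """Group short-unit tokens into bunsetsu-like chunks using B/I labels."""
    chunks: List[List[str]] = []
    for form, label in tokens:
        # Start a new chunk on "B", or when no chunk has been opened yet.
        if label == "B" or not chunks:
            chunks.append([form])
        else:
            chunks[-1].append(form)
    return chunks


def summarize(tokens: List[Tuple[str, str]]) -> Dict[str, object]:
    """Report how many units each segmentation level yields."""
    chunks = group_bunsetsu(tokens)
    return {
        "short_units": len(tokens),
        "bunsetsu": len(chunks),
        "chunks": ["".join(chunk) for chunk in chunks],
    }


if __name__ == "__main__":
    print(summarize(SENTENCE))
```

Any candidate word definition could be plugged in the same way, as a different labelling of the same token sequence, so that the resulting unit counts and boundaries can be compared side by side.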

Load-bearing premise

The UD guidelines provide a sufficiently clear, language-independent definition of syntactic words that can be applied to Japanese.

What would settle it

A direct comparison showing that Short Unit Words satisfy the UD syntactic word criteria in specific Japanese sentences would falsify the claim.

Figures

Figures reproduced from arXiv: 1906.09719 by Yugo Murawaki.

Original abstract

The annotation guidelines for Universal Dependencies (UD) stipulate that the basic units of dependency annotation are syntactic words, but it is not clear what are syntactic words in Japanese. Departing from the long tradition of using phrasal units called bunsetsu for dependency parsing, the current UD Japanese treebanks adopt the Short Unit Words. However, we argue that they are not syntactic word as specified by the annotation guidelines. Although we find non-mainstream attempts to linguistically define Japanese words, such definitions have never been applied to corpus annotation. We discuss the costs and benefits of adopting the rather unfamiliar criteria.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper flags that Short Unit Words in Japanese UD treebanks likely fail the syntactic word test in the guidelines, but offers little direct evidence or comparison to back the claim.

The main thing to know is that the author argues current Japanese UD treebanks use Short Unit Words that do not count as syntactic words under the UD guidelines, and that some non-mainstream linguistic definitions of Japanese words exist but have never been tried in annotation. The paper contrasts this with the older bunsetsu tradition and weighs the costs and benefits of changing course.

It does a reasonable job of surfacing the mismatch and noting that the UD shift away from phrasal units creates a specific problem for Japanese given its lack of spaces and agglutinative morphology. Mentioning unused definitions from the literature is a fair observation for anyone who works on these resources.

The soft spots are clear and central. The abstract states the claim without showing how the guidelines' syntactic criteria are actually applied to SUW examples or providing any data on annotation differences. The guidelines are framed as general principles, so ruling out SUW may rest on an interpretive step rather than a direct contradiction, which aligns with the stress-test note. Without explicit comparisons or derivations in the text, the argument stays at the level of assertion.

This paper is for UD treebank maintainers and researchers dealing with word segmentation in languages without standard boundaries. A reader focused on annotation guidelines would find the discussion of options useful. It deserves a serious referee because the practical question it raises affects existing treebanks and future design choices, even though the current version would need added evidence to hold up under review. I would send it to referees for feedback on the strength of the mismatch claim.

Referee Report

2 major / 2 minor

Summary. The manuscript argues that Short Unit Words (SUW) adopted in current UD Japanese treebanks do not qualify as syntactic words under the UD annotation guidelines, which prioritize syntactic criteria over orthographic or traditional phrasal units such as bunsetsu. It reviews non-mainstream linguistic attempts to define Japanese words, notes that such definitions have not been used in corpus annotation, and discusses costs and benefits of different criteria.

Significance. If substantiated, the result would identify an inconsistency between UD guidelines and Japanese treebank practice, with implications for cross-linguistic comparability of syntactic annotations. The discussion of alternative word definitions could inform guideline revisions for agglutinative languages, but the paper supplies no new data, treebank comparisons, or explicit criterion applications to support its central claim.

major comments (2)

[Abstract] Abstract and introduction: the claim that SUW 'are not syntactic word as specified by the annotation guidelines' is asserted without quoting or applying any specific UD guideline criterion (e.g., the syntactic-word definition in the UD guidelines) to Japanese examples, leaving the mismatch interpretive rather than demonstrated.
[UD guidelines discussion] Discussion of UD guidelines: the argument presupposes that the guidelines contain an operational, language-independent definition of syntactic words sufficient to exclude SUW, yet provides no direct test of this assumption against Japanese morphological structure or bunsetsu units.

minor comments (2)

[Abstract] Abstract: grammatical agreement error ('syntactic word' should read 'syntactic words').
[Abstract] Abstract: the phrase 'we find non-mainstream attempts' is imprecise; name the specific linguistic works referenced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment point by point below and indicate where revisions will be made to strengthen the explicit demonstration of our claims.

Referee: [Abstract] Abstract and introduction: the claim that SUW 'are not syntactic word as specified by the annotation guidelines' is asserted without quoting or applying any specific UD guideline criterion (e.g., the syntactic-word definition in the UD guidelines) to Japanese examples, leaving the mismatch interpretive rather than demonstrated.

Authors: We agree that the abstract and introduction assert the central claim without direct quotation or application of specific UD criteria. Although the full manuscript references the guidelines' syntactic priorities, we will revise these sections to include explicit quotations from the UD syntactic word definition and apply the criteria to concrete Japanese examples involving Short Unit Words, thereby demonstrating the mismatch rather than leaving it interpretive. revision: yes

Referee: [UD guidelines discussion] Discussion of UD guidelines: the argument presupposes that the guidelines contain an operational, language-independent definition of syntactic words sufficient to exclude SUW, yet provides no direct test of this assumption against Japanese morphological structure or bunsetsu units.

Authors: The manuscript contrasts the UD guidelines' syntactic criteria with traditional Japanese units such as bunsetsu and reviews alternative linguistic definitions. We acknowledge the value of a more direct test. In revision, we will add explicit applications of the UD syntactic word criteria to Japanese morphological structures and bunsetsu units, providing the requested direct comparison. revision: yes

Circularity Check

0 steps flagged

No significant circularity; argument applies external UD guidelines to Japanese units

The paper's central claim compares Short Unit Words against the syntactic word definition supplied by the external Universal Dependencies annotation guidelines and prior linguistic literature. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear; the derivation consists of an interpretive mismatch between an independent external standard and the chosen annotation units. This is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that UD guidelines define syntactic words in a way that can be applied to Japanese and that Short Unit Words have been chosen without satisfying that definition. No free parameters or invented entities are introduced.

axioms (1)

domain assumption: UD annotation guidelines stipulate that the basic units are syntactic words whose definition is language-independent enough to apply to Japanese.
Directly stated in the opening sentence of the abstract.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

[1]

Masayuki Asahara, Hiroshi Kanayama, Takaaki Tanaka, Yusuke Miyao, Sumire Uematsu, Shinsuke Mori, Yuji Matsumoto, Mai Omura, and Yugo Murawaki. 2018. Universal Dependencies version 2 for Japanese. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

work page 2018
[2]

Masayuki Asahara and Yuji Matsumoto. 2016. BCCWJ-DepPara: A syntactic annotation treebank on the 'Balanced Corpus of Contemporary Written Japanese'. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 49–58

work page 2016
[3]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

work page internal anchor Pith review Pith/arXiv arXiv 2014
[4]

Daisuke Bekki. 2010. Nihon-go bunpō no keishiki riron. Kurosio Publishers. (in Japanese)

work page 2010
[5]

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), pages 149–164. https://www.aclweb.org/anthology/W06-2920

work page 2006
[6]

Noam Chomsky. 1970. Remarks on nominalization. In Roderick A. Jacobs and Peter S. Rosenbaum, editors, Readings in English Transformational Grammar, pages 184–221. Ginn

work page 1970
[7]

Shay B. Cohen, Dipanjan Das, and Noah A. Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61. https://www.aclweb.org/anthology/D11-1005

work page 2011
[8]

William Croft, Dawn Nordquist, Katherine Looney, and Michael Regan. 2017. Linguistic typology meets Universal Dependencies. In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15), pages 63–75

work page 2017
[9]

Anna-Maria Di Sciullo and Edwin Williams. 1987. On the Definition of Word. MIT Press

work page 1987
[10]

Shinkichi Hashimoto. 1933. Kokugo-hō yōsetsu. Meiji Shoin. (in Japanese)

work page 1933
[11]

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic ... https://www.aclweb.org/anthology/W09-1201

work page 2009
[12]

Masatsugu Hangyo, Daisuke Kawahara, and Sadao Kurohashi. 2012. Building a diverse document leads corpus annotated with semantic relations. In Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, pages 535–544

work page 2012
[13]

Martin Haspelmath. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language, 86(3):663–687. https://doi.org/10.1353/lan.2010.0021

work page doi:10.1353/lan.2010.0021 2010
[14]

Martin Haspelmath. 2011. The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica, 45(1):31–80. https://doi.org/10.1515/flin.2011.002

work page doi:10.1515/flin.2011.002 2011
[15]

Martin Haspelmath. 2015. Defining vs. diagnosing linguistic categories: A case study of clitic phenomena. In Joanna Blaszczak, Dorota Klimek-Jankowska, and Krzysztof Migdalski, editors, How Categorical are Categories? New Approaches to the Old Questions of Noun, Verb, and Adjective, pages 273–304. De Gruyter Mouton. https://doi.org/10.1515/9781614514510-009

work page doi:10.1515/9781614514510-009 2015
[16]

Shiro Hattori. 1960. Gengo-gaku no Hōhō, chapter Fuzoku-go to Fuzoku-keishiki. Iwanami Shoten. (in Japanese)

work page 1960
[17]

Sho Hoshino, Yusuke Miyao, Katsuhito Sudoh, and Masaaki Nagata. 2013. Two-stage pre-ordering for Japanese-to-English statistical machine translation. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1062–1066

work page 2013
[18]

Taro Kageyama. 1993. Bunpō to Go-keisei. Hituzi Syobo Publishing. (in Japanese)

work page 1993
[19]

Daisuke Kawahara, Yuichiro Machida, Tomohide Shibata, Sadao Kurohashi, Hayato Kobayashi, and Manabu Sassano. 2014. Rapid development of a corpus with discourse annotations using two-stage crowdsourcing. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 269–278

work page 2014
[20]

Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. 2004. Applying conditional random fields to Japanese morphological analysis. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pages 230–237. https://www.aclweb.org/anthology/W04-3230

work page 2004
[21]

Sadao Kurohashi and Makoto Nagao. 1994. KN parser: Japanese dependency/case structure analyzer. In Proceedings of the Workshop on Sharable Natural Language, pages 48–55

work page 1994
[22]

Sadao Kurohashi and Makoto Nagao. 1998. Building a Japanese parsed corpus while improving the parsing system. In Proceedings of the NLPRS, pages 719–724

work page 1998
[23]

Sadao Kurohashi, Toshihisa Nakamura, Yuji Matsumoto, and Makoto Nagao. 1994. Improvements of Japanese morphological analyzer JUMAN. In Proceedings of The International Workshop on Sharable Natural Language Resources, pages 22–38

work page 1994
[24]

Rochelle Lieber. 1992. Deconstructing Morphology. University of Chicago Press

work page 1992
[25]

Kikuo Maekawa, Makoto Yamazaki, Toshinobu Ogiso, Takehiko Maruyama, Hideki Ogura, Wakako Kashino, Hanae Koiso, Masaya Yamaguchi, Makiro Tanaka, and Yasuharu Den. 2014. Balanced Corpus of Contemporary Written Japanese. Language Resources and Evaluation, 48:345–371. https://doi.org/10.1007/s10579-013-9261-0

work page doi:10.1007/s10579-013-9261-0 2014
[26]

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the ... https://www.aclweb.org/anthology/P13-2017

work page 2013
[27]

Osahito Miyaoka. 2015. Go to wa Nani ka Saikō (Reconsidering What is the "Word"?). Sanseido. (in Japanese)

work page 2015
[28]

Yugo Murawaki and Sadao Kurohashi. 2008. Online acquisition of Japanese unknown morphemes using morphological constraints. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), pages 429–437. https://www.aclweb.org/anthology/D08-1045

work page 2008
[29]

Toshinobu Ogiso, Asuko Kondo, Yoko Mabuchi, and Noriko Hattori. 2017. Construction of the "corpus of historical Japanese: Meiji–Taishō series I – magazines". In Proceedings of Digital Humanities 2017

work page 2017
[30]

Hideki Ogura, Hanae Koiso, Yumi Fujiike, Sayaka Miyauchi, Hikari Konishi, and Yutaka Hara. 2011. Gendai Kakikotoba Kinkō Kōpasu Keitairon Jōhō Kiteishū Dai 4 Han (Rules Governing the Morphological Analysis Contained in the BCCWJ, 4th ed.). (in Japanese)

work page 2011
[31]

Gregory Pringle. 2016. Thoughts on the Universal Dependencies proposal for Japanese: The problem of the word as a linguistic unit. http://www.cjvlang.com/Spicks/udjapanese.html Accessed: 2019-06-22

work page 2016
[32]

Milan Straka and Jana Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 88–99. https://doi.org/10.18653/v1/K17-3009

work page doi:10.18653/v1/k17-3009 2017
[33]

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, pages 3104–3112

work page 2014
[34]

Takaaki Tanaka, Yusuke Miyao, Masayuki Asahara, Sumire Uematsu, Hiroshi Kanayama, Shinsuke Mori, and Yuji Matsumoto. 2016. Universal Dependencies for Japanese. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). https://www.aclweb.org/anthology/L16-1261

work page 2016
[35]

Arseny Tolmachev, Daisuke Kawahara, and Sadao Kurohashi. 2019. Shrinking Japanese morphological analyzers with neural networks and semi-supervised learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1... https://www.aclweb.org/anthology/N19-1281

work page 2019
[36]

Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara. 1999. Japanese dependency structure analysis based on maximum entropy models. In Ninth Conference of the European Chapter of the Association for Computational Linguistics. https://www.aclweb.org/anthology/E99-1026

work page 1999
[37]

Universal Dependencies contributors. 2019a. Introduction. http://universaldependencies.org/introduction Accessed: 2019-06-22

work page 2019
[38]

Universal Dependencies contributors. 2019b. Tokenization and word segmentation. http://universaldependencies.org/u/overview/tokenization.html Accessed: 2019-06-22

work page 2019
