arxiv: 1906.11298 · v1 · pith:3QNNWB4Wnew · submitted 2019-06-26 · 💻 cs.CL · cs.LG

A Generative Model for Punctuation in Dependency Trees

Pith reviewed 2026-05-25 15:23 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords punctuation · dependency trees · generative model · latent variables · incomplete data likelihood · string transduction · dynamic programming · syntax

The pith

Dependency trees have latent underlying punctuation marks that can be recovered by a generative model trained only on observed surface strings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes punctuation as latent underlying marks placed to delimit syntactic constituents in a dependency tree, with a string-rewriting transduction that produces the observed surface marks. It defines a generative model that can be trained efficiently via dynamic programming by locally maximizing the incomplete-data likelihood of the observed sentences. Reconstructions of the underlying marks appear plausible across five languages and match Nunberg's linguistic analysis for English. The same model improves punctuation restoration over baselines and can re-render appropriate surface punctuation after a syntactic transformation of the sentence.

Core claim

Punctuation marks observed in treebanks are surface realizations produced by a transduction from latent underlying marks; these underlying marks delimit or separate constituents inside the syntax tree. The model places the underlying marks generatively and transduces them to surface form, admits efficient dynamic programming, and is trained by maximizing the likelihood of the observed yield without ever seeing the underlying marks. Reconstructions obtained this way are consistent with linguistic theory and support improved punctuation restoration as well as punctuation-aware syntactic transformations.

What carries the argument

Generative model of latent underlying punctuation marks placed in a dependency tree and transduced to surface marks by string rewriting, trained via incomplete-data likelihood maximization with dynamic programming.
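
A minimal Python sketch of the two-stage story described above, under stated assumptions: the function names `underlying_slots` and `noisy_channel`, the span-keyed `choices` dictionary, and the two absorption rules in `RULES` are hypothetical and chosen only for illustration. In the paper both stages are stochastic and learned; here the puncteme choices are fixed by hand and the rewrite is deterministic, so the example shows only how marks attached to constituents collect in slots and how a rewrite can hide one of them.

```python
# Toy illustration only: rule set and data layout are assumptions, not the paper's.

# Hypothetical absorption rules (underlying substring -> surface substring).
RULES = [(",.", "."), (",,", ",")]


def underlying_slots(n_words, choices):
    """Place each node's left/right mark into the n_words + 1 slots.

    `choices` maps a constituent span (i, j) over word positions to a
    (left_mark, right_mark) pair; either mark may be the empty string.
    """
    slots = [[] for _ in range(n_words + 1)]
    by_width = sorted(choices.items(), key=lambda kv: kv[0][1] - kv[0][0])
    # Closing marks come first in a slot, innermost constituent first.
    for (_, j), (_, right) in by_width:
        if right:
            slots[j].append(right)
    # Opening marks follow, outermost constituent first.
    for (i, _), (left, _) in reversed(by_width):
        if left:
            slots[i].append(left)
    return ["".join(marks) for marks in slots]


def noisy_channel(u_slot):
    """Deterministically rewrite one slot until no toy rule applies."""
    x = u_slot
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in x:
                x = x.replace(lhs, rhs)
                changed = True
    return x


# "We met Dale, the captain." has 5 words, so 6 slots.
choices = {
    (0, 5): ("", "."),   # whole sentence: empty left mark, period on the right
    (3, 5): (",", ","),  # appositive "the captain": comma on both sides
}
u = underlying_slots(5, choices)
x = [noisy_channel(slot) for slot in u]
print(u)  # ['', '', '', ',', '', ',.']
print(x)  # ['', '', '', ',', '', '.']
```

In the final slot the appositive's closing comma and the sentence's period meet as `,.` underlyingly, and the toy rule renders only `.` on the surface. That hidden comma is the kind of latent mark the model is meant to reconstruct.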

If this is right

  • The model produces plausible underlying punctuation reconstructions in five languages.
  • Reconstructions are consistent with Nunberg's analysis of English punctuation.
  • The model outperforms baselines on punctuation restoration.
  • The trained transduction mechanism can render surface punctuation after syntactic transformations of a sentence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent-mark approach could be tested on treebanks that include prosodic or intonational annotations to see whether similar delimiters emerge.
  • If the model improves restoration, it could be inserted as a post-processing step inside existing dependency parsers.
  • The transduction component might be reused to generate punctuation when converting spoken transcripts into written form.

Load-bearing premise

Latent underlying punctuation marks exist and their placement plus transduction rules can be recovered from the likelihood of observed surface strings alone, without any direct supervision on the latent marks.
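
To make the premise concrete, the sketch below marginalizes over the latent mark for a single slot by brute-force enumeration. Every number in `P_UNDERLYING` and `P_REWRITE` is a hand-picked assumption, not a learned value from the paper, and the real model sums over whole trees with dynamic programming rather than over one slot. The point is only that the training signal is the marginal probability of the surface string, while the underlying mark is recovered afterwards as a posterior.

```python
import math

# Hypothetical single-slot distributions, chosen by hand (not learned values).
P_UNDERLYING = {"": 0.5, ",": 0.3, ",.": 0.1, ".": 0.1}  # p(u)
P_REWRITE = {                                             # p(x | u)
    "": {"": 1.0},
    ",": {",": 0.9, "": 0.1},
    ",.": {".": 0.95, ",.": 0.05},
    ".": {".": 1.0},
}


def marginal(x):
    """Incomplete-data likelihood of one surface slot: sum over latent u."""
    return sum(p_u * P_REWRITE[u].get(x, 0.0) for u, p_u in P_UNDERLYING.items())


def posterior(x):
    """Posterior over underlying marks given the observed surface slot."""
    z = marginal(x)
    joint = {u: p_u * P_REWRITE[u].get(x, 0.0) for u, p_u in P_UNDERLYING.items()}
    return {u: p / z for u, p in joint.items() if p > 0.0}


def log_likelihood(observed_slots):
    """Objective to be maximized: log-probability of the surface slots only."""
    return sum(math.log(marginal(x)) for x in observed_slots)


print(posterior("."))
```

With these toy numbers a surface `.` is explained both by an underlying `.` and by an underlying `,.` whose comma was absorbed, so the posterior splits its mass between the two. Nothing in the likelihood itself says which explanation is the linguistically intended one.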

What would settle it

On a new set of sentences, either the model's reconstructed underlying marks are judged systematically inconsistent with linguistic analysis or its punctuation-restoration accuracy falls below the strongest baseline.

Figures

Figures reproduced from arXiv: 1906.11298 by Dingquan Wang, Jason Eisner, Xiang Lisa Li.

Figure 1
Figure 1: The generative story of a sentence. Given an unpunctuated tree T at top, at each node w ∈ T, the ATTACH process stochastically attaches a left puncteme l and a right puncteme r, which may be empty. The resulting tree T′ has underlying punctuation u. Each slot's punctuation u_i ∈ u is rewritten to x_i ∈ x by NOISYCHANNEL. In tasks such as word embedding induction (Mikolov et al., 2013; Pennington et al., 20…
Figure 2
Figure 2: Editing abcde ↦ ade with a sliding window. (When an absorption rule maps 2 tokens to 1, our diagram leaves blank space that is not part of the output string.) At each step, the left-to-right process has already committed to the green tokens as output; has not yet looked at the blue input tokens; and is currently considering how to (further) rewrite the black tokens. The right column shows the chosen edi…
Figure 3
Figure 3: Rewrite probabilities learned for English, averaged over the last 4 epochs on en treebank (blue bars) or en_esl treebank (orange bars). The header above each figure is the underlying punctuation string (input to NOISYCHANNEL). The two counts in the figure headers are the number of occurrences of the underlying punctuation strings in the 1-best reconstruction of underlying punctuation sequences (by Algori…
Figure 4
Figure 4: Edit distance per slot (which we call average edit distance, or AED) for each of the 5 corpora. Lower is better. The table gives the final AED on the test data. Its first 3 columns show the baseline methods just as in …
Figure 5
Figure 5: An example of our PFST on vocabulary Σ = {a, b}. The input (underlying punctuation tokens) is colored in blue and the output (surface punctuation tokens) is colored in green. All arc probabilities are suppressed for readability. ∧ is the start state, $ is the final state, ε denotes the empty string, and $ denotes a special end-of-input token. The four rewriting rules at the bottom of the figure are illu…
Figure 6
Figure 6: The WFST obtained by composing the yellow PFST F in …
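
The Figure 4 caption defines its metric as edit distance per slot (AED). A hedged sketch of one way to compute such a number follows. It assumes each slot is represented as a list of punctuation tokens and uses unit-cost Levenshtein distance; the exact tokenization and edit costs used in the paper are not stated on this page, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        cur = [i]
        for j, tok_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                      # delete tok_a
                cur[j - 1] + 1,                   # insert tok_b
                prev[j - 1] + (tok_a != tok_b),   # substitute or match
            ))
        prev = cur
    return prev[-1]


def average_edit_distance(pred_slots, gold_slots):
    """Total edit distance over aligned slots, divided by the slot count."""
    if len(pred_slots) != len(gold_slots) or not gold_slots:
        raise ValueError("need two non-empty, equally long slot lists")
    total = sum(edit_distance(p, g) for p, g in zip(pred_slots, gold_slots))
    return total / len(gold_slots)


# Four slots; the prediction misses one comma in the third slot.
pred = [[], [","], [], ["."]]
gold = [[], [","], [","], ["."]]
print(average_edit_distance(pred, gold))  # 0.25
```

Because most slots in running text are empty, a per-slot average of this kind stays small even for weak systems, which is worth keeping in mind when reading differences between methods.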
Original abstract

Treebanks traditionally treat punctuation marks as ordinary words, but linguists have suggested that a tree's "true" punctuation marks are not observed (Nunberg, 1990). These latent "underlying" marks serve to delimit or separate constituents in the syntax tree. When the tree's yield is rendered as a written sentence, a string rewriting mechanism transduces the underlying marks into "surface" marks, which are part of the observed (surface) string but should not be regarded as part of the tree. We formalize this idea in a generative model of punctuation that admits efficient dynamic programming. We train it without observing the underlying marks, by locally maximizing the incomplete data likelihood (similarly to EM). When we use the trained model to reconstruct the tree's underlying punctuation, the results appear plausible across 5 languages, and in particular, are consistent with Nunberg's analysis of English. We show that our generative model can be used to beat baselines on punctuation restoration. Also, our reconstruction of a sentence's underlying punctuation lets us appropriately render the surface punctuation (via our trained underlying-to-surface mechanism) when we syntactically transform the sentence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formalizes Nunberg's (1990) linguistic proposal by positing latent underlying punctuation marks in dependency trees that are transduced to observed surface marks via a string-rewriting mechanism. It presents a generative model admitting efficient dynamic programming, trains the model by locally maximizing incomplete-data likelihood (analogous to EM) without observing the latents, and reports that the resulting reconstructions appear plausible across five languages, are consistent with Nunberg's English analysis, improve punctuation restoration over baselines, and support appropriate surface rendering after syntactic transformations.

Significance. If the central claim holds, the work supplies a computationally tractable formalization of a linguistic theory of punctuation together with a practical method for punctuation restoration and tree transformation. The unsupervised training via incomplete-data likelihood and the use of dynamic programming constitute clear technical strengths; the cross-lingual qualitative results and the restoration gains provide initial empirical support.

major comments (2)
  1. [Abstract and experimental evaluation sections] The central claim that the model recovers the specific latent underlying marks posited by Nunberg (rather than some other factorization consistent with the surface data) rests on post-hoc qualitative judgment of plausibility. Because the objective is non-convex and the latents are never observed, multiple alternative latent configurations can yield identical or higher marginal likelihood; the manuscript does not supply a quantitative check (e.g., held-out alignment with independently annotated underlying marks or a controlled simulation recovering known latents) that would demonstrate identification of the intended generative story.
  2. [Punctuation restoration experiments] The punctuation-restoration experiments demonstrate gains over baselines, but the evaluation protocol does not isolate whether the improvement derives from the latent-variable structure or from the surface transduction component alone; an ablation that removes the latent layer while retaining the transduction mechanism would clarify the contribution of the core modeling assumption.
minor comments (2)
  1. [Model definition] Notation for the underlying-to-surface transduction rules and the dynamic-programming recursions should be introduced with explicit definitions and a small worked example to improve readability.
  2. [Experimental setup] The manuscript should state the precise number of languages, treebank sizes, and evaluation metrics used for the cross-lingual reconstruction experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting both the technical contributions and the areas where further clarification would strengthen the paper. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract and experimental evaluation sections] The central claim that the model recovers the specific latent underlying marks posited by Nunberg (rather than some other factorization consistent with the surface data) rests on post-hoc qualitative judgment of plausibility. Because the objective is non-convex and the latents are never observed, multiple alternative latent configurations can yield identical or higher marginal likelihood; the manuscript does not supply a quantitative check (e.g., held-out alignment with independently annotated underlying marks or a controlled simulation recovering known latents) that would demonstrate identification of the intended generative story.

    Authors: We agree that, because the underlying marks are never observed, the manuscript cannot quantitatively demonstrate that the learned latents are exactly those posited by Nunberg rather than another factorization yielding the same marginal likelihood. The evaluation is therefore qualitative, resting on the plausibility of the reconstructions and their consistency with Nunberg's English analysis. The model is explicitly constructed around Nunberg's transduction mechanism, and the local likelihood objective is designed to recover configurations compatible with that mechanism; the reported results show that the resulting reconstructions align with the linguistic proposal. No independently annotated underlying marks exist for a held-out quantitative check, so such an evaluation is not possible with existing resources.
    Revision: no

  2. Referee: [Punctuation restoration experiments] The punctuation-restoration experiments demonstrate gains over baselines, but the evaluation protocol does not isolate whether the improvement derives from the latent-variable structure or from the surface transduction component alone; an ablation that removes the latent layer while retaining the transduction mechanism would clarify the contribution of the core modeling assumption.

    Authors: The referee correctly notes that the current experiments do not isolate the contribution of the latent layer from the transduction mechanism. While the baselines lack the full generative model, an explicit ablation that retains only the surface transduction component would provide a clearer comparison. We will add this ablation experiment to the revised manuscript.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; standard latent-variable training on incomplete data

Full rationale

The paper defines a generative model over latent underlying punctuation and observed surface strings, then trains by locally maximizing the incomplete-data likelihood via dynamic programming (analogous to EM). Reconstructions of the latent marks are posterior inferences under the trained model rather than quantities defined by construction from fitted parameters or self-citations. No load-bearing self-citation, ansatz smuggling, or renaming of known results appears in the derivation; the approach is self-contained as a conventional latent-variable generative model whose outputs are not forced to equal its inputs.
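
In symbols, the training objective and the reconstruction step described above can be sketched as follows. The notation is an assumption pieced together from the Figure 1 caption (T, T′, u, x) and is not copied from the paper: θ stands for all model parameters, u_i(T′) is the underlying punctuation that T′ places in slot i, and the product assumes that slots are rewritten independently, as the caption suggests.

```latex
% Sketch only: notation assumed from the Figure 1 caption, not copied from the paper.
\begin{align*}
p_\theta(\mathbf{x} \mid T)
  &= \sum_{T'} p_\theta(T' \mid T)\,
     \prod_{i} p_\theta\bigl(x_i \mid u_i(T')\bigr)
  && \text{(incomplete-data likelihood)} \\
\hat{T}'
  &= \operatorname*{arg\,max}_{T'}\; p_\theta(T' \mid T)\,
     \prod_{i} p_\theta\bigl(x_i \mid u_i(T')\bigr)
  && \text{(1-best reconstruction)}
\end{align*}
```

Training adjusts θ to increase the first quantity on observed sentences. The second quantity is computed only afterwards, which is why the reconstructed marks are inferences under the fitted model and not inputs to it.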

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The model rests on the linguistic premise of latent marks and the statistical premise that incomplete-data likelihood maximization can recover them. The abstract does not explicitly list free parameters, axioms, or invented entities; the two axioms below are domain assumptions read off from it.

axioms (2)
  • domain assumption Latent underlying punctuation marks exist and delimit constituents in the syntax tree (Nunberg 1990).
    Invoked in the first paragraph of the abstract as the motivation for the model.
  • domain assumption A string rewriting mechanism transduces underlying marks into surface marks.
    Stated as part of the generative story in the abstract.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 5 internal anchors

  3. [3]

    Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. 2011. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Language in Social Media (LSM 2011), pages 30--38. http://www.aclweb.org/anthology/W11-0705

  4. [4]

    Miguel Ballesteros and Leo Wanner. 2016. A neural network architecture for multilingual punctuation generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1048--1053. https://doi.org/10.18653/v1/D16-1111

  5. [5]

    Yehoshua Bar-Hillel, M. Perles, and E. Shamir. 1961. On formal properties of simple phrase structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14:143--172. http://search.proquest.com/openview/fb41296047fb7453dcb1de182b4aa0b6/1 Reprinted in Y. Bar-Hillel (1964), Language and Information: Selected Essays on the...

  6. [6]

    Jean Berstel and Christophe Reutenauer. 1988. Rational Series and their Languages. Springer-Verlag

  7. [7]

    Ann Bies, Mark Ferguson, Karen Katz, Robert MacIntyre, Victoria Tredinnick, Grace Kim, Mary Ann Marcinkiewicz, and Britta Schasberger. 1995. Bracketing guidelines for Treebank II style: Penn Treebank project. Technical Report MS-CIS-95-06, University of Pennsylvania. ftp://ftp.cis.upenn.edu/pub/treebank/doc/manual/root.ps.gz

  8. [8]

    Ted Briscoe. 1994. Parsing (with) punctuation, etc. Technical report, Xerox European Research Laboratory. https://www.cl.cam.ac.uk/~ejb1/punct-pos-parsing.ps

  9. [9]

    Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740--750. https://doi.org/10.3115/v1/D14-1082

  10. [10]

    Noam Chomsky and Morris Halle. 1968. The Sound Pattern of English. Harper and Row, New York

  11. [11]

    Ryan Cotterell, Nanyun Peng, and Jason Eisner. 2014. Stochastic contextual edit distance and probabilistic FSTs. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 625--630. https://doi.org/10.3115/v1/P14-2102

  12. [12]

    Ryan Cotterell, Nanyun Peng, and Jason Eisner. 2015. Modeling word forms using latent underlying morphs and phonology. Transactions of the Association for Computational Linguistics (TACL), 3:433--447. http://cs.jhu.edu/~jason/papers/#cotterell-peng-eisner-2015

  13. [13]

    Ryan Cotterell, Tim Vieira, and Hinrich Schütze. 2016. A joint model of orthography and morphological segmentation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 664--669. http://www.aclweb.org/anthology/N16-1080

  14. [14]

    Brooke Cowan, Ivona Kučerová, and Michael Collins. 2006. A discriminative model for tree-to-tree translation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 232--241. http://www.aclweb.org/anthology/W06-1628

  15. [15]

    Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL). http://www.aclweb.org/anthology/P04-1054

  16. [16]

    Timothy Dozat and Christopher Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of the 5th International Conference on Learning Representations (ICLR). https://arxiv.org/pdf/1611.01734.pdf

  17. [17]

    Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 199--209. https://doi.org/10.18653/v1/N16-1024

  18. [18]

    Jason Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING), pages 340--345. http://cs.jhu.edu/~jason/papers/#eisner-1996-coling

  19. [19]

    Jason Eisner. 2016. Inside-outside and forward-backward algorithms are just backprop. In Proceedings of the EMNLP Workshop on Structured Prediction for NLP. http://cs.jhu.edu/~jason/papers/#eisner-2016

  20. [20]

    A. Caracciolo di Forino. 1968. String processing languages and generalized Markov algorithms. In D. G. Bobrow, editor, Symbol Manipulation Languages and Techniques, pages 191--206. North-Holland Publishing Company, Amsterdam

  21. [21]

    Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:2001--2049

  22. [22]

    Filip Ginter, Jan Hajič, Juhani Luotolahti, Milan Straka, and Daniel Zeman. 2017. CoNLL 2017 shared task - automatically annotated raw texts and word embeddings. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. http://hdl.handle.net/11234/1-1989

  23. [23]

    Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 742--750. http://www.aclweb.org/anthology/N10-1115

  24. [24]

    Joshua Goodman. 1999. Semiring parsing. Computational Linguistics, 25(4):573--605. http://research.microsoft.com/~joshuago/finalring.ps

  25. [25]

    George Heidorn. 2000. Intelligent writing assistance. In Robert Dale, Herman Moisl, and Harold Somers, editors, Handbook of Natural Language Processing, pages 181--207. Marcel Dekker, New York. https://books.google.com/books?id=MnEjBsMIxxsC&lpg=PP1&pg=PA186

  26. [26]

    C. Douglas Johnson. 1972. Formal Aspects of Phonological Description. Mouton

  27. [27]

    Bernard E. M. Jones. 1994. Exploring the role of punctuation in parsing natural text. In COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics. http://www.aclweb.org/anthology/C94-1069

  28. [28]

    Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/pdf/1412.6980.pdf

  29. [29]

    Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics (TACL), 4:313--327. http://aclweb.org/anthology/Q16-1023

  30. [30]

    Kevin Knight and Daniel Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91--107

  31. [31]

    Fang Kong, Guodong Zhou, Longhua Qian, and Qiaoming Zhu. 2010. Dependency-driven anaphoricity determination for coreference resolution. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pages 599--607. http://www.aclweb.org/anthology/C10-1068

  32. [32]

    Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1--11. http://aclweb.org/anthology/P10-1001

  33. [33]

    Albert E. Krahn. 2014. A New Paradigm for Punctuation. Ph.D. thesis, The University of Wisconsin-Milwaukee. https://dc.uwm.edu/cgi/viewcontent.cgi?article=1470&context=etd

  34. [34]

    Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. 2014. Low-rank tensors for scoring dependency structures. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 1381--1391. http://www.aclweb.org/anthology/P14-1130

  35. [35]

    Roger Levy. 2008. A noisy-channel model of human sentence comprehension under uncertain input. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 234--243. http://www.aclweb.org/anthology/D08-1025

  36. [36]

    Xing Li, Chengqing Zong, and Rile Hu. 2005. A hierarchical parsing approach with punctuation processing for long Chinese sentences. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP). http://www.aclweb.org/anthology/I05-2002

  37. [37]

    Zhifei Li and Jason Eisner. 2009. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 40--51. http://cs.jhu.edu/~jason/papers/#li-eisner-2009

  38. [38]

    Wei Lu and Hwee Tou Ng. 2010. Better punctuation prediction with dynamic conditional random fields. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 177--186

  39. [39]

    Marco Lui and Li Wang. 2013. Recovering casing and punctuation using conditional random fields. In Proceedings of the Australasian Language Technology Association Workshop (ALTA), pages 137--141. http://www.aclweb.org/anthology/U13-1020

  40. [40]

    Ji Ma, Yue Zhang, and Jingbo Zhu. 2014. Punctuation processing for projective dependency parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 791--796. https://doi.org/10.3115/v1/P14-2128

  41. [41]

    Gideon S. Mann and Andrew McCallum. 2010. Generalized expectation criteria for semi-supervised learning with weakly labeled data. Journal of Machine Learning Research, 11:955--984. http://www.jmlr.org/papers/volume11/mann10a/mann10a.pdf

  42. [42]

    Andrey Andreevich Markov. 1960. The theory of algorithms. American Mathematical Society Translations, series 2(15):1--14

  43. [43]

    Ilia Markov, Vivi Nastase, and Carlo Strapparava. 2018. Punctuation as native language interference. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 3456--3466. http://www.aclweb.org/anthology/C18-1293

  44. [44]

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Computing Research Repository (CoRR), arXiv:1301.3781. http://arxiv.org/abs/1301.3781

  45. [45]

    Mark-Jan Nederhof and Giorgio Satta. 2003. Probabilistic parsing as intersection. In 8th International Workshop on Parsing Technologies (IWPT), pages 137--148

  46. [46]

    Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1--14. https://doi.org/10.3115/v1/W14-1701

  47. [47]

    Joakim Nivre, Željko Agić, Lars Ahrenberg, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, Miguel Ballesteros, John Bauer, Kepa Bengoetxea, Yevgeni Berzak, Riyaz Ahmad Bhat, Eckhard Bick, Carl Börstell, Cristina Bosco, Gosse Bouma, Sam Bowman, Gülşen Cebiroğlu Eryiğit, Giuseppe G. A. Celano, Fabricio Chalub, Çağrı Çölteki...

  48. [48]

    Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007a. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915--932. http://www.aclweb.org/anthology/D/D07/D07-1096

  49. [49]

    Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007b. Maltparser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95--135

  50. [50]

    Joakim Nivre et al. 2018. Universal dependencies annotation guidelines. Available at universaldependencies.org. http://universaldependencies.org/guidelines.html

  51. [51]

    Geoffrey Nunberg. 1990. The Linguistics of Punctuation. Number 18 in CSLI Lecture Notes. Center for the Study of Language and Information

  52. [52]

    Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002). http://www.aclweb.org/anthology/W02-1011

  53. [53]

    Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532--1543. https://doi.org/10.3115/v1/D14-1162

  54. [54]

    Fernando C. N. Pereira and Michael D. Riley. 1996. Speech recognition by composition of weighted finite automata. Computing Research Repository (CoRR), arXiv:cmp-lg/9603001. https://arxiv.org/abs/cmp-lg/9603001

  55. [55]

    Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Yara parser: A fast and accurate dependency parser. Computing Research Repository, arXiv:1503.06733 (version 2). http://arxiv.org/abs/1503.06733

  56. [56]

    Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45--50. http://is.muni.cz/publication/884893/en

  57. [57]

    Valentin I. Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2011. Punctuation: Making a point in unsupervised dependency parsing. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL '11, pages 19--28. http://dl.acm.org/citation.cfm?id=2018936.2018939

  58. [58]

    Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. https://doi.org/10.3115/v1/P15-1150 In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (A...

  59. [59]

    Ottokar Tilk and Tanel Alumäe. 2016. Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In Interspeech, pages 3047--3051

  60. [60]

    Ke M. Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, and Kevin Knight. 2016. Unsupervised neural hidden Markov models. In Proceedings of the Workshop on Structured Prediction for NLP, pages 63--71. https://doi.org/10.18653/v1/W16-5907

  61. [61]

    University of Chicago. 2010. The Chicago Manual of Style. University of Chicago Press

  62. [62]

    Dingquan Wang and Jason Eisner. 2016. The Galactic Dependencies treebanks: Getting more data by synthesizing new languages. Transactions of the Association for Computational Linguistics (TACL), 4:491--505. http://cs.jhu.edu/~jason/papers/#wang-eisner-2016

  63. [63]

    Michael White and Rajakrishnan Rajkumar. 2008. A more precise analysis of punctuation for broad-coverage surface realization with CCG. In Proceedings of the COLING 2008 Workshop on Grammar Engineering Across Frameworks, pages 17--24. http://www.aclweb.org/anthology/W08-1703

  64. [64]

    K. Xu, L. Xie, and K. Yao. 2016. Investigating LSTM for punctuation prediction. In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pages 1--5. https://doi.org/10.1109/ISCSLP.2016.7918492

  65. [65]

    Richard Zens, Franz Josef Och, and Hermann Ney. 2002. Phrase-based statistical machine translation. In Annual Conference on Artificial Intelligence, pages 18--32. https://link.springer.com/chapter/10.1007/3-540-45751-8_2

  66. [66]

    Dongdong Zhang, Shuangzhi Wu, Nan Yang, and Mu Li. 2013. Punctuation prediction with transition-based parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pages 752--760. http://www.aclweb.org/anthology/P13-1074

  67. [67]

    Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 188--193. http://www.aclweb.org/anthology/P11-2033