arxiv: 1805.05089 · v1 · pith:KU57OZQQnew · submitted 2018-05-14 · 💻 cs.CL

Parser Training with Heterogeneous Treebanks

classification: cs.CL
keywords: treebanks, embeddings, training, treebank, fine-tuning, heterogeneous, multiple, parser

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0 to 3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
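
The abstract does not spell out how treebank embeddings enter the parser, so the following is only a minimal sketch of the general idea under stated assumptions: each treebank gets its own vector, and that vector is concatenated onto the word vector of every token from that treebank before the token is passed to the parser. The table names, dimensions, treebank identifiers, and random initialisation are all hypothetical and chosen for illustration; in a real system the vectors would be learned jointly with the parser.

```python
import random

WORD_DIM = 4      # hypothetical size, chosen only for this illustration
TREEBANK_DIM = 2  # hypothetical size, chosen only for this illustration


def make_table(keys, dim, seed):
    """Return a dict mapping each key to a small random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for key in keys}


def embed_sentence(tokens, treebank_id, word_table, treebank_table):
    """Concatenate each token's word vector with the vector of its source treebank."""
    treebank_vec = treebank_table[treebank_id]
    return [word_table[token] + treebank_vec for token in tokens]


# Assumed toy vocabulary and two assumed treebanks for the same language.
word_table = make_table(["the", "dog", "barks"], WORD_DIM, seed=0)
treebank_table = make_table(["treebank_a", "treebank_b"], TREEBANK_DIM, seed=1)

sentence = ["the", "dog", "barks"]
as_a = embed_sentence(sentence, "treebank_a", word_table, treebank_table)
as_b = embed_sentence(sentence, "treebank_b", word_table, treebank_table)
```

In this sketch the word part of each token vector is shared across treebanks, while the trailing components differ by treebank. That is one way a single model trained on concatenated data could still be told which annotation style a sentence follows.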
