Neural Word Segmentation with Rich Pretraining

Fei Dong; Jie Yang; Yue Zhang

arxiv: 1704.08960 · v1 · pith:MDLLV23Onew · submitted 2017-04-28 · 💻 cs.CL

Jie Yang, Yue Zhang, Fei Dong

classification 💻 cs.CL

keywords segmentation, pretraining, word, external, neural, sources, model, research

Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive with the best methods on six benchmarks.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Investigating Self-Attention Network for Chinese Word Segmentation
cs.CL · 2019-07 · unverdicted · novelty 4.0

Self-attention networks achieve results competitive with BiLSTM-CRF on Chinese word segmentation, with BERT and word integration yielding the best reported performance on six heterogeneous domain benchmarks.