
arxiv: 1803.09337 · v1 · pith:FNBHHYUDnew · submitted 2018-03-25 · 💻 cs.CL

Text Segmentation as a Supervised Learning Task

classification 💻 cs.CL
keywords segmentation, text, dataset, labeled, learning, supervised, task, work

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity of labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develop a segmentation model based on this dataset and show that it generalizes well to unseen natural text.
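The abstract does not spell out how segmentation becomes a supervised problem. One plausible reading, assumed here for illustration and not confirmed by the abstract, is to treat each sentence as an example with a binary label: 1 if the sentence ends a segment, 0 otherwise. The sketch below shows how a document already split into sections (for example, sections of a Wikipedia article) could be turned into such labels and back. The function names `sections_to_labeled_sentences` and `labels_to_segments` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def sections_to_labeled_sentences(
    sections: List[List[str]],
) -> Tuple[List[str], List[int]]:
    """Flatten sections into sentences with one binary label per sentence.

    Assumption (not from the abstract): label 1 marks the last sentence of a
    section, label 0 marks every other sentence. The final sentence of the
    document is labeled 0, since no boundary decision is needed after it.
    """
    sentences: List[str] = []
    labels: List[int] = []
    for section in sections:
        for i, sentence in enumerate(section):
            sentences.append(sentence)
            labels.append(1 if i == len(section) - 1 else 0)
    if labels:
        labels[-1] = 0  # the document end is not a predicted boundary
    return sentences, labels


def labels_to_segments(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild segments from per-sentence boundary labels."""
    segments: List[List[str]] = []
    current: List[str] = []
    for sentence, label in zip(sentences, labels):
        current.append(sentence)
        if label == 1:
            segments.append(current)
            current = []
    if current:  # trailing sentences form the last segment
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Toy document with three sections; the sentences are placeholders.
    doc = [["a1", "a2"], ["b1"], ["c1", "c2"]]
    sents, labs = sections_to_labeled_sentences(doc)
    print(sents, labs)
    print(labels_to_segments(sents, labs))
```

Under this framing, any sequence labeling model can be trained on the `(sentences, labels)` pairs, and its predictions can be mapped back to segments with `labels_to_segments`.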



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents

    cs.CL 2026-05 unverdicted novelty 3.0

    Recursive character-based chunking at 300 characters outperforms Sentence-Based, Khmer-Aware, and LLM-Based methods on L2 distance, answer relevance, and Khmer IoU in a 5-fold evaluation on 18 Khmer agricultural QA pairs.