Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang; Mounica Maddela; Wei Xu; Wuwei Lan; Yang Zhong

arxiv: 2005.02324 · v4 · pith:2CBBZWIInew · submitted 2020-05-05 · 💻 cs.CL



keywords: sentence, simplification, text, alignment, datasets, model, neural, quality


The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model that not only leverages the sequential nature of sentences in parallel documents but also uses a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality than the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state of the art for text simplification in both automatic and human evaluation.
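
The abstract describes an aligner that combines pairwise semantic similarity with the sequential order of sentences in parallel documents. The following is a minimal sketch of that general idea, not the paper's model: it assumes a precomputed similarity matrix `sim` (in the paper this role is played by a neural sentence pair model) and a hand-set `jump_penalty` in place of learned transition potentials. Both names, and the function `viterbi_align`, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def viterbi_align(sim: Sequence[Sequence[float]],
                  jump_penalty: float = 0.5) -> List[int]:
    """Align each simple sentence to one complex sentence.

    sim[i][j] is an assumed similarity score between simple sentence i
    and complex sentence j. The transition score between consecutive
    alignments (k, then j) is -jump_penalty * |j - k - 1|, so moving
    forward by exactly one complex sentence costs nothing.
    """
    n = len(sim)
    if n == 0:
        return []
    m = len(sim[0])

    # score[i][j]: best total score of any path ending with i aligned to j.
    score = [list(sim[0])]
    back = []  # back[i - 1][j]: best predecessor of j at step i
    for i in range(1, n):
        row_score, row_back = [], []
        for j in range(m):
            best_k = max(
                range(m),
                key=lambda k: score[-1][k] - jump_penalty * abs(j - k - 1),
            )
            transition = -jump_penalty * abs(j - best_k - 1)
            row_score.append(score[-1][best_k] + transition + sim[i][j])
            row_back.append(best_k)
        score.append(row_score)
        back.append(row_back)

    # Trace the best path backwards from the highest-scoring final state.
    j = max(range(m), key=lambda k: score[-1][k])
    path = [j]
    for row_back in reversed(back):
        j = row_back[j]
        path.append(j)
    path.reverse()
    return path


# The second simple sentence is slightly more similar to complex sentence 3,
# but the sequential prior keeps the alignment monotone.
sim = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.6],
    [0.0, 0.0, 1.0, 0.0],
]
print(viterbi_align(sim))
```

Picking the highest-similarity match for each sentence independently would align the second simple sentence to complex sentence 3 here; decoding over the whole sequence prefers the in-order alignment because the out-of-order jumps are penalized. A real CRF aligner would learn these potentials from annotated data and would also need a way to mark sentences with no counterpart, which this sketch omits.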


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
cs.CL 2026-05 unverdicted novelty 7.0

The paper introduces Manta-LM, which approximates the Hamilton-Jacobi-Bellman optimal policy via Flow Matching in a rectified latent control space to enable high-fidelity parallel language generation.

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
cs.CL 2026-05 unverdicted novelty 6.0

Language generation is recast as optimal control and solved approximately with flow matching in a rectified latent control space to enable high-fidelity parallel text generation.

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
cs.CL 2026-05 unverdicted novelty 6.0

Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.