pith. sign in

arxiv: 1707.08052 · v1 · pith:YOKVRAJGnew · submitted 2017-07-25 · 💻 cs.CL

Challenges in Data-to-Document Generation

classification 💻 cs.CL
keywords generationmodelsneuralcurrentdescriptivedocumentsmethodsperformance
0
0 comments X
read the original abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TabEmb: Joint Semantic-Structure Embedding for Table Annotation

    cs.LG 2026-04 unverdicted novelty 5.0

    TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.