DART: Open-Domain Structured Data Record to Text Generation

arxiv: 2007.02871 · v2 · pith:FFQZ7NFNnew · submitted 2020-07-06 · 💻 cs.CL

Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

keywords: dart, data, semantic, structured, data-to-text, dataset, domain, generation

We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks by utilizing techniques such as: tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.
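The table-to-triple idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `row_to_triples`, the `parent_of` mapping, the predicate naming, and the example table are all hypothetical, and the sketch only assumes that each column header has a parent header in a tree whose root is the table title.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(
    title: str,
    parent_of: Dict[str, Optional[str]],
    row: Dict[str, str],
) -> List[Triple]:
    """Turn one table row into (subject, predicate, object) triples.

    parent_of is a hypothetical tree ontology over column headers:
    it maps each header to its parent header, or to None when the
    header hangs directly off the table title.
    """
    triples: List[Triple] = []
    for header, value in row.items():
        parent = parent_of[header]
        # The subject is the parent column's cell, or the title at the root.
        subject = title if parent is None else row[parent]
        # Illustrative predicate naming only: upper-case the header text.
        predicate = header.upper().replace(" ", "_")
        triples.append((subject, predicate, value))
    return triples


# Made-up example table row and header tree.
example_parent_of = {"Driver": None, "Team": "Driver", "Points": "Driver"}
example_row = {"Driver": "A. Driver", "Team": "Team X", "Points": "10"}
example_triples = row_to_triples("Example Race", example_parent_of, example_row)

for triple in example_triples:
    print(triple)
```

In this sketch the header tree decides which cell becomes the subject of each triple, which is one simple way to carry a table's structure into a flat triple set.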

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading
cs.NI 2024-04 unverdicted novelty 7.0

TrimCaching introduces parameter-sharing edge caching for AI models, formulates it as a submodular maximization problem with submodular constraints, provides approximation algorithms for special and general cases, and...

LoRA: Low-Rank Adaptation of Large Language Models
cs.CL 2021-06 accept novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

Prefix-Tuning: Optimizing Continuous Prompts for Generation
cs.CL 2021-01 conditional novelty 7.0

Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges
cs.CR 2026-06 unverdicted novelty 6.0

Introduces MM-Privacy dataset and evaluations showing MLLMs leak sensitive data from images in various tasks, highlighting task inconsistency effects.