A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

Arman Cohan; Doo Soon Kim; Franck Dernoncourt; Nazli Goharian; Seokhwan Kim; Trung Bui; Walter Chang

arxiv: 1804.05685 · v2 · pith:U4XAZVBHnew · submitted 2018-04-16 · 💻 cs.CL


Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian

classification 💻 cs.CL

keywords abstractive · documents · model · models · summarization · discourse-aware · results · approach


Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.
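
The abstract names a hierarchical encoder and a discourse-aware attentive decoder but gives no formulas. The following is a minimal sketch of one way such attention could work, assuming that a section-level attention weight rescales each word-level score before normalization over all words. The function names, the dot-product scoring, and the toy vectors are all hypothetical choices made for illustration; they are not taken from the abstract and should not be read as the authors' exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def discourse_aware_attention(decoder_state, section_states, word_states):
    """Illustrative section-weighted word attention (assumed scheme).

    decoder_state:  vector for the current decoding step
    section_states: one vector per document section
    word_states:    word_states[j][i] is the vector of word i in section j
    Returns (section_weights, word_weights, context_vector).
    """
    # Section-level attention: how relevant each section is right now.
    section_weights = softmax([dot(decoder_state, s) for s in section_states])

    # Word-level scores, each rescaled by the weight of its section.
    scaled_scores = []
    flat_words = []
    for j, words in enumerate(word_states):
        for word in words:
            scaled_scores.append(section_weights[j] * dot(decoder_state, word))
            flat_words.append(word)

    # Normalize over every word in the document, not per section.
    word_weights = softmax(scaled_scores)

    # Context vector: attention-weighted sum of word vectors.
    context = [0.0] * len(decoder_state)
    for weight, word in zip(word_weights, flat_words):
        for k, value in enumerate(word):
            context[k] += weight * value
    return section_weights, word_weights, context


# Toy usage: two sections, three words in total, 2-dimensional states.
state = [1.0, 0.0]
sections = [[1.0, 0.0], [0.0, 1.0]]
words = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]]
sec_w, word_w, ctx = discourse_aware_attention(state, sections, words)
```

In this toy input the first section aligns with the decoder state, so it receives the larger section weight and its words are favored in the final distribution.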


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dynamic Chunking for Diffusion Language Models
cs.CL 2026-05 unverdicted novelty 7.0
DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints
cs.LG 2026-05 unverdicted novelty 6.0
A queueing model derives stability conditions for LLM inference services under combined compute and KV cache memory limits, with experimental validation showing typical deviations under 10%.

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill
cs.LG 2025-10 unverdicted novelty 6.0
Layered prefill replaces token-chunked prefill with layer-group interleaving in MoE models, cutting TTFT by up to 70%, end-to-end latency by 41%, and per-token energy by 22% while preserving stall-free TBT.