arxiv: 1804.05685 · v2 · pith:U4XAZVBHnew · submitted 2018-04-16 · 💻 cs.CL

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

classification 💻 cs.CL
keywords abstractive · documents · model · models · summarization · discourse-aware · results · approach

Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dynamic Chunking for Diffusion Language Models

    cs.CL 2026-05 unverdicted novelty 7.0

    DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

  2. A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

    cs.LG 2026-05 unverdicted novelty 6.0

    A queueing model derives stability conditions for LLM inference services under combined compute and KV cache memory limits, with experimental validation showing typical deviations under 10%.

  3. From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill

    cs.LG 2025-10 unverdicted novelty 6.0

    Layered prefill replaces token-chunked prefill with layer-group interleaving in MoE models, cutting TTFT by up to 70%, end-to-end latency by 41%, and per-token energy by 22% while preserving stall-free TBT.