What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

Mirella Lapata; Shashi Narayan; Shay B. Cohen

arXiv: 1907.08722 · v1 · pith:ZGZEP63Qnew · submitted 2019-07-19 · 💻 cs.CL

Pith reviewed 2026-05-24 18:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords extreme summarization · abstractive summarization · convolutional neural networks · topic-aware models · news summarization · BBC dataset · one-sentence summary · long-range dependencies

The pith

Topic-aware CNNs outperform extractive oracles and prior abstractive models on one-sentence BBC news summaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces extreme summarization as the task of producing a single sentence that answers what a news article is about. It argues that this task inherently requires abstractive modeling because extractive methods cannot synthesize sufficiently concise content. The authors harvest a large BBC news dataset to support the task and introduce a convolutional neural network architecture conditioned on the article's topics. Experiments demonstrate that the model captures long-range dependencies, identifies pertinent content, and exceeds both an oracle extractive baseline and existing abstractive systems in automatic and human evaluations.

Core claim

The central claim is that an abstractive summarization model built entirely from convolutional neural networks and conditioned on article topics can capture long-range dependencies in documents and recognize pertinent content, thereby outperforming an oracle extractive system and state-of-the-art abstractive approaches on the extreme summarization task when tested on the collected BBC dataset.

What carries the argument

A topic-aware convolutional neural network conditions the entire summarization process on the article's topics in order to model document content.
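One way to picture topic conditioning is sketched below in plain Python. This is a minimal illustration under stated assumptions, not the authors' architecture: it assumes the document topic vector is concatenated to every word embedding at the input and then passed through a single gated 1D convolution. Where the topic signal actually enters the network is not specified in this summary, and the function names `concat_topic` and `glu_conv1d`, the gating choice, and the toy weights are all hypothetical.

```python
import math


def concat_topic(embeddings, topic):
    """Append the document topic vector to every word embedding.

    Assumption: conditioning happens once, at the input layer.
    """
    return [list(e) + list(topic) for e in embeddings]


def glu_conv1d(inputs, w_lin, w_gate, width):
    """Apply one gated convolution filter over a token sequence.

    The sequence is zero-padded so the output has one value per token.
    w_lin and w_gate each map a flattened window of `width` vectors to a
    scalar; a real layer would use many such filters.
    """
    dim = len(inputs[0])
    pad = width // 2
    zero = [0.0] * dim
    padded = [zero] * pad + inputs + [zero] * pad
    out = []
    for i in range(len(inputs)):
        # Flatten the window of `width` consecutive vectors.
        window = [x for vec in padded[i:i + width] for x in vec]
        lin = sum(w * x for w, x in zip(w_lin, window))
        gate = sum(w * x for w, x in zip(w_gate, window))
        # Gated linear unit: linear path scaled by a sigmoid gate.
        out.append(lin * (1.0 / (1.0 + math.exp(-gate))))
    return out


# Toy usage: three 2-d word embeddings and a 2-topic distribution.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic = [0.5, 0.5]
conditioned = concat_topic(words, topic)
features = glu_conv1d(conditioned, [1.0] * 12, [0.0] * 12, width=3)
```

Because every position carries the same topic vector, each convolution window sees document-level information even though its receptive field is local, which is the intuition behind conditioning on topics.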

If this is right

Extreme summarization requires abstractive rather than extractive methods.
Conditioning on topics improves recognition of relevant document content.
Convolutional networks alone can handle long-range dependencies for this task.
The BBC dataset serves as a benchmark for one-sentence abstractive summarization.
Human judgments align with automatic metrics in confirming model superiority.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The topic-conditioning mechanism could transfer to other short-form generation tasks such as title creation.
Pure CNN architectures may prove competitive with recurrent models on additional document-level NLP problems.
The dataset harvesting approach could be replicated for non-English news domains to test cross-lingual generalization.
Combining the topic signal with other conditioning variables might further improve content selection.

Load-bearing premise

Extreme summarization by its nature cannot be handled adequately by extractive strategies and therefore requires an abstractive approach.

What would settle it

An extractive system that matches or exceeds the proposed model's human evaluation scores on one-sentence summaries from the BBC dataset.

Figures

Figures reproduced from arXiv: 1907.08722 by Mirella Lapata, Shashi Narayan, Shay B. Cohen.

**Figure 2.** Topic-conditioned convolutional model for extre…

**Figure 3.** Example output summaries on the XSum test set with […

**Figure 4.** Example output summaries on the Newsroom Abstractive test set with [ROUGE-1, ROUGE-2 and ROUGE-L] scores, gold standard reference, and corresponding questions. Words highlighted in blue are either the right answer or constitute appropriate context for inferring it; words in red lead to the wrong answer.

**Figure 5.** Summaries ranked from least to most informative fr…

**Figure 6.** Percentage of n-grams in test summaries seen in training summaries.

**Figure 7.** Type-Token Ratio for summary n-grams in the entire dataset.
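The two corpus statistics named in the captions of Figures 6 and 7 can be computed roughly as follows. This is a sketch under assumptions: whitespace tokenization, lowercasing, and counting n-gram tokens rather than distinct types are choices made here for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pct_seen_in_training(train_summaries, test_summaries, n):
    """Percentage of test-summary n-grams that also occur in training summaries."""
    seen = {g for s in train_summaries for g in ngrams(s.lower().split(), n)}
    test = [g for s in test_summaries for g in ngrams(s.lower().split(), n)]
    if not test:
        return 0.0
    return 100.0 * sum(g in seen for g in test) / len(test)


def type_token_ratio(summaries, n):
    """Distinct n-grams divided by total n-grams across all summaries."""
    grams = [g for s in summaries for g in ngrams(s.lower().split(), n)]
    return len(set(grams)) / len(grams) if grams else 0.0


# Toy usage with made-up summaries.
train = ["the cat sat", "a dog ran"]
test = ["the cat ran"]
unigram_pct = pct_seen_in_training(train, test, 1)
bigram_pct = pct_seen_in_training(train, test, 2)
ttr = type_token_ratio(["a b a b"], 1)
```

A low seen-in-training percentage for longer n-grams, or a high type-token ratio, would suggest that summaries are not formulaic and cannot be produced by memorizing training phrases.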

Original abstract

We introduce 'extreme summarization', a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving research on this task further: (a) we collect a real-world, large scale dataset by harvesting online articles from the British Broadcasting Corporation (BBC); and (b) propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces extreme summarization as a new single-document task that requires generating a one-sentence abstractive summary answering 'What is the article about?'. It collects a large BBC news dataset, presents a topic-aware CNN architecture, and claims that this model captures long-range dependencies, outperforms an oracle extractive baseline, and exceeds prior abstractive systems on both automatic metrics and human evaluation.

Significance. If the outperformance claims are robust, the work would be significant for defining a challenging new summarization benchmark and for showing that topic conditioning enables CNNs to handle document-level content selection for ultra-short outputs. The new dataset and the fully convolutional design are concrete contributions that could be reused.

major comments (2)

[Abstract, §1] Abstract and §1 (motivation): the assertion that extreme summarization 'by nature' is not amenable to extractive strategies rests on an oracle extractive baseline that selects the sentence maximizing n-gram overlap with the reference summary. This baseline does not test whether an extractive system optimized for the 'what the article is about' criterion could suffice, leaving the necessity of the topic-aware CNN under-supported.
[§4] §4 (experiments): the human evaluation protocol, including the exact instructions given to annotators, inter-annotator agreement statistics, and significance tests for the reported outperformance over the oracle and SOTA baselines, must be reported in full; without them the claim that the model 'recognizes pertinent content' cannot be assessed.
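The oracle extractive baseline discussed in the first major comment can be sketched as follows. This is a minimal illustration, assuming unigram F1 as the overlap measure and a simple regex tokenizer; the paper's exact overlap metric is not stated in this report, and the function names are hypothetical. Note that the oracle reads the reference summary, so it is an upper bound for single-sentence extraction under the chosen metric rather than a usable system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """Unigram overlap F1 between a candidate sentence and the reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def oracle_sentence(sentences, reference):
    """Pick the document sentence that best overlaps the reference summary."""
    return max(sentences, key=lambda s: unigram_f1(s, reference))


# Toy usage with a made-up article and reference.
article = [
    "The council met on Tuesday.",
    "A new bridge will open next year.",
    "Residents welcomed the plan.",
]
reference = "A new bridge is set to open next year"
best = oracle_sentence(article, reference)
```

The referee's point is visible in the code: the selection criterion is overlap with one reference, which is not the same thing as an extractive system trained to answer what the article is about.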

minor comments (2)

[§3] §3 (model): the precise integration of the topic distribution into the CNN layers (e.g., whether it is concatenated at every filter or only at the first layer) should be stated explicitly with an equation.
[Table 2] Table 2 or equivalent: report the number of parameters and training time for the proposed model alongside the baselines to allow direct comparison of computational cost.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments and indicate the revisions we will make.

Referee: [Abstract, §1] Abstract and §1 (motivation): the assertion that extreme summarization 'by nature' is not amenable to extractive strategies rests on an oracle extractive baseline that selects the sentence maximizing n-gram overlap with the reference summary. This baseline does not test whether an extractive system optimized for the 'what the article is about' criterion could suffice, leaving the necessity of the topic-aware CNN under-supported.

Authors: The reference summary itself embodies the 'what the article is about' criterion. Consequently, the oracle that selects the sentence maximizing n-gram overlap with this reference constitutes the theoretical upper bound achievable by any extractive system. No extractive model, regardless of its optimization criterion, can surpass this oracle. We will revise the abstract and §1 to explicitly state that the oracle demonstrates the inherent limitations of sentence extraction for this task, thereby supporting the need for an abstractive approach.

revision: yes

Referee: [§4] §4 (experiments): the human evaluation protocol, including the exact instructions given to annotators, inter-annotator agreement statistics, and significance tests for the reported outperformance over the oracle and SOTA baselines, must be reported in full; without them the claim that the model 'recognizes pertinent content' cannot be assessed.

Authors: We agree that complete reporting of the human evaluation protocol is necessary. In the revised manuscript we will add a dedicated subsection detailing the exact instructions given to annotators, the inter-annotator agreement statistics, and the statistical significance tests (including p-values) for all reported comparisons against the oracle extractive baseline and prior abstractive systems.

revision: yes
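As an illustration of the kind of significance test the referee asks for, here is a paired bootstrap sketch. The report does not say which test the authors would use, so the choice of a paired bootstrap, the function name `paired_bootstrap_p`, and the per-document scores below are all assumptions made for the example.

```python
import random


def paired_bootstrap_p(scores_a, scores_b, resamples=1000, seed=0):
    """Estimate how often system A fails to beat system B under resampling.

    scores_a and scores_b are per-document scores for the same documents.
    Returns the fraction of bootstrap resamples in which A's total score
    does not exceed B's, which serves as an approximate one-sided p-value.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(resamples):
        # Resample documents with replacement, keeping the pairing intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) <= 0:
            not_better += 1
    return not_better / resamples


# Toy usage with made-up per-document scores.
model_scores = [0.5, 0.6, 0.7]
baseline_scores = [0.4, 0.5, 0.6]
p_value = paired_bootstrap_p(model_scores, baseline_scores)
```

Pairing matters because both systems are scored on the same documents; resampling the per-document differences keeps document difficulty from inflating the variance.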

Circularity Check

0 steps flagged

No circularity: empirical model proposal and evaluation on held-out data

The paper defines a new task, collects a fresh BBC dataset, introduces a topic-aware CNN architecture, and reports automatic and human evaluations against an oracle extractive baseline and prior abstractive systems. The motivation that extreme summarization 'by nature' requires abstractive modeling is presented as an argument rather than a derived claim from equations or prior self-citations. No predictions reduce to fitted parameters by construction, no uniqueness theorems are imported from the authors' own work, and the central outperformance result rests on independent test-set measurements rather than self-referential definitions or renamings. This is standard empirical NLP work with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract; the work centers on a new task and an empirical model rather than on heavy unstated axioms. Standard neural network assumptions apply.

free parameters (2)

CNN filter sizes and layer counts
Standard tunable hyperparameters in the convolutional architecture for text modeling.
Topic distribution parameters
Model conditions on article topics, implying fitted topic model outputs from data.

axioms (1)

domain assumption: Convolutional neural networks can capture long-range dependencies in documents when appropriately configured.
Directly invoked in the claim that the architecture captures long-range dependencies.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · 8 internal anchors

[1]

B., & Martins, A

Almeida, M. B., & Martins, A. F. T. (2013). Fast and robust com pressive summarization with dual decomposition and multi-task learning. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics , pp. 196–206, Soﬁa, Bul- garia

work page 2013
[2]

Angelidis, S., & Lapata, M. (2018). Summarizing opinions: A spect extraction meets sen- timent prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Process ing, pp. 3675–3686

work page 2018
[3]

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine tr anslation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning

work page 2015
[4]

Barzilay, R., & Elhadad, M. (1997). Using lexical chains for text summarization. In Pro- ceedings of the ACL Workshop on Intelligent Scalable Text Summ arization, pp. 10–17,

work page 1997
[5]

Barzilay, R., Elhadad, N., & McKeown, K. R. (2002). Inferrin g strategies for sentence ordering in multidocument news summarization. Journal of Artiﬁcial Intelligence Research, 17 (1), 35–55

work page 2002
[6]

Berg-Kirkpatrick, T., Gillick, D., & Klein, D. (2011). Join tly learning to extract and com- press. In Proceedings of the 49th Annual Meeting of the Association fo r Computational Linguistics: Human Language Technologies , pp. 481–490, Portland, Oregon, USA. 29

work page 2011
[7]

M., Ng, A

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichle t allocation. The Journal of Machine Learning Research, 3, 993–1022

work page 2003
[8]

Carbonell, J., & Goldstein, J. (1998). The use of MMR, divers ity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development i n Information Retrieval, pp. 335–336, Melbourne, Australia

work page 1998
[9]

Celikyilmaz, A., Bosselut, A., He, X., & Choi, Y. (2018). Dee p communicating agents for abstractive summarization. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Li nguistics: Human Language Technologies, New Orleans, USA

work page 2018
[10]

Chen, Q., Zhu, X., Ling, Z., Wei, S., & Jiang, H. (2016). Distr action-based neural networks for modeling documents. In Proceedings of the 25th International Joint Conference on Artiﬁcial Intelligence , pp. 2754–2760, New York, USA

work page 2016
[11]

Chen, Y.-C., & Bansal, M. (2018). Fast abstractive summariz ation with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association fo r Computational Linguistics , pp. 675–686, Melbourne, Australia

work page 2018
[12]

Cheng, J., & Lapata, M. (2016). Neural summarization by extr acting sentences and words. In Proceedings of the 54th Annual Meeting of the Association fo r Computational Lin- guistics, pp. 484–494, Berlin, Germany

work page 2016
[13]

Bengio, Y. (2014). Learning phrase representations using R NN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods on Natural Language Processing , pp. 1724–1734, Doha, Qatar

work page 2014
[14]

Clarke, J., & Lapata, M. (2010). Discourse constraints for d ocument compression. Compu- tational Linguistics , 36 (3), 411–441

work page 2010
[15]

S., Bui, T., Kim, S., Chan g, W., & Goharian, N

Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S., Chan g, W., & Goharian, N. (2018). A discourse-aware attention model for abstractive summarization of long documents. In Proceedings of the 2018 Conference of the North American Chapt er of the Association for Computational Linguistics: Human Lang uage Technologies, pp. 615–621, New Orleans, Louisiana

work page 2018
[16]

N., Fan, A., Auli, M., & Grangier, D

Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). Lang uage modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Ma- chine Learning, pp. 933–941, Sydney, Australia

work page 2017
[17]

C., & Hinkley, D

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application . Cam- bridge University Press

work page 1997
[18]

Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805

work page internal anchor Pith review Pith/arXiv arXiv 2018
[19]

B., Wang, C., Gao, J., & Paisley, J

Dieng, A. B., Wang, C., Gao, J., & Paisley, J. (2017). TopicRN N: A recurrent neural network with long-range semantic dependency. In Proceedings of the 5th International Conference on Learning Representations , Toulon, France. 30 Topic-A ware Convolutional Neural Networks for Extreme Sum marization

work page 2017
[20]

Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., Gao, J ., Zhou, M., & Hon, H. (2019). Uniﬁed language model pre-training for natural lan guage understanding and generation. CoRR, abs/1905.03197

work page arXiv 2019
[21]

Dong, Y., Shen, Y., Crawford, E., van Hoof, H., & Cheung, J. C. K. (2018). BanditSum: Extractive summarization as a contextual bandit. In Proceedings of the 2018 Confer- ence on Empirical Methods in Natural Language Processing , pp. 3739–3748, Brussels, Belgium

work page 2018
[22]

Dorr, B., Zajic, D., & Schwartz, R. (2003). Hedge trimmer: A p arse-and-trim approach to headline generation. In Proceedings of the Text Summarization Workshop at NAACL , pp. 1–8, Edmonton, Canada

work page 2003
[23]

Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradi ent methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12, 2121–2159

work page 2011
[24]

Durrett, G., Berg-Kirkpatrick, T., & Klein, D. (2016). Lear ning-based single document summarization with compression and anaphoricity constrai nts. In Proceedings of the 54th Annual Meeting of the Association for Computational Ling uistics, pp. 1998–2008,

work page 2016
[25]

Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexic al centrality as salience in text summarization. Journal of Artiﬁcial Intelligence Research , 22 (1), 457–479

work page 2004
[26]

Fan, A., Grangier, D., & Auli, M. (2017). Controllable abstr active summarization. CoRR, abs/1711.05217

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical neura l story generation. In Pro- ceedings of the 56th Annual Meeting of the Association for Com putational Linguistics ,

work page 2018
[28]

Filatova, E., & Hatzivassiloglou, V. (2004). Event-based e xtractive summarization. In Proceedings of ACL Workshop on Text Summarization Branches O ut, pp. 104–111,

work page 2004
[29]

Gehrmann, S., Deng, Y., & Rush, A. (2018). Bottom-up abstrac tive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Na tural Language Pro- cessing, pp. 4098–4109

work page 2018
[30]

Ghosh, S., Vinyals, O., Strope, B., Roy, S., Dean, T., & Heck, L. (2016). Contextual LSTM (CLSTM) models for large scale NLP tasks. CoRR, abs/1602.06291

work page internal anchor Pith review Pith/arXiv arXiv 2016
[31]

Grusky, M., Naaman, M., & Artzi, Y. (2018). NEWSROOM: A datas et of 1.3 million sum- maries with diverse extractive strategies. In Proceedings of the 16th Annual Confer- ence of the North American Chapter of the Association for Compu tational Linguistics: Human Language Technologies, New Orleans, USA. 31

work page 2018
[32]

Gu, J., Lu, Z., Li, H., & Li, V. O. (2016). Incorporating copyi ng mechanism in sequence- to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics , pp. 1631–1640, Berlin, Germany

work page 2016
[33]

Hardy, H., Narayan, S., & Vlachos, A. (2019). HighRES: Highl ight-based reference-less eval- uation of summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , Florence, Italy

work page 2019
[34]

Harman, D., & Over, P. (2004). The eﬀects of human variation in duc summarization evaluation. In Text Summarization Branches Out

work page 2004
[35]

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual lea rning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Patte rn Recognition, pp. 770–778, Las Vegas, USA

work page 2016
[36]

Blunsom, P. (2015). Teaching machines to read and comprehen d. In Advances in Neural Information Processing Systems 28 , pp. 1693–1701. Morgan, Kaufmann

work page 2015
[37]

Hsu, W.-T., Lin, C.-K., Lee, M.-Y., Min, K., Tang, J., & Sun, M . (2018). A uniﬁed model for extractive and abstractive summarization using incons istency loss. In Proceedings of the 56th Annual Meeting of the Association for Computation al Linguistics , pp. 132–141, Melbourne, Australia

work page 2018
[38]

Hu, B., Chen, Q., & Zhu, F. (2015). LCSTS: A large scale chines e short text summarization dataset. In Proceedings of the 2015 Conference on Empirical Methods in Na tural Language Processing, pp. 1967–1972, Lisbon, Portugal

work page 2015
[39]

Jing, H. (2002). Using hidden Markov modeling to decompose h uman-written summaries. Computational Linguistics , 28 (4), 527–544

work page 2002
[40]

Kim, B., Kim, H., & Kim, G. (2018). Abstractive summarizatio n of reddit posts with multi-level memory networks. CoRR, abs/1811.00783

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

Kiritchenko, S., & Mohammad, S. (2017). Best-worst scaling more reliable than rating scales: A case study on sentiment intensity annotation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics , pp. 465–470, Vancouver, Canada

work page 2017
[42]

Koupaee, M., & Wang, W. Y. (2018). WikiHow: A large scale text summarization dataset. CoRR, abs/1810.09305. K ˚ ageb¨ ack, M., Mogren, O., Tahmasebi, N., & Dubhashi, D. (2 014). Extractive summa- rization using continuous vector space models. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality , pp. 31–39, Gothenbur...

work page internal anchor Pith review Pith/arXiv arXiv 2018
[43]

Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable docu ment summarizer. In Pro- ceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval , pp. 406–407, Seattle, Washington, USA. 32 Topic-A ware Convolutional Neural Networks for Extreme Sum marization

work page 1995
[44]

Li, C., Xu, W., Li, S., & Gao, S. (2018). Guiding generation fo r abstractive text summariza- tion based on key information guide network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computati onal Linguistics: Human Language Technologies, pp. 55–60, New Orleans, Louisiana

work page 2018
[45]

Li, L., Zhou, K., Xue, G.-R., Zha, H., & Yu, Y. (2009). Enhanci ng diversity, coverage and balance for summarization through structure learning. In Proceedings of the 18th international Conference on World Wide Web , pp. 71–80, Madrid, Spain

work page 2009
[46]

Li, P., Bing, L., & Lam, W. (2018a). Actor-critic based train ing framework for abstractive summarization. CoRR, abs/1803.11070

work page internal anchor Pith review Pith/arXiv arXiv
[47]

Li, W., Xiao, X., Lyu, Y., & Wang, Y. (2018b). Improving neura l abstractive document summarization with explicit information selection modeli ng. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Pro cessing, pp. 1787– 1796, Brussels, Belgium

work page 2018
[48]

Y., & Hovy, E

Lin, C. Y., & Hovy, E. (2003). Automatic evaluation of summar ies using n-gram co- occurrence statistics. In Proceedings of the 2003 Human Language Technology Confer- ence of the North American Chapter of the Association for Compu tational Linguistics , pp. 71–78, Edmonton, Canada

work page 2003
[49]

J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kai ser, L., & Shazeer, N

Liu, P. J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kai ser, L., & Shazeer, N. (2018). Generating wikipedia by summarizing long sequences. In Proceedings of the 6th In- ternational Conference on Learning Representations , Vancouver Canada

work page 2018
[50]

Liu, Y. (2019). Fine-tune BERT for extractive summarizatio n. CoRR, abs/1903.10318

work page arXiv 2019
[51]

J., Flynn, T

Louviere, J. J., Flynn, T. N., & Marley, A. A. J. (2015). Best-worst scaling: Theory, methods and applications. Cambridge University Press

work page 2015
[52]

J., & Woodworth, G

Louviere, J. J., & Woodworth, G. G. (1991). Best-worst scali ng: A model for the largest diﬀerence judgments. University of Alberta: Working Paper , -

work page 1991
[53]

Mani, I. (2001). Automatic Summarization . Natural language processing. John Benjamins Publishing Company

work page 2001
[54]

Martins, A., & Smith, N. A. (2009). Summarization with a join t model for sentence extrac- tion and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing , pp. 1–9, Boulder, Colorado

work page 2009
[55]

Mendes, A., Narayan, S., Miranda, S., Marinho, Z., Martins, A. F. T., & Cohen, S. B. (2019). Jointly extracting and compressing documents with summary state repre- sentations. In Proceedings of the 2019 Conference of the North American Chapt er of the Association for Computational Linguistics: Human Lang uage Technologies, pp. 3955–3966, Minneapolis, Minnesota

work page 2019
[56]

Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order i nto texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Langua ge Processing, pp. 404–411, Barcelona, Spain

work page 2004
[57]

Mikolov, T., & Zweig, G. (2012). Context dependent recurren t neural network language model. In Proceedings of the Spoken Language Technology Workshop , pp. 234–239. IEEE. 33

work page 2012
[58]

Nallapati, R., Zhai, F., & Zhou, B. (2017). SummaRuNNer: A re current neural network based sequence model for extractive summarization of docum ents. In Proceedings of the 31st AAAI Conference on Artiﬁcial Intelligence , pp. 3075–3081, San Francisco, California USA

work page 2017
[59]

d., Gulcehre, C., & Xiang , B

Nallapati, R., Zhou, B., Santos, C. d., Gulcehre, C., & Xiang , B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond . In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learn ing, pp. 280–290,

work page 2016
[60]

Napoles, C., Gormley, M., & Van Durme, B. (2012). Annotated g igaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Constructi on and Web-scale Knowledge Extraction at NAACL , pp. 95–100, Montreal, Canada

work page 2012
[61]

Neural Extractive Summarization with Side Information

Narayan, S., Papasarantopoulos, N., Cohen, S. B., & Lapata, M. (2017). Neural extractive summarization with side information. CoRR, abs/1704.04530

work page internal anchor Pith review Pith/arXiv arXiv 2017
[62]

Nenkova, A. (2005). Automatic text summarization of newswi re: Lessons learned from the Document Understanding Conference. In Proceedings of the 29th National Conference on Artiﬁcial Intelligence , pp. 1436–1441, Pittsburgh, Pennsylvania, USA

work page 2005
[63]

Nenkova, A., & McKeown, K. (2011). Automatic summarization . Foundations and Trends in Information Retrieval , 5 (2–3), 103–233

work page 2011
[64]

Nenkova, A., Vanderwende, L., & McKeown, K. (2006). A compos itional context sensitive multi-document summarizer: Exploring the factors that inﬂ uence summarization. In Proceedings of the 29th Annual International ACM SIGIR Conferenc e on Research and Development in Information Retrieval , pp. 573–580, Seattle, Washington, USA

work page 2006
[65]

Over, P., Dang, H., & Harman, D. (2007). Duc in context. Information Processing and Management, 43 (6), 1506–1520

work page 2007
[66]

Parveen, D., Ramsl, H.-M., & Strube, M. (2015). Topical cohe rence for graph-based extrac- tive summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1949–1954, Lisbon, Portugal

work page 2015
[67]

Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the diﬃcult y of training recurrent neural networks. In Proceedings of the 30th International Conference on Internati onal Conference on Machine Learning , pp. 1310–1318, Atlanta, GA, USA

work page 2013
[68]

Pasunuru, R., & Bansal, M. (2018). Multi-reward reinforced summarization with saliency and entailment. In Proceedings of the 16th Annual Conference of the North Americ an Chapter of the Association for Computational Linguistics: Hum an Language Tech- nologies, New Orleans, USA. 34 Topic-A ware Convolutional Neural Networks for Extreme Sum marization

work page 2018
[69]

Paulus, R., Xiong, C., & Socher, R. (2018). A deep reinforced model for abstractive sum- marization. In Proceedings of the 6th International Conference on Learning R epre- sentations, Vancouver, BC, Canada

work page 2018
[70]

Perez-Beltrachini, L., Liu, Y., & Lapata, M. (2019). Genera ting summaries with topic templates and structured convolutional decoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , Florence, Italy

work page 2019
[71]

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., L ee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Confer- ence of the North American Chapter of the Association for Compu tational Linguistics: Human Language Technologies, pp. 2227–2237, New Orleans, Louisiana

work page 2018
[72]

Moon, T. (2013). Generating extractive summaries of scient iﬁc paradigms. Journal of Artiﬁcial Intelligence Research , 46 (1), 165–201

work page 2013
[73]

Topper, M., Winkel, A., & Zhang, Z. (2004). MEAD — A platform f or multidocument multilingual text summarization. In Proceedings of the 4th International Conference on Language Resources and Evaluation , pp. 699–702, Lisbon, Portugal

work page 2004
[74]

M., Chopra, S., & Weston, J

Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attenti on model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing , pp. 379–389, Lisbon, Portugal

work page 2015
[75]

Sandhaus, E. (2008). The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia, 6 (12)

work page 2008
[76]

Schilder, F., & Kondadadi, R. (2008). FastSum: Fast and accu rate query-based multi- document summarization. In Proceedings of the 45th Annual Meeting of the Associ- ation of Computational Linguistics and HLT: Short Papers , pp. 205–208, Columbus,

work page 2008
[77]

Schluter, N. (2017). The limits of automatic summarisation according to rouge. In Proceed- ings of the 15th Conference of the European Chapter of the Assoc iation for Compu- tational Linguistics: Short Papers , pp. 41–45, Valencia, Spain

work page 2017
[78]

J., & Manning, C

See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: S ummarization with pointer- generator networks. In Proceedings of the 55th Annual Meeting of the Association fo r Computational Linguistics , pp. 1073–1083, Vancouver, Canada

work page 2017
[79]

Shen, D., Sun, J.-T., Li, H., Yang, Q., & Chen, Z. (2007). Docu ment summarization using conditional random ﬁelds. In Proceedings of the 20th International Joint Conference on Artiﬁcal intelligence , pp. 2862–2867, Hyderabad, India

work page 2007
[80]

Shi, X., Knight, K., & Yuret, D. (2016). Why neural translati ons are the right length. In Proceedings of the 2016 Conference on Empirical Methods in Na tural Language Processing, pp. 2278–2282, Austin, Texas


[1]

Almeida, M. B., & Martins, A. F. T. (2013). Fast and robust compressive summarization with dual decomposition and multi-task learning. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 196–206, Sofia, Bulgaria.

[2]

Angelidis, S., & Lapata, M. (2018). Summarizing opinions: Aspect extraction meets sentiment prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3675–3686.

[3]

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning

[4]

Barzilay, R., & Elhadad, M. (1997). Using lexical chains for text summarization. In Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization, pp. 10–17,

[5]

Barzilay, R., Elhadad, N., & McKeown, K. R. (2002). Inferring strategies for sentence ordering in multidocument news summarization. Journal of Artificial Intelligence Research, 17(1), 35–55.

[6]

Berg-Kirkpatrick, T., Gillick, D., & Klein, D. (2011). Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 481–490, Portland, Oregon, USA.

[7]

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.

[8]

Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–336, Melbourne, Australia.

[9]

Celikyilmaz, A., Bosselut, A., He, X., & Choi, Y. (2018). Deep communicating agents for abstractive summarization. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA.

[10]

Chen, Q., Zhu, X., Ling, Z., Wei, S., & Jiang, H. (2016). Distraction-based neural networks for modeling documents. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 2754–2760, New York, USA.

[11]

Chen, Y.-C., & Bansal, M. (2018). Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 675–686, Melbourne, Australia.

[12]

Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 484–494, Berlin, Germany.

[13]

Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734, Doha, Qatar.

[14]

Clarke, J., & Lapata, M. (2010). Discourse constraints for document compression. Computational Linguistics, 36(3), 411–441.

[15]

Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S., Chang, W., & Goharian, N. (2018). A discourse-aware attention model for abstractive summarization of long documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 615–621, New Orleans, Louisiana.

[16]

Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, pp. 933–941, Sydney, Australia.

[17]

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

[18]

Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

[19]

Dieng, A. B., Wang, C., Gao, J., & Paisley, J. (2017). TopicRNN: A recurrent neural network with long-range semantic dependency. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France.

[20]

Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., Gao, J., Zhou, M., & Hon, H. (2019). Unified language model pre-training for natural language understanding and generation. CoRR, abs/1905.03197.

[21]

Dong, Y., Shen, Y., Crawford, E., van Hoof, H., & Cheung, J. C. K. (2018). BanditSum: Extractive summarization as a contextual bandit. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3739–3748, Brussels, Belgium.

[22]

Dorr, B., Zajic, D., & Schwartz, R. (2003). Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the Text Summarization Workshop at NAACL, pp. 1–8, Edmonton, Canada.

[23]

Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.

[24]

Durrett, G., Berg-Kirkpatrick, T., & Klein, D. (2016). Learning-based single document summarization with compression and anaphoricity constraints. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 1998–2008,

[25]

Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22(1), 457–479.

[26]

Fan, A., Grangier, D., & Auli, M. (2017). Controllable abstractive summarization. CoRR, abs/1711.05217.

[27]

Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics,

[28]

Filatova, E., & Hatzivassiloglou, V. (2004). Event-based extractive summarization. In Proceedings of ACL Workshop on Text Summarization Branches Out, pp. 104–111,

[29]

Gehrmann, S., Deng, Y., & Rush, A. (2018). Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4098–4109.

[30]

Ghosh, S., Vinyals, O., Strope, B., Roy, S., Dean, T., & Heck, L. (2016). Contextual LSTM (CLSTM) models for large scale NLP tasks. CoRR, abs/1602.06291.

[31]

Grusky, M., Naaman, M., & Artzi, Y. (2018). NEWSROOM: A dataset of 1.3 million summaries with diverse extractive strategies. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA.

[32]

Gu, J., Lu, Z., Li, H., & Li, V. O. (2016). Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 1631–1640, Berlin, Germany.

[33]

Hardy, H., Narayan, S., & Vlachos, A. (2019). HighRES: Highlight-based reference-less evaluation of summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy.

[34]

Harman, D., & Over, P. (2004). The effects of human variation in DUC summarization evaluation. In Text Summarization Branches Out.

[35]

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, USA.

[36]

Blunsom, P. (2015). Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems 28, pp. 1693–1701. Morgan, Kaufmann.

[37]

Hsu, W.-T., Lin, C.-K., Lee, M.-Y., Min, K., Tang, J., & Sun, M. (2018). A unified model for extractive and abstractive summarization using inconsistency loss. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 132–141, Melbourne, Australia.

[38]

Hu, B., Chen, Q., & Zhu, F. (2015). LCSTS: A large scale Chinese short text summarization dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1967–1972, Lisbon, Portugal.

[39]

Jing, H. (2002). Using hidden Markov modeling to decompose human-written summaries. Computational Linguistics, 28(4), 527–544.

[40]

Kim, B., Kim, H., & Kim, G. (2018). Abstractive summarization of Reddit posts with multi-level memory networks. CoRR, abs/1811.00783.

[41]

Kiritchenko, S., & Mohammad, S. (2017). Best-worst scaling more reliable than rating scales: A case study on sentiment intensity annotation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 465–470, Vancouver, Canada.

[42]

Koupaee, M., & Wang, W. Y. (2018). WikiHow: A large scale text summarization dataset. CoRR, abs/1810.09305.

Kågebäck, M., Mogren, O., Tahmasebi, N., & Dubhashi, D. (2014). Extractive summarization using continuous vector space models. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pp. 31–39, Gothenbur...

[43]

Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable document summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 406–407, Seattle, Washington, USA.

[44]

Li, C., Xu, W., Li, S., & Gao, S. (2018). Guiding generation for abstractive text summarization based on key information guide network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 55–60, New Orleans, Louisiana.

[45]

Li, L., Zhou, K., Xue, G.-R., Zha, H., & Yu, Y. (2009). Enhancing diversity, coverage and balance for summarization through structure learning. In Proceedings of the 18th International Conference on World Wide Web, pp. 71–80, Madrid, Spain.

[46]

Li, P., Bing, L., & Lam, W. (2018a). Actor-critic based training framework for abstractive summarization. CoRR, abs/1803.11070.

[47]

Li, W., Xiao, X., Lyu, Y., & Wang, Y. (2018b). Improving neural abstractive document summarization with explicit information selection modeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1787–1796, Brussels, Belgium.

[48]

Lin, C. Y., & Hovy, E. (2003). Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pp. 71–78, Edmonton, Canada.

[49]

Liu, P. J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., & Shazeer, N. (2018). Generating Wikipedia by summarizing long sequences. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, Canada.

[50]

Liu, Y. (2019). Fine-tune BERT for extractive summarization. CoRR, abs/1903.10318.

[51]

Louviere, J. J., Flynn, T. N., & Marley, A. A. J. (2015). Best-worst scaling: Theory, methods and applications. Cambridge University Press.

[52]

Louviere, J. J., & Woodworth, G. G. (1991). Best-worst scaling: A model for the largest difference judgments. University of Alberta: Working Paper.

[53]

Mani, I. (2001). Automatic Summarization. Natural language processing. John Benjamins Publishing Company.

[54]

Martins, A., & Smith, N. A. (2009). Summarization with a joint model for sentence extraction and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pp. 1–9, Boulder, Colorado.

[55]

Mendes, A., Narayan, S., Miranda, S., Marinho, Z., Martins, A. F. T., & Cohen, S. B. (2019). Jointly extracting and compressing documents with summary state representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3955–3966, Minneapolis, Minnesota.

[56]

Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411, Barcelona, Spain.

[57]

Mikolov, T., & Zweig, G. (2012). Context dependent recurrent neural network language model. In Proceedings of the Spoken Language Technology Workshop, pp. 234–239. IEEE.

[58]

Nallapati, R., Zhai, F., & Zhou, B. (2017). SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 3075–3081, San Francisco, California, USA.

[59]

Nallapati, R., Zhou, B., Santos, C. d., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 280–290,

[60]

Napoles, C., Gormley, M., & Van Durme, B. (2012). Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction at NAACL, pp. 95–100, Montreal, Canada.

[61]

Narayan, S., Papasarantopoulos, N., Cohen, S. B., & Lapata, M. (2017). Neural extractive summarization with side information. CoRR, abs/1704.04530.

[62]

Nenkova, A. (2005). Automatic text summarization of newswire: Lessons learned from the Document Understanding Conference. In Proceedings of the 29th National Conference on Artificial Intelligence, pp. 1436–1441, Pittsburgh, Pennsylvania, USA.

[63]

Nenkova, A., & McKeown, K. (2011). Automatic summarization. Foundations and Trends in Information Retrieval, 5(2–3), 103–233.

[64]

Nenkova, A., Vanderwende, L., & McKeown, K. (2006). A compositional context sensitive multi-document summarizer: Exploring the factors that influence summarization. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 573–580, Seattle, Washington, USA.

[65]

Over, P., Dang, H., & Harman, D. (2007). DUC in context. Information Processing and Management, 43(6), 1506–1520.

[66]

Parveen, D., Ramsl, H.-M., & Strube, M. (2015). Topical coherence for graph-based extractive summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1949–1954, Lisbon, Portugal.

[67]

Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, pp. 1310–1318, Atlanta, GA, USA.

[68]

Pasunuru, R., & Bansal, M. (2018). Multi-reward reinforced summarization with saliency and entailment. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA.

[69]

Paulus, R., Xiong, C., & Socher, R. (2018). A deep reinforced model for abstractive summarization. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada.

[70]

Perez-Beltrachini, L., Liu, Y., & Lapata, M. (2019). Generating summaries with topic templates and structured convolutional decoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy.

[71]

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2227–2237, New Orleans, Louisiana.
