arxiv: 1907.08722 · v1 · pith:ZGZEP63Qnew · submitted 2019-07-19 · 💻 cs.CL

What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

Pith reviewed 2026-05-24 18:56 UTC · model grok-4.3

classification 💻 cs.CL
keywords: extreme summarization, abstractive summarization, convolutional neural networks, topic-aware models, news summarization, BBC dataset, one-sentence summary, long-range dependencies

The pith

Topic-aware CNNs outperform extractive oracles and prior abstractive models on one-sentence BBC news summaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces extreme summarization as the task of producing a single sentence that answers what a news article is about. It argues that this task inherently requires abstractive modeling because extractive methods cannot synthesize sufficiently concise content. The authors harvest a large BBC news dataset to support the task and introduce a convolutional neural network architecture conditioned on the article's topics. Experiments demonstrate that the model captures long-range dependencies, identifies pertinent content, and exceeds both an oracle extractive baseline and existing abstractive systems in automatic and human evaluations.

Core claim

The central claim is that an abstractive summarization model built entirely from convolutional neural networks and conditioned on article topics can capture long-range dependencies in documents and recognize pertinent content, thereby outperforming an oracle extractive system and state-of-the-art abstractive approaches on the extreme summarization task when tested on the collected BBC dataset.

What carries the argument

A topic-aware convolutional neural network that conditions the entire summarization process on the article's topics in order to model document content.
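As a rough illustration of what topic conditioning in a convolutional encoder can look like, the sketch below appends a document-level topic distribution to every word embedding and passes the result through one gated convolution. This page does not state how the paper integrates topics into the network, so the concatenation at the input layer, the gated linear unit, and the names `topic_condition` and `glu_conv1d` are all assumptions made for illustration, not the authors' architecture.

```python
import math
import random


def topic_condition(word_embeddings, doc_topics):
    """Append the document topic vector to every word embedding (assumed scheme)."""
    return [emb + doc_topics for emb in word_embeddings]


def glu_conv1d(inputs, weights, width):
    """One 1-D convolution followed by a gated linear unit.

    inputs:  list of T vectors, each of size d_in
    weights: list of 2 * d_out rows, each of size width * d_in
    Returns T vectors of size d_out; the sequence is zero-padded at both ends.
    """
    if width % 2 == 0:
        raise ValueError("this sketch only supports odd kernel widths")
    d_in = len(inputs[0])
    pad = width // 2
    zero = [0.0] * d_in
    padded = [zero] * pad + inputs + [zero] * pad
    d_out = len(weights) // 2
    outputs = []
    for t in range(len(inputs)):
        # Flatten the window of `width` neighbouring vectors into one long vector.
        window = [x for vec in padded[t:t + width] for x in vec]
        pre = [sum(w * x for w, x in zip(row, window)) for row in weights]
        values, gates = pre[:d_out], pre[d_out:]
        # GLU: value * sigmoid(gate)
        outputs.append([v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)])
    return outputs


# Hypothetical toy input: 5 words, 4-dim embeddings, 3 topics.
random.seed(0)
words = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
doc_topics = [0.7, 0.2, 0.1]
conditioned = topic_condition(words, doc_topics)
kernel = [[random.uniform(-0.1, 0.1) for _ in range(3 * 7)] for _ in range(2 * 2)]
hidden = glu_conv1d(conditioned, kernel, width=3)
```

Stacking several such layers widens the receptive field, which is one common argument for how convolutional models can reach distant context; whether that matches the paper's design is not something this page confirms.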

If this is right

  • Extreme summarization requires abstractive rather than extractive methods.
  • Conditioning on topics improves recognition of relevant document content.
  • Convolutional networks alone can handle long-range dependencies for this task.
  • The BBC dataset serves as a benchmark for one-sentence abstractive summarization.
  • Human judgments align with automatic metrics in confirming model superiority.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The topic-conditioning mechanism could transfer to other short-form generation tasks such as title creation.
  • Pure CNN architectures may prove competitive with recurrent models on additional document-level NLP problems.
  • The dataset harvesting approach could be replicated for non-English news domains to test cross-lingual generalization.
  • Combining the topic signal with other conditioning variables might further improve content selection.

Load-bearing premise

Extreme summarization by its nature cannot be handled adequately by extractive strategies and therefore requires an abstractive approach.

What would settle it

An extractive system that matches or exceeds the proposed model's human evaluation scores on one-sentence summaries from the BBC dataset.

Figures

Figures reproduced from arXiv: 1907.08722 by Mirella Lapata, Shashi Narayan, Shay B. Cohen.

Figure 1. Example from our extreme summarization dataset sh…
Figure 2. Topic-conditioned convolutional model for extre…
Figure 3. Example output summaries on the XSum test set with …
Figure 4. Example output summaries on the Newsroom Abstractive test set with [ROUGE-1, ROUGE-2 and ROUGE-L] scores, gold standard reference, and corresponding questions. Words highlighted in blue are either the right answer or constitute appropriate context for inferring it; words in red lead to the wrong answer.
Figure 5. Summaries ranked from least to most informative fr…
Figure 6. Percentage of n-grams in test summaries seen in training summaries.
Figure 7. Type-Token Ratio for summary n-grams in the entire dataset.
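The two corpus statistics named in the captions of Figures 6 and 7 can be computed along the following lines. This is a minimal sketch under stated assumptions: whitespace tokenization, percentages counted over n-gram tokens rather than distinct types, and the function names `seen_percentage` and `type_token_ratio` are all choices made here, and the paper may define the quantities differently.

```python
def ngram_list(tokens, n):
    """All contiguous n-grams of a token list, in order, with repeats."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_percentage(test_summaries, train_summaries, n):
    """Percentage of test-summary n-gram tokens that also occur in training summaries."""
    train = {g for s in train_summaries for g in ngram_list(s.split(), n)}
    test = [g for s in test_summaries for g in ngram_list(s.split(), n)]
    if not test:
        return 0.0
    return 100.0 * sum(g in train for g in test) / len(test)


def type_token_ratio(summaries, n):
    """Distinct n-grams divided by total n-grams across all summaries."""
    grams = [g for s in summaries for g in ngram_list(s.split(), n)]
    return len(set(grams)) / len(grams) if grams else 0.0
```

A low seen percentage for longer n-grams, or a high type-token ratio, would suggest that summaries are not built from a small stock of recurring phrases.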
Original abstract

We introduce 'extreme summarization', a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving research on this task further: (a) we collect a real-world, large scale dataset by harvesting online articles from the British Broadcasting Corporation (BBC); and (b) propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces extreme summarization as a new single-document task that requires generating a one-sentence abstractive summary answering 'What is the article about?'. It collects a large BBC news dataset, presents a topic-aware CNN architecture, and claims that this model captures long-range dependencies, outperforms an oracle extractive baseline, and exceeds prior abstractive systems on both automatic metrics and human evaluation.

Significance. If the outperformance claims are robust, the work would be significant for defining a challenging new summarization benchmark and for showing that topic conditioning enables CNNs to handle document-level content selection for ultra-short outputs. The new dataset and the fully convolutional design are concrete contributions that could be reused.

major comments (2)
  1. [Abstract, §1] Abstract and §1 (motivation): the assertion that extreme summarization 'by nature' is not amenable to extractive strategies rests on an oracle extractive baseline that selects the sentence maximizing n-gram overlap with the reference summary. This baseline does not test whether an extractive system optimized for the 'what the article is about' criterion could suffice, leaving the necessity of the topic-aware CNN under-supported.
  2. [§4] §4 (experiments): the human evaluation protocol, including the exact instructions given to annotators, inter-annotator agreement statistics, and significance tests for the reported outperformance over the oracle and SOTA baselines, must be reported in full; without them the claim that the model 'recognizes pertinent content' cannot be assessed.
minor comments (2)
  1. [§3] §3 (model): the precise integration of the topic distribution into the CNN layers (e.g., whether it is concatenated at every filter or only at the first layer) should be stated explicitly with an equation.
  2. [Table 2] Table 2 or equivalent: report the number of parameters and training time for the proposed model alongside the baselines to allow direct comparison of computational cost.
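The oracle extractive baseline discussed in major comment 1 can be sketched as follows: score every document sentence by n-gram overlap with the reference summary and return the best one. The F1 form of the overlap score, the lowercased whitespace tokenization, and the names `overlap_f1` and `oracle_sentence` are assumptions for illustration; the paper's oracle may use a different overlap measure.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_f1(candidate, reference, n=1):
    """Harmonic mean of n-gram precision and recall between two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    matches = sum((cand & ref).values())  # clipped n-gram matches
    if matches == 0:
        return 0.0
    precision = matches / sum(cand.values())
    recall = matches / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def oracle_sentence(document_sentences, reference_summary, n=1):
    """Pick the document sentence with the highest overlap with the reference."""
    ref_tokens = reference_summary.lower().split()
    return max(
        document_sentences,
        key=lambda s: overlap_f1(s.lower().split(), ref_tokens, n),
    )


# Hypothetical toy document and reference, not taken from the dataset.
doc = [
    "the council met on tuesday",
    "a new bridge will open in leeds next year",
    "residents were pleased",
]
best = oracle_sentence(doc, "a bridge is set to open in leeds")
```

Because the oracle sees the reference, it bounds what single-sentence extraction can score on this overlap measure; it says nothing by itself about how humans would rate the extracted sentence, which is the gap the referee's comment points at.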

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments and indicate the revisions we will make.

  1. Referee: [Abstract, §1] Abstract and §1 (motivation): the assertion that extreme summarization 'by nature' is not amenable to extractive strategies rests on an oracle extractive baseline that selects the sentence maximizing n-gram overlap with the reference summary. This baseline does not test whether an extractive system optimized for the 'what the article is about' criterion could suffice, leaving the necessity of the topic-aware CNN under-supported.

    Authors: The reference summary itself embodies the 'what the article is about' criterion. Consequently, the oracle that selects the sentence maximizing n-gram overlap with this reference constitutes the theoretical upper bound achievable by any extractive system. No extractive model, regardless of its optimization criterion, can surpass this oracle. We will revise the abstract and §1 to explicitly state that the oracle demonstrates the inherent limitations of sentence extraction for this task, thereby supporting the need for an abstractive approach. revision: yes

  2. Referee: [§4] §4 (experiments): the human evaluation protocol, including the exact instructions given to annotators, inter-annotator agreement statistics, and significance tests for the reported outperformance over the oracle and SOTA baselines, must be reported in full; without them the claim that the model 'recognizes pertinent content' cannot be assessed.

    Authors: We agree that complete reporting of the human evaluation protocol is necessary. In the revised manuscript we will add a dedicated subsection detailing the exact instructions given to annotators, the inter-annotator agreement statistics, and the statistical significance tests (including p-values) for all reported comparisons against the oracle extractive baseline and prior abstractive systems. revision: yes
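One of the agreement statistics the referee asks for could be reported as Cohen's kappa between pairs of annotators. The sketch below is a plain implementation of the standard two-annotator formula; the label names in the example are hypothetical, and nothing on this page says which agreement measure the authors actually use.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical three-way informativeness judgments from two annotators.
annotator_1 = ["informative", "informative", "partial", "uninformative"]
annotator_2 = ["informative", "partial", "partial", "uninformative"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

With more than two annotators per item, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the more usual choice.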

Circularity Check

0 steps flagged

No circularity: empirical model proposal and evaluation on held-out data


The paper defines a new task, collects a fresh BBC dataset, introduces a topic-aware CNN architecture, and reports automatic and human evaluations against an oracle extractive baseline and prior abstractive systems. The motivation that extreme summarization 'by nature' requires abstractive modeling is presented as an argument rather than a derived claim from equations or prior self-citations. No predictions reduce to fitted parameters by construction, no uniqueness theorems are imported from the authors' own work, and the central outperformance result rests on independent test-set measurements rather than self-referential definitions or renamings. This is standard empirical NLP work with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract; the work centers on a new task and an empirical model rather than on heavy unstated axioms. Standard neural network assumptions apply.

free parameters (2)
  • CNN filter sizes and layer counts
    Standard tunable hyperparameters in the convolutional architecture for text modeling.
  • Topic distribution parameters
    Model conditions on article topics, implying fitted topic model outputs from data.
axioms (1)
  • domain assumption: Convolutional neural networks can capture long-range dependencies in documents when appropriately configured.
    Directly invoked in the claim that the architecture captures long-range dependencies.


