Enriching and Controlling Global Semantics for Text Summarization

Anh Tuan Luu; Thong Nguyen; Tho Quan; Truc Lu

arxiv: 2109.10616 · v2 · submitted 2021-09-22 · 💻 cs.CL

Enriching and Controlling Global Semantics for Text Summarization

Thong Nguyen , Anh Tuan Luu , Truc Lu , Tho Quan This is my paper

Pith reviewed 2026-05-24 13:54 UTC · model grok-4.3

classification 💻 cs.CL

keywords text summarizationneural topic modelnormalizing flowglobal semanticsabstractive summarizationtransformercontrol mechanism

0 comments

The pith

A neural topic model using normalizing flow and a control mechanism supplies global semantics to improve text summarization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper attempts to establish that capturing global semantics with a neural topic model empowered by normalizing flow and controlling their integration into the summarizer can address the short-range dependency problem in Transformer models. A reader would care if this leads to summaries that include the main ideas of documents rather than missing them. The method is tested by outperforming previous models on five datasets. It shows a way to combine global topic information with local context in generation tasks.

Core claim

The paper claims that introducing a neural topic model with normalizing flow to capture global semantics of the document, integrated into the summarization model along with a mechanism to control the amount of global semantics supplied to the text generation module, results in outperforming state-of-the-art summarization models on the CNN/DailyMail, XSum, Reddit TIFU, arXiv, and PubMed datasets.

What carries the argument

Neural topic model empowered with normalizing flow to capture global semantics, integrated with a control mechanism to regulate supply to the generation module.

If this is right

Addresses the short-range dependency problem causing summaries to miss key document points.
Produces more informative summaries by enriching with global semantics.
The control mechanism avoids overwhelming contextualized local representations.
Demonstrated effectiveness on five diverse summarization datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Could be applied to other sequence generation tasks to maintain global coherence.
The normalizing flow might allow better modeling of topic distributions in documents.
Future work could explore different ways to control the global-local balance.

Load-bearing premise

The neural topic model with normalizing flow will reliably extract the key global points of a document and the introduced control mechanism will successfully prevent those global signals from overwhelming the contextualized local representations.

What would settle it

If the proposed method does not show performance gains over baselines on the five datasets when the global semantics or control is ablated.

Figures

Figures reproduced from arXiv: 2109.10616 by Anh Tuan Luu, Thong Nguyen, Tho Quan, Truc Lu.

**Figure 1.** Figure 1: Self-attention weights of “commentary” in the PEGASUS model” to the previous generated word. As such, if the main content of the document is out of reach from the generated word, the final summary can miss that key information. For example, in [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Our overall architecture Our model inherits the Transformer-based architecture. Particularly, it consists of an encoder and decoder. The encoder learns the context of the source text, and the decoder then predicts the target summary, by learning the context of the generated tokens and attending over encoder hidden states. In our case, we make both the encoder and decoder conditioned on the latent topic y… view at source ↗

read the original abstract

Recently, Transformer-based models have been proven effective in the abstractive summarization task by creating fluent and informative summaries. Nevertheless, these models still suffer from the short-range dependency problem, causing them to produce summaries that miss the key points of document. In this paper, we attempt to address this issue by introducing a neural topic model empowered with normalizing flow to capture the global semantics of the document, which are then integrated into the summarization model. In addition, to avoid the overwhelming effect of global semantics on contextualized representation, we introduce a mechanism to control the amount of global semantics supplied to the text generation module. Our method outperforms state-of-the-art summarization models on five common text summarization datasets, namely CNN/DailyMail, XSum, Reddit TIFU, arXiv, and PubMed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds a normalizing-flow topic model plus a control gate to inject global semantics into a Transformer summarizer and reports ROUGE gains on five standard datasets.

read the letter

The main point is that they use a neural topic model with normalizing flows to pull out global document semantics, then feed those into the summarizer through an explicit control mechanism that prevents the global signal from swamping local context. This targets the short-range dependency problem in abstractive models and produces better scores on CNN/DailyMail, XSum, Reddit TIFU, arXiv, and PubMed under standard ROUGE evaluation. The architecture and training details line up with the reported numbers, and the control gate is a straightforward addition that makes the integration practical. The results appear to follow from the stated design rather than hidden tuning. One minor soft spot is that the gains rest on the usual benchmark setup, so it would help to see how sensitive they are to seed or to stronger recent baselines, but nothing in the setup looks circular or broken. This is aimed at people working on applied summarization who need a way to bring in topic-level information without breaking fluency. It shows clear thinking about the integration problem and honest use of existing evaluation. I would send it to peer review.

Referee Report

0 major / 4 minor

Summary. The paper proposes enriching Transformer-based abstractive summarizers with global document semantics extracted via a neural topic model augmented by normalizing flows. These global signals are injected into the decoder through a learned control gate that modulates their influence to avoid overwhelming local contextual representations. The method is evaluated on five standard benchmarks (CNN/DailyMail, XSum, Reddit TIFU, arXiv, PubMed) and claims consistent ROUGE improvements over prior state-of-the-art models.

Significance. If the reported gains are reproducible and not artifacts of hyperparameter search, the work offers a concrete mechanism for mitigating short-range dependency limitations in Transformer summarizers by injecting controllable topic-level information. The normalizing-flow topic model and the explicit control gate are the two technical contributions that could be adopted more broadly; the paper supplies the architecture, training details, and result tables needed to assess internal consistency.

minor comments (4)

§3.2: the description of how the normalizing-flow topic posterior is sampled during training versus inference is terse; a short equation or pseudocode block would clarify the reparameterization path.
Table 2–6: the reported ROUGE scores lack confidence intervals or the number of random seeds; adding these would strengthen the claim of consistent outperformance.
§4.3: the ablation that isolates the control gate reports only aggregate ROUGE; a per-dataset breakdown would show whether the gate’s benefit is uniform or dataset-dependent.
Figure 3: the visualization of global-semantic injection would benefit from an explicit legend indicating which attention heads or layers receive the controlled topic vector.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The report provides a clear summary of our contributions but does not list any specific major comments requiring response.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper proposes an empirical neural architecture for abstractive summarization that augments a Transformer with a neural topic model (using normalizing flow) plus a control gate for global semantics. The central claims consist of ROUGE-based performance gains on five standard benchmark datasets. No derivation chain, equation, or first-principles result is presented that reduces a prediction to its own fitted inputs by construction. No self-citation load-bearing uniqueness theorems, ansatzes, or renamings of known results appear in the provided text. The reported improvements rest on external evaluation protocols and are therefore not tautological with the model definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on two newly introduced components whose effectiveness is asserted but not independently evidenced in the abstract: the normalizing-flow neural topic model and the control mechanism. No free parameters or standard mathematical axioms are identifiable from the abstract alone.

axioms (1)

domain assumption Transformer-based summarizers suffer from the short-range dependency problem that causes them to miss key document points.
Explicitly stated in the first sentence of the abstract as the motivating problem.

invented entities (2)

neural topic model empowered with normalizing flow no independent evidence
purpose: to capture the global semantics of the document
Introduced in the abstract as the core new component for global semantics.
mechanism to control the amount of global semantics supplied to the text generation module no independent evidence
purpose: to avoid the overwhelming effect of global semantics on contextualized representation
Introduced in the abstract to balance the integration of the topic model output.

pith-pipeline@v0.9.0 · 5668 in / 1354 out tokens · 26597 ms · 2026-05-24T13:54:15.537044+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction
cs.CL 2023-05 unverdicted novelty 5.0

Introduces listwise attention, listwise loss, and GBDT predictor to improve multimodal review helpfulness ranking over prior FCNN and pairwise approaches.
Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions
cs.CL 2022-11 unverdicted novelty 5.0

Multimodal contrastive learning with adaptive weighting and interaction module achieves state-of-the-art results on two MRHP benchmark datasets.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · cited by 2 Pith papers · 17 internal anchors

[1]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Melissa Ailem, Bowen Zhang, and Fei Sha. 2019. Topic augmented generator for abstractive summarization. arXiv preprint arXiv:1908.07026

work page arXiv 2019
[4]

David M Blei, Alp Kucukelbir, and Jon D McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American statistical Association, 112(518):859--877

work page 2017
[5]

Rishi Bommasani and Claire Cardie. 2020. Intrinsic evaluation of summarization datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8075--8096

work page 2020
[6]

Jiaao Chen and Diyi Yang. 2020. Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. arXiv preprint arXiv:2010.01672

work page arXiv 2020
[7]

Kyunghyun Cho, Bart Van Merri \"e nboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259

work page internal anchor Pith review Pith/arXiv arXiv 2014
[8]

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A discourse-aware attention model for abstractive summarization of long documents. arXiv preprint arXiv:1804.05685

work page internal anchor Pith review Pith/arXiv arXiv 2018
[9]

Ran Ding, Ramesh Nallapati, and Bing Xiang. 2018. Coherence-aware neural topic modeling. arXiv preprint arXiv:1809.02687

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197

work page arXiv 2019
[11]

Xiyan Fu, Jun Wang, Jinghan Zhang, Jinmao Wei, and Zhenglu Yang. 2020. Document summarization with vhtm: Variational hierarchical topic-aware mechanism. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7740--7747

work page 2020
[12]

Sebastian Gehrmann, Yuntian Deng, and Alexander M Rush. 2018. Bottom-up abstractive summarization. arXiv preprint arXiv:1808.10792

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Karl Moritz Hermann, Tom \'a s Ko c isk \`y , Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. arXiv preprint arXiv:1506.03340

work page internal anchor Pith review Pith/arXiv arXiv 2015
[14]

Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. 2013. Stochastic variational inference. Journal of Machine Learning Research, 14(5)

work page 2013
[15]

Ruipeng Jia, Yanan Cao, Hengzhu Tang, Fang Fang, Cong Cao, and Shi Wang. 2020. Neural extractive summarization with hierarchical attentive heterogeneous graph network. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3622--3631

work page 2020
[16]

Byeongchang Kim, Hyunwoo Kim, and Gunhee Kim. 2018. Abstractive summarization of reddit posts with multi-level memory networks. arXiv preprint arXiv:1811.00783

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114

work page internal anchor Pith review Pith/arXiv arXiv 2013
[18]

Durk P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improved variational inference with inverse autoregressive flow. Advances in neural information processing systems, 29:4743--4751

work page 2016
[19]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461

work page internal anchor Pith review Pith/arXiv arXiv 2019
[20]

Xianming Li, Zongxi Li, Yingbin Zhao, Haoran Xie, and Qing Li. 2020. Incorporating effective global information via adaptive gate attention for text classification. arXiv preprint arXiv:2002.09673

work page arXiv 2020
[21]

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74--81

work page 2004
[22]

Yang Liu. 2019. Fine-tune bert for extractive summarization. arXiv preprint arXiv:1903.10318

work page arXiv 2019
[23]

Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345

work page arXiv 2019
[24]

Zhengyuan Liu, Angela Ng, Sheldon Lee, Ai Ti Aw, and Nancy F Chen. 2019. Topic-aware pointer-generator networks for summarizing spoken conversations. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 814--821. IEEE

work page 2019
[25]

Yishu Miao, Edward Grefenstette, and Phil Blunsom. 2017. Discovering discrete latent topics with neural variational inference. In International Conference on Machine Learning, pages 2410--2419. PMLR

work page 2017
[26]

Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018. Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. arXiv preprint arXiv:1808.08745

work page internal anchor Pith review Pith/arXiv arXiv 2018
[27]

Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Bla z Bratani c , and Ryan McDonald. 2020. Stepwise extractive summarization and planning with structured transformers. arXiv preprint arXiv:2010.02744

work page arXiv 2020
[28]

Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683

work page internal anchor Pith review Pith/arXiv arXiv 2019
[30]

Danilo Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530--1538. PMLR

work page 2015
[31]

Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368

work page internal anchor Pith review Pith/arXiv arXiv 2017
[32]

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450

work page internal anchor Pith review Pith/arXiv arXiv 2019
[33]

Sandeep Subramanian, Raymond Li, Jonathan Pilault, and Christopher Pal. 2019. On extractive and abstractive neural document summarization with transformer language models. arXiv preprint arXiv:1909.03186

work page arXiv 2019
[34]

Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R Bowman, Dipanjan Das, et al. 2019. What do you learn from context? probing for sentence structure in contextualized word representations. arXiv preprint arXiv:1905.06316

work page internal anchor Pith review Pith/arXiv arXiv 2019
[35]

Luu Anh Tuan, Darsh Shah, and Regina Barzilay. 2020. Capturing greater context for question generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9065--9072

work page 2020
[36]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017
[37]

Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, and Qiang Du. 2018. A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization. arXiv preprint arXiv:1805.03616

work page arXiv 2018
[38]

Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, and Mingyuan Zhou. 2020. Friendly topic assistant for transformer based abstractive summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 485--497

work page 2020
[39]

Wen Xiao and Giuseppe Carenini. 2020. Systematically exploring redundancy reduction in summarizing long documents. arXiv preprint arXiv:2012.00052

work page arXiv 2020
[40]

Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R Lyu, and Irwin King. 2018. Topic memory networks for short text classification. arXiv preprint arXiv:1809.03664

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning, pages 11328--11339. PMLR

work page 2020
[42]

Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. Hibert: Document level pre-training of hierarchical bidirectional transformers for document summarization. arXiv preprint arXiv:1905.06566

work page internal anchor Pith review Pith/arXiv arXiv 2019
[43]

Chujie Zheng, Kunpeng Zhang, Harry Jiannan Wang, and Ling Fan. 2020. Topic-aware abstractive text summarization. arXiv preprint arXiv:2010.10323

work page arXiv 2020
[44]

Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, and Xuanjing Huang. 2020. Extractive summarization as text matching. arXiv preprint arXiv:2004.08795

work page arXiv 2020
[45]

Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, and Ming Zhou. 2020. Pre-training for abstractive document summarization by reinstating source text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3646--3660

work page 2020

[1] [1]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Melissa Ailem, Bowen Zhang, and Fei Sha. 2019. Topic augmented generator for abstractive summarization. arXiv preprint arXiv:1908.07026

work page arXiv 2019

[4] [4]

David M Blei, Alp Kucukelbir, and Jon D McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American statistical Association, 112(518):859--877

work page 2017

[5] [5]

Rishi Bommasani and Claire Cardie. 2020. Intrinsic evaluation of summarization datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8075--8096

work page 2020

[6] [6]

Jiaao Chen and Diyi Yang. 2020. Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. arXiv preprint arXiv:2010.01672

work page arXiv 2020

[7] [7]

Kyunghyun Cho, Bart Van Merri \"e nboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259

work page internal anchor Pith review Pith/arXiv arXiv 2014

[8] [8]

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A discourse-aware attention model for abstractive summarization of long documents. arXiv preprint arXiv:1804.05685

work page internal anchor Pith review Pith/arXiv arXiv 2018

[9] [9]

Ran Ding, Ramesh Nallapati, and Bing Xiang. 2018. Coherence-aware neural topic modeling. arXiv preprint arXiv:1809.02687

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197

work page arXiv 2019

[11] [11]

Xiyan Fu, Jun Wang, Jinghan Zhang, Jinmao Wei, and Zhenglu Yang. 2020. Document summarization with vhtm: Variational hierarchical topic-aware mechanism. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7740--7747

work page 2020

[12] [12]

Sebastian Gehrmann, Yuntian Deng, and Alexander M Rush. 2018. Bottom-up abstractive summarization. arXiv preprint arXiv:1808.10792

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Karl Moritz Hermann, Tom \'a s Ko c isk \`y , Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. arXiv preprint arXiv:1506.03340

work page internal anchor Pith review Pith/arXiv arXiv 2015

[14] [14]

Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. 2013. Stochastic variational inference. Journal of Machine Learning Research, 14(5)

work page 2013

[15] [15]

Ruipeng Jia, Yanan Cao, Hengzhu Tang, Fang Fang, Cong Cao, and Shi Wang. 2020. Neural extractive summarization with hierarchical attentive heterogeneous graph network. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3622--3631

work page 2020

[16] [16]

Byeongchang Kim, Hyunwoo Kim, and Gunhee Kim. 2018. Abstractive summarization of reddit posts with multi-level memory networks. arXiv preprint arXiv:1811.00783

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114

work page internal anchor Pith review Pith/arXiv arXiv 2013

[18] [18]

Durk P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improved variational inference with inverse autoregressive flow. Advances in neural information processing systems, 29:4743--4751

work page 2016

[19] [19]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461

work page internal anchor Pith review Pith/arXiv arXiv 2019

[20] [20]

Xianming Li, Zongxi Li, Yingbin Zhao, Haoran Xie, and Qing Li. 2020. Incorporating effective global information via adaptive gate attention for text classification. arXiv preprint arXiv:2002.09673

work page arXiv 2020

[21] [21]

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74--81

work page 2004

[22] [22]

Yang Liu. 2019. Fine-tune bert for extractive summarization. arXiv preprint arXiv:1903.10318

work page arXiv 2019

[23] [23]

Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. arXiv preprint arXiv:1908.08345

work page arXiv 2019

[24] [24]

Zhengyuan Liu, Angela Ng, Sheldon Lee, Ai Ti Aw, and Nancy F Chen. 2019. Topic-aware pointer-generator networks for summarizing spoken conversations. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 814--821. IEEE

work page 2019

[25] [25]

Yishu Miao, Edward Grefenstette, and Phil Blunsom. 2017. Discovering discrete latent topics with neural variational inference. In International Conference on Machine Learning, pages 2410--2419. PMLR

work page 2017

[26] [26]

Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018. Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. arXiv preprint arXiv:1808.08745

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [27]

Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Bla z Bratani c , and Ryan McDonald. 2020. Stepwise extractive summarization and planning with structured transformers. arXiv preprint arXiv:2010.02744

work page arXiv 2020

[28] [28]

Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304

work page internal anchor Pith review Pith/arXiv arXiv 2017

[29] [29]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683

work page internal anchor Pith review Pith/arXiv arXiv 2019

[30] [30]

Danilo Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530--1538. PMLR

work page 2015

[31] [31]

Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368

work page internal anchor Pith review Pith/arXiv arXiv 2017

[32] [32]

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450

work page internal anchor Pith review Pith/arXiv arXiv 2019

[33] [33]

Sandeep Subramanian, Raymond Li, Jonathan Pilault, and Christopher Pal. 2019. On extractive and abstractive neural document summarization with transformer language models. arXiv preprint arXiv:1909.03186

work page arXiv 2019

[34] [34]

Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R Bowman, Dipanjan Das, et al. 2019. What do you learn from context? probing for sentence structure in contextualized word representations. arXiv preprint arXiv:1905.06316

work page internal anchor Pith review Pith/arXiv arXiv 2019

[35] [35]

Luu Anh Tuan, Darsh Shah, and Regina Barzilay. 2020. Capturing greater context for question generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9065--9072

work page 2020

[36] [36]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017

[37] [37]

Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, and Qiang Du. 2018. A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization. arXiv preprint arXiv:1805.03616

work page arXiv 2018

[38] [38]

Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, and Mingyuan Zhou. 2020. Friendly topic assistant for transformer based abstractive summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 485--497

work page 2020

[39] [39]

Wen Xiao and Giuseppe Carenini. 2020. Systematically exploring redundancy reduction in summarizing long documents. arXiv preprint arXiv:2012.00052

work page arXiv 2020

[40] [40]

Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R Lyu, and Irwin King. 2018. Topic memory networks for short text classification. arXiv preprint arXiv:1809.03664

work page internal anchor Pith review Pith/arXiv arXiv 2018

[41] [41]

Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning, pages 11328--11339. PMLR

work page 2020

[42] [42]

Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. Hibert: Document level pre-training of hierarchical bidirectional transformers for document summarization. arXiv preprint arXiv:1905.06566

work page internal anchor Pith review Pith/arXiv arXiv 2019

[43] [43]

Chujie Zheng, Kunpeng Zhang, Harry Jiannan Wang, and Ling Fan. 2020. Topic-aware abstractive text summarization. arXiv preprint arXiv:2010.10323

work page arXiv 2020

[44] [44]

Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, and Xuanjing Huang. 2020. Extractive summarization as text matching. arXiv preprint arXiv:2004.08795

work page arXiv 2020

[45] [45]

Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, and Ming Zhou. 2020. Pre-training for abstractive document summarization by reinstating source text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3646--3660

work page 2020