WriterForcing: Generating more interesting story endings

Alan W Black; Mukul Bhutani; Prakhar Gupta; Vinayshekhar Bannihatti Kumar

arxiv: 1907.08259 · v1 · pith:56DCKPLInew · submitted 2019-07-18 · 💻 cs.LG · cs.CL· stat.ML

WriterForcing: Generating more interesting story endings

Prakhar Gupta , Vinayshekhar Bannihatti Kumar , Mukul Bhutani , Alan W Black This is my paper

Pith reviewed 2026-05-24 19:36 UTC · model grok-4.3

classification 💻 cs.LG cs.CLstat.ML

keywords story ending generationsequence to sequencetext diversitykeyphrase attentionneural text generationwriter forcing

0 comments

The pith

Seq2Seq models trained to focus on story keyphrases and non-generic words produce more diverse endings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard sequence-to-sequence models for story continuation often ignore context and default to generic endings. The paper tests two targeted training changes: forcing the model to attend to salient keyphrases from the given story prefix and explicitly encouraging less common vocabulary. When applied together these adjustments yield endings that human raters judge both more varied and more interesting than those from unmodified baselines.

Core claim

Training models to focus attention on important keyphrases of the story and promoting generation of non-generic words leads to more diverse and interesting story endings.

What carries the argument

WriterForcing, a training procedure that combines keyphrase-guided attention with explicit promotion of non-generic words inside a sequence-to-sequence generator.

If this is right

The two modifications together increase measured diversity of generated endings relative to unmodified seq2seq training.
The same combination increases human ratings of ending interestingness.
Keyphrase attention alone helps the model stay grounded in the supplied story context.
Penalizing generic words alone reduces the tendency toward dull, high-probability continuations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pair of training signals could be tested on other conditional generation tasks such as dialogue response or news continuation.
If the gains hold, post-hoc diversity techniques such as nucleus sampling might become less necessary.
The approach leaves open whether similar gains appear when the input is longer or drawn from different genres.

Load-bearing premise

Directing attention to keyphrases and discouraging generic words will improve human judgments of interest and diversity without harming coherence or overall quality.

What would settle it

A controlled human evaluation in which the modified models receive equal or lower scores than standard seq2seq models on interest, diversity, or coherence.

read the original abstract

We study the problem of generating interesting endings for stories. Neural generative models have shown promising results for various text generation problems. Sequence to Sequence (Seq2Seq) models are typically trained to generate a single output sequence for a given input sequence. However, in the context of a story, multiple endings are possible. Seq2Seq models tend to ignore the context and generate generic and dull responses. Very few works have studied generating diverse and interesting story endings for a given story context. In this paper, we propose models which generate more diverse and interesting outputs by 1) training models to focus attention on important keyphrases of the story, and 2) promoting generation of non-generic words. We show that the combination of the two leads to more diverse and interesting endings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper offers a straightforward fix for generic story endings via keyphrase attention and non-generic word promotion, with no major internal contradictions visible.

read the letter

The main thing to know is that the authors target the known tendency of seq2seq models to produce dull, context-ignoring endings for stories. They propose training the model to focus attention on important keyphrases from the input and to favor less common words, claiming the combination yields more diverse and interesting outputs. This is presented as filling a gap since few prior works have focused specifically on diverse story endings. The idea is a direct extension of existing attention and decoding adjustments rather than a new architecture. The paper does a clear job stating the problem and why standard training falls short on variety. The two changes are simple enough that they could be implemented on top of common models without major overhead. Soft spots are mostly about missing details rather than flaws in the logic. The abstract states the claim but gives no datasets, metrics, baselines, or human eval results, so it is impossible to judge whether the outputs improve on interest or diversity without losing coherence. The central assumption that these modifications will deliver better judged quality is reasonable on its face but untested in the text provided. There is no circular reasoning or invented entities. This work is aimed at researchers in neural text generation who care about output diversity in creative tasks like story or dialogue continuation. Someone already running seq2seq experiments might try the tweaks to see if they help on their data. I would recommend sending the paper for peer review. The motivation is solid and the proposal is modest, so referees can assess the experiments on their merits even if revisions are needed.

Referee Report

0 major / 1 minor

Summary. The manuscript proposes WriterForcing, a training modification for Seq2Seq models that combines (1) forcing attention on important keyphrases from the story context and (2) promoting generation of non-generic words, with the central claim that this combination yields more diverse and interesting story endings than standard training.

Significance. If the empirical results hold, the method offers a lightweight way to mitigate the generic-response problem that is well-documented in neural story generation; the two components are simple to implement and could be adopted as a baseline or ablation target in future work on creative text generation.

minor comments (1)

[Abstract] Abstract: the central claim is stated without any mention of datasets, metrics (e.g., diversity scores, human evaluation criteria), baselines, or quantitative results, which prevents assessment of whether the evidence supports the claim.

Simulated Author's Rebuttal

0 responses · 1 unresolved

We thank the referee for reviewing our manuscript. The provided summary accurately captures the core idea of WriterForcing as a combination of keyphrase attention and non-generic word promotion to improve story ending diversity. The recommendation is listed as uncertain, but the report contains no enumerated major comments following the 'MAJOR COMMENTS:' heading. We therefore have no specific points to rebut point-by-point and stand ready to address any additional feedback the referee may wish to supply.

standing simulated objections not resolved

No specific major comments were supplied in the referee report, preventing any point-by-point response.

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external evaluation

full rationale

The paper proposes two training modifications (keyphrase attention focus and non-generic word promotion) for story-ending generation and claims their combination yields more diverse/interesting outputs. This is presented as an empirical result evaluated on the target task, with no equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations visible in the provided text. The central claim does not reduce to its inputs by construction and is supported by human evaluation rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no specific free parameters, axioms, or invented entities are detailed in the provided text. The approach implicitly relies on standard assumptions about neural text generation.

axioms (1)

domain assumption Seq2Seq models can be trained with modified attention and loss functions to improve output diversity and interestingness.
The proposal depends on this assumption about what modified training achieves.

pith-pipeline@v0.9.0 · 5670 in / 1240 out tokens · 29224 ms · 2026-05-24T19:36:45.873303+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 11 internal anchors

[1]

Neural Machine Translation by Jointly Learning to Align and Translate

Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Ashutosh Baheti, Alan Ritter, Jiwei Li, and Bill Dolan

work page internal anchor Pith review Pith/arXiv arXiv
[2]

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Pro- cessing, pages 3970–3980, Brussels, Belgium

Generating more interesting responses in neural conversation models with distributional constraints. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Pro- cessing, pages 3970–3980, Brussels, Belgium. Asso- ciation for Computational Linguistics. J. S. Chen, Jiaao Chen, and Zhou Yu

work page 2018
[3]

Incorporating Structured Commonsense Knowledge in Story Completion

Incor- porating structured commonsense knowledge in story completion. CoRR, abs/1811.00625. Elizabeth Clark, Yangfeng Ji, and Noah A Smith

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Neural text generation in stories using entity rep- resentations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language T echnologies, Volume 1 (Long Papers) , pages 2250–2260. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina T outanova

work page 2018
[5]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Bert: Pre-training of deep bidirectional transformers for language un- derstanding. arXiv preprint arXiv:1810.04805. Carl Doersch

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Tutorial on variational autoencoders,

Tutorial on variational autoen- coders. arXiv preprint arXiv:1606.05908. Angela Fan, Mike Lewis, and Yann Dauphin

work page arXiv
[7]

Hierarchical Neural Story Generation

Hierarchical neural story generation. arXiv preprint arXiv:1805.04833. Jian Guan, Yansen Wang, and Minlie Huang

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Story Ending Generation with Incremental Encoding and Commonsense Knowledge

Story ending generation with incremental en- coding and commonsense knowledge. CoRR, abs/1808.10113. Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov , and Eric P . Xing. 2017a. T oward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learn- ing, volume 70 of Proceedings of Machine Learning Research...

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Hi- erarchically structured reinforcement learning for topically coherent visual story generation. arXiv preprint arXiv:1805.08191. Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, and Karthik Sankara- narayanan

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Story Generation from Sequence of Independent Short Descriptions

Story generation from sequence of independent short descriptions. arXiv preprint arXiv:1707.05501. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting ob- jective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computa- tiona...

work page internal anchor Pith review Pith/arXiv arXiv 2016
[12]

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2157–2169, Copenhagen, Denmark

Adversar- ial learning for neural dialogue generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2157–2169, Copenhagen, Denmark. Association for Computational Linguistics. Zhongyang Li, Xiao Ding, and Ting Liu

work page 2017
[13]

In Pro- ceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics , pages 46–51

Lsd- sem 2017 shared task: The story cloze test. In Pro- ceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics , pages 46–51. Ryo Nakamura, Katsuhito Sudoh, Koichiro Yoshino, and Satoshi Nakamura

work page 2017
[14]

Another Diversity-Promoting Objective Function for Neural Dialogue Generation

Another diversity- promoting objective function for neural dialogue generation. arXiv preprint arXiv:1811.08100. Nanyun Peng, Marjan Ghazvininejad, Jonathan May, and Kevin Knight

work page internal anchor Pith review Pith/arXiv arXiv
[15]

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2210–2219, Copenhagen, Denmark

Gen- erating high-quality and informative conversation responses with sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2210–2219, Copenhagen, Denmark. Association for Computational Linguistics. Ilya Sutskever, Oriol Vinyals, and Quoc V Le

work page 2017
[16]

Diverse beam search for improved description of com- plex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artiﬁcial Intelligence, (AAAI- 18), the 30th innovative Applications of Artiﬁcial Intelligence (IAAI-18), and the 8th AAAI Sympo- sium on Educational Advances in Artiﬁcial Intel- ligence (EAAI-18), New Orleans, Louisiana, USA, February ...

work page 2018
[17]

Topic Aware Neural Response Generation

T opic augmented neural response generation with a joint attention mechanism. arXiv preprint arXiv:1606.08340, 2(2). Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun

work page internal anchor Pith review Pith/arXiv arXiv
[18]

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949

Diversity-promoting gan: A cross- entropy based generative adversarial network for diversiﬁed text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949. Lili Yao, Nanyun Peng, Weischedel Ralph, Kevin Knight, Dongyan Zhao, and Rui Yan

work page 2018
[19]

Plan-And-Write: Towards Better Automatic Storytelling

Plan- and-write: T owards better automatic storytelling. arXiv preprint arXiv:1811.05701. Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston

work page internal anchor Pith review Pith/arXiv arXiv
[20]

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Per- sonalizing dialogue agents: I have a dog, do you have pets too? arXiv preprint arXiv:1801.07243. Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, and Bing Liu

work page internal anchor Pith review Pith/arXiv arXiv

[1] [1]

Neural Machine Translation by Jointly Learning to Align and Translate

Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Ashutosh Baheti, Alan Ritter, Jiwei Li, and Bill Dolan

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Pro- cessing, pages 3970–3980, Brussels, Belgium

Generating more interesting responses in neural conversation models with distributional constraints. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Pro- cessing, pages 3970–3980, Brussels, Belgium. Asso- ciation for Computational Linguistics. J. S. Chen, Jiaao Chen, and Zhou Yu

work page 2018

[3] [3]

Incorporating Structured Commonsense Knowledge in Story Completion

Incor- porating structured commonsense knowledge in story completion. CoRR, abs/1811.00625. Elizabeth Clark, Yangfeng Ji, and Noah A Smith

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Neural text generation in stories using entity rep- resentations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language T echnologies, Volume 1 (Long Papers) , pages 2250–2260. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina T outanova

work page 2018

[5] [5]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Bert: Pre-training of deep bidirectional transformers for language un- derstanding. arXiv preprint arXiv:1810.04805. Carl Doersch

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Tutorial on variational autoencoders,

Tutorial on variational autoen- coders. arXiv preprint arXiv:1606.05908. Angela Fan, Mike Lewis, and Yann Dauphin

work page arXiv

[7] [7]

Hierarchical Neural Story Generation

Hierarchical neural story generation. arXiv preprint arXiv:1805.04833. Jian Guan, Yansen Wang, and Minlie Huang

work page internal anchor Pith review Pith/arXiv arXiv

[8] [9]

Story Ending Generation with Incremental Encoding and Commonsense Knowledge

Story ending generation with incremental en- coding and commonsense knowledge. CoRR, abs/1808.10113. Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov , and Eric P . Xing. 2017a. T oward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learn- ing, volume 70 of Proceedings of Machine Learning Research...

work page internal anchor Pith review Pith/arXiv arXiv

[9] [10]

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Hi- erarchically structured reinforcement learning for topically coherent visual story generation. arXiv preprint arXiv:1805.08191. Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, and Karthik Sankara- narayanan

work page internal anchor Pith review Pith/arXiv arXiv

[10] [11]

Story Generation from Sequence of Independent Short Descriptions

Story generation from sequence of independent short descriptions. arXiv preprint arXiv:1707.05501. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting ob- jective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computa- tiona...

work page internal anchor Pith review Pith/arXiv arXiv 2016

[11] [12]

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2157–2169, Copenhagen, Denmark

Adversar- ial learning for neural dialogue generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2157–2169, Copenhagen, Denmark. Association for Computational Linguistics. Zhongyang Li, Xiao Ding, and Ting Liu

work page 2017

[12] [13]

In Pro- ceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics , pages 46–51

Lsd- sem 2017 shared task: The story cloze test. In Pro- ceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics , pages 46–51. Ryo Nakamura, Katsuhito Sudoh, Koichiro Yoshino, and Satoshi Nakamura

work page 2017

[13] [14]

Another Diversity-Promoting Objective Function for Neural Dialogue Generation

Another diversity- promoting objective function for neural dialogue generation. arXiv preprint arXiv:1811.08100. Nanyun Peng, Marjan Ghazvininejad, Jonathan May, and Kevin Knight

work page internal anchor Pith review Pith/arXiv arXiv

[14] [15]

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2210–2219, Copenhagen, Denmark

Gen- erating high-quality and informative conversation responses with sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2210–2219, Copenhagen, Denmark. Association for Computational Linguistics. Ilya Sutskever, Oriol Vinyals, and Quoc V Le

work page 2017

[15] [16]

Diverse beam search for improved description of com- plex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artiﬁcial Intelligence, (AAAI- 18), the 30th innovative Applications of Artiﬁcial Intelligence (IAAI-18), and the 8th AAAI Sympo- sium on Educational Advances in Artiﬁcial Intel- ligence (EAAI-18), New Orleans, Louisiana, USA, February ...

work page 2018

[16] [17]

Topic Aware Neural Response Generation

T opic augmented neural response generation with a joint attention mechanism. arXiv preprint arXiv:1606.08340, 2(2). Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun

work page internal anchor Pith review Pith/arXiv arXiv

[17] [18]

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949

Diversity-promoting gan: A cross- entropy based generative adversarial network for diversiﬁed text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949. Lili Yao, Nanyun Peng, Weischedel Ralph, Kevin Knight, Dongyan Zhao, and Rui Yan

work page 2018

[18] [19]

Plan-And-Write: Towards Better Automatic Storytelling

Plan- and-write: T owards better automatic storytelling. arXiv preprint arXiv:1811.05701. Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston

work page internal anchor Pith review Pith/arXiv arXiv

[19] [20]

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Per- sonalizing dialogue agents: I have a dog, do you have pets too? arXiv preprint arXiv:1801.07243. Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, and Bing Liu

work page internal anchor Pith review Pith/arXiv arXiv