Sequence Generation: From Both Sides to the Middle

Chengqing Zong; Heng Yu; Jiajun Zhang; Long Zhou

arxiv: 1906.09601 · v1 · pith:SPFBC7HBnew · submitted 2019-06-23 · 💻 cs.CL · cs.AI · cs.LG

Long Zhou, Jiajun Zhang, Chengqing Zong, Heng Yu

Pith reviewed 2026-05-25 17:43 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords sequence generation · bidirectional decoding · neural machine translation · text summarization · synchronous generation · autoregressive transformer · attention network

The pith

A synchronous bidirectional model generates sequences from both ends toward the middle at once, speeding up decoding while raising output quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard encoder-decoder models generate token by token from left to right, which becomes slow for long outputs and misses future context that could prevent under-translation. The paper replaces this with a synchronous bidirectional sequence generation approach that builds the output simultaneously from both sides inward. Left-to-right and right-to-left streams interact through a shared bidirectional attention network so each direction guides the other. Experiments on English-German, Chinese-English, and English-Romanian translation plus text summarization show the model decodes faster and produces higher-quality results than the autoregressive Transformer baseline.

Core claim

The SBSG model predicts its outputs from both sides to the middle simultaneously, with the left-to-right and right-to-left generation processes enabled to help and interact with each other by an interactive bidirectional attention network, yielding faster decoding and better generation quality than autoregressive baselines on neural machine translation and summarization.
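
The both-sides-to-the-middle schedule can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the `step` callback, the `<eos>` marker, the stopping rule, and the `make_oracle` helper (which reads off a known target instead of running a trained model) are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical marker: a direction emits it when it has nothing left to add

# Hypothetical step function: given both partial halves, return the next token
# for the L2R stream and the next token for the R2L stream.
StepFn = Callable[[List[str], List[str]], Tuple[str, str]]


def decode_both_sides(step: StepFn, max_steps: int = 50) -> Tuple[List[str], int]:
    """Grow a left half and a right half in lockstep, then join them."""
    left: List[str] = []   # reading order, produced left to right
    right: List[str] = []  # reverse reading order, produced right to left
    left_done = right_done = False
    steps = 0
    while steps < max_steps and not (left_done and right_done):
        l_tok, r_tok = step(left, right)  # one step yields up to two tokens
        steps += 1
        if not left_done:
            if l_tok == EOS:
                left_done = True
            else:
                left.append(l_tok)
        if not right_done:
            if r_tok == EOS:
                right_done = True
            else:
                right.append(r_tok)
    # The right half was generated back to front, so reverse it before joining.
    return left + right[::-1], steps


def make_oracle(target: List[str]) -> StepFn:
    """Toy stand-in for a trained model: it reads tokens off a known target."""
    mid = (len(target) + 1) // 2  # assumption: L2R takes the extra token for odd lengths

    def step(left: List[str], right: List[str]) -> Tuple[str, str]:
        l_tok = target[len(left)] if len(left) < mid else EOS
        r_pos = len(target) - 1 - len(right)
        r_tok = target[r_pos] if r_pos >= mid else EOS
        return l_tok, r_tok

    return step


target = "the cat sat on the mat today".split()
output, steps = decode_both_sides(make_oracle(target))
print(output == target, steps)
```

With the seven-token toy target, the loop finishes in five steps (four token-producing steps plus one to emit the final end marker), where a one-token-per-step loop would need eight. How a real model decides that the two halves have met is exactly what the oracle hides.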

What carries the argument

The interactive bidirectional attention network that lets the left-to-right and right-to-left decoders mutually guide each other during simultaneous generation.
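
One way to picture the mutual guidance is a decoder query that attends both to its own direction's states and to the states produced so far by the opposite direction. The sketch below is an assumption-laden illustration in plain Python: the fusion rule (own context plus a weighted copy of the other direction's context) and the weight `lam` are hypothetical choices made here, and the abstract does not confirm that the paper fuses the two contexts this way.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def interactive_attend(query: Vector, own_states: List[Vector],
                       other_states: List[Vector], lam: float = 0.5) -> Vector:
    """Fuse context from the query's own direction with context from the other one.

    Assumption: a simple weighted sum; `lam` is a hypothetical mixing weight.
    Both state lists must be non-empty (e.g. each starts with a start-symbol state).
    """
    own = attend(query, own_states, own_states)
    other = attend(query, other_states, other_states)
    return [o + lam * x for o, x in zip(own, other)]


# Toy states: two L2R positions generated so far, one R2L position.
l2r_states = [[1.0, 0.0], [0.0, 1.0]]
r2l_states = [[0.5, 0.5]]
query = [1.0, 0.0]
print(interactive_attend(query, l2r_states, r2l_states))
```

Setting `lam` to zero recovers ordinary self-attention over the decoder's own history, which makes the cross-direction term easy to ablate in a sketch like this.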

If this is right

Decoding time decreases because each decoding step produces two tokens, one from each end, rather than one.
Output quality rises on machine translation and summarization because each direction supplies future context to the other.
Under-translation is reduced by the availability of right-side information during left-side generation.
The same architecture applies without modification to both translation and summarization tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same bidirectional interaction might reduce error propagation in tasks that require global coherence such as dialogue generation.
Combining the approach with other non-autoregressive techniques could produce further latency reductions on long outputs.
The method assumes the middle of the sequence can be reached reliably from both ends; failures there would require additional mechanisms to align the two halves.

Load-bearing premise

The interactive bidirectional attention network lets the two directional processes improve each other without creating inconsistencies or coherence problems in the final sequence.

What would settle it

A direct comparison on the En-De translation test set that measures wall-clock decoding time and BLEU score and finds no statistically significant speedup or quality gain versus the autoregressive Transformer would falsify the central claim.
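
A common way to check such a comparison for significance is paired bootstrap resampling over the test set. The sketch below is illustrative only: the score lists are made-up placeholders, not results from the paper, and it averages per-sentence scores as a stand-in, whereas BLEU is a corpus-level metric that a real evaluation would recompute on each resample. The function name `paired_bootstrap` and its parameters are hypothetical.

```python
import random
from typing import List


def paired_bootstrap(baseline: List[float], candidate: List[float],
                     samples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which the candidate's total beats the baseline's."""
    if len(baseline) != len(candidate):
        raise ValueError("scores must be paired per sentence")
    rng = random.Random(seed)
    n = len(baseline)
    wins = 0
    for _ in range(samples):
        # Resample sentence indices with replacement; both systems share the sample.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(candidate[i] for i in idx) > sum(baseline[i] for i in idx):
            wins += 1
    return wins / samples


# Placeholder per-sentence quality scores (hypothetical, for illustration only).
transformer_scores = [0.31, 0.42, 0.28, 0.50, 0.37, 0.44]
sbsg_scores = [0.33, 0.41, 0.30, 0.52, 0.39, 0.43]
win_rate = paired_bootstrap(transformer_scores, sbsg_scores)
print(f"candidate wins on {win_rate:.1%} of resamples")
```

The same function could be applied to per-sentence decoding times by swapping the argument order, since lower is better there. A win rate close to one half would mean the comparison fails to separate the two systems.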

Figures

Figures reproduced from arXiv: 1906.09601 by Chengqing Zong, Heng Yu, Jiajun Zhang, Long Zhou.

**Figure 3.** (left) Bidirectional Scaled Dot-Product Attention operates …

**Figure 4.** The smoothing model introduced to connect L2R and R2L …

**Figure 5.** The bidirectional beam search process of our proposed …

**Figure 6.** Length Analysis - Performance of the generated trans…

Original abstract

The encoder-decoder framework has achieved promising process for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right, hence (1) this autoregressive decoding procedure is time-consuming when the output sentence becomes longer, and (2) it lacks the guidance of future context which is crucial to avoid under translation. To alleviate these issues, we propose a synchronous bidirectional sequence generation (SBSG) model which predicts its outputs from both sides to the middle simultaneously. In the SBSG model, we enable the left-to-right (L2R) and right-to-left (R2L) generation to help and interact with each other by leveraging interactive bidirectional attention network. Experiments on neural machine translation (En-De, Ch-En, and En-Ro) and text summarization tasks show that the proposed model significantly speeds up decoding while improving the generation quality compared to the autoregressive Transformer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The SBSG model offers a workable synchronous end-to-middle generation scheme with interactive attention; the paper reports speed and quality gains over autoregressive baselines on standard tasks.

This paper describes a synchronous bidirectional sequence generation model that starts decoding from both ends and meets in the middle, with an interactive attention network so that the left-to-right and right-to-left streams can inform each other during generation. Experiments on En-De, Ch-En, and En-Ro translation and on summarization indicate faster inference and better output quality than the standard autoregressive Transformer baseline.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a synchronous bidirectional sequence generation (SBSG) model that generates output sequences simultaneously from both ends toward the middle. It introduces an interactive bidirectional attention network so that left-to-right and right-to-left decoders can mutually condition each other during synchronous decoding. Experiments on En-De, Ch-En, En-Ro translation and summarization are reported to show both faster decoding and higher quality than the autoregressive Transformer baseline.

Significance. If the empirical gains hold under rigorous controls, the work would be significant because it directly targets the sequential bottleneck and missing future context of standard autoregressive decoding with a concrete cross-direction attention mechanism. The synchronous bidirectional construction supplies a falsifiable alternative to purely left-to-right generation.

minor comments (3)

[Abstract] 'promising process' is a typographical error and should read 'promising progress'.
[Abstract] The abstract asserts 'significantly speeds up decoding while improving the generation quality' without any numerical deltas, speed-up factors, or BLEU/ROUGE scores; the full paper should ensure all headline claims are immediately supported by the first results table or figure.
[Introduction / Model description] The description of how synchronous decoding with cross-direction attention prevents coherence failures or length mismatches between the two directions is only sketched at a high level; a short algorithmic outline or pseudocode would improve clarity.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our SBSG model and the recommendation for minor revision. The report correctly identifies the core contribution of synchronous bidirectional decoding with interactive attention to mitigate the sequential bottleneck and lack of future context in autoregressive generation. No specific major comments were provided in the report, so we have no individual points to address at this time.

Circularity Check

0 steps flagged

No significant circularity identified

The paper presents an architectural proposal for synchronous bidirectional sequence generation using interactive bidirectional attention, with experimental results on translation and summarization tasks. No equations, fitted parameters, or derivation chains appear in the provided text that reduce a claimed prediction or result to an input by construction. The central claims rest on the described mechanism and empirical comparisons to autoregressive baselines, which are externally falsifiable via the reported metrics rather than self-referential. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities used by the model.

pith-pipeline@v0.9.0 · 5697 in / 962 out tokens · 41562 ms · 2026-05-25T17:43:00.631981+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 4 internal anchors

[1] [Bahdanau et al., 2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.

[2] [Devlin, 2017] Jacob Devlin. Sharp models on dull hardware: Fast and accurate neural machine translation decoding on the CPU. In EMNLP, pages 2820–2825, 2017.

[3] [Finch and Sumita, 2009] Andrew Finch and Eiichiro Sumita. Bidirectional phrase-based statistical machine translation. In EMNLP, pages 1124–1132, 2009.

[4] [Gehring et al., 2017] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann Dauphin. Convolutional sequence to sequence learning. In ICML, 2017.

[5] [Gu et al., 2017] Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, and Richard Socher. Non-autoregressive neural machine translation. arXiv preprint arXiv:1711.02281, 2017.

[6] [Kaiser et al., 2018] Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, and Noam Shazeer. Fast decoding in sequence models using discrete latent variables. arXiv preprint arXiv:1803.03382, 2018.

[7] [Kim and Rush, 2016] Yoon Kim and Alexander M. Rush. Sequence-level knowledge distillation. In EMNLP, 2016.

[8] [Koehn et al., 2007] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In ACL, 2007.

[9] [Lee et al., 2018] Jason Lee, Elman Mansimov, and Kyunghyun Cho. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In EMNLP, pages 1173–1182, 2018.

[10] [Li et al., 2018] Haoran Li, Junnan Zhu, Jiajun Zhang, and Chengqing Zong. Ensure the correctness of the summary: Incorporate entailment knowledge into abstractive sentence summarization. In COLING, 2018.

[11] [Liu et al., 2018] Yuchen Liu, Long Zhou, Yining Wang, Yang Zhao, Jiajun Zhang, and Chengqing Zong. A comparable study on model averaging, ensembling and reranking in NMT. In NLPCC, pages 299–308, 2018.

[12] [Mi et al., 2016] Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. Vocabulary manipulation for neural machine translation. In ACL, pages 124–129, 2016.

[13] [Oord et al., 2017] Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C Cobo, Florian Stimberg, et al. Parallel wavenet: Fast high-fidelity speech synthesis. arXiv preprint arXiv:1711.10433, 2017.

[14] [Papineni et al., 2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL, 2002.

[15] [Rush et al., 2015] Alexander M. Rush, Sumit Chopra, and Jason Weston. A neural attention model for abstractive sentence summarization. In EMNLP, 2015.

[16] [Serdyuk et al., 2018] Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, and Yoshua Bengio. Twin networks: Matching the future for sequence generation. In ICLR, 2018.

[17] [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112, 2014.

[18] [Vaswani et al., 2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, pages 5998–6008, 2017.

[19] [Vinyals et al., 2015] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.

[20] [Wang et al., 2018] Chunqi Wang, Ji Zhang, and Haiqing Chen. Semi-autoregressive neural machine translation. arXiv preprint arXiv:1808.08583, 2018.

[21] [Watanabe and Sumita, 2002] Taro Watanabe and Eiichiro Sumita. Bidirectional decoding for statistical machine translation. In COLING, 2002.

[22] [Xu et al., 2015] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. Computer Science, pages 2048–2057, 2015.

[23] [Zhou et al., 2017] Qingyu Zhou, Nan Yang, Furu Wei, and Ming Zhou. Selective encoding for abstractive sentence summarization. In ACL, pages 1095–1104, 2017.

[24] [Zhou et al., 2019] Long Zhou, Jiajun Zhang, and Chengqing Zong. Synchronous bidirectional neural machine translation. In TACL, pages 91–105, 2019.
