Decomposable Neural Paraphrase Generation
Pith reviewed 2026-05-25 17:55 UTC · model grok-4.3
The pith
A neural paraphrase model with separate encoders and decoders for lexical, phrasal and sentential levels produces more controllable and domain-adaptable outputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DNPG consists of multiple encoders and decoders with differing structures, each responsible for paraphrasing at one granularity level (lexical, phrasal or sentential). The model learns to generate paraphrases in a disentangled fashion so that modifications at one level do not bleed into others. Empirical results indicate that this decomposition improves interpretability and controllability of the generation process. An unsupervised domain-adaptation method built on the same decomposition yields competitive in-domain accuracy and markedly stronger performance when the target domain differs from the training distribution.
What carries the argument
Multiple encoders and decoders with different structures, each assigned to a distinct granularity level of paraphrasing.
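The decomposition can be pictured with a deliberately small toy. The sketch below is not the DNPG architecture: it replaces the Transformer encoders and decoders with hand-written rewrite functions, and every name in it (`ToyDecomposedParaphraser`, the synonym and phrase tables, the sentence template) is a hypothetical stand-in. It only illustrates the idea that each granularity level has its own component and that components can be switched on or off independently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Granularity levels named in the paper; the fixed application order is an assumption.
LEVELS = ("lexical", "phrasal", "sentential")

Rewrite = Callable[[List[str]], List[str]]


def lexical_rewrite(tokens: List[str]) -> List[str]:
    """Word-for-word substitution from a hypothetical synonym table."""
    synonyms = {"learn": "study"}
    return [synonyms.get(t, t) for t in tokens]


def phrasal_rewrite(tokens: List[str]) -> List[str]:
    """Replace multi-word spans using a hypothetical phrase table."""
    phrase_table = {("a", "lot", "of"): ["many"]}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for src, tgt in phrase_table.items():
            if tuple(tokens[i:i + len(src)]) == src:
                out.extend(tgt)
                i += len(src)
                break
        else:  # no phrase matched at position i
            out.append(tokens[i])
            i += 1
    return out


def sentential_rewrite(tokens: List[str]) -> List[str]:
    """Apply one hypothetical sentence template: 'how can i X' -> 'what is the best way to X'."""
    if tokens[:3] == ["how", "can", "i"]:
        return ["what", "is", "the", "best", "way", "to"] + tokens[3:]
    return list(tokens)


@dataclass
class ToyDecomposedParaphraser:
    """One rewrite component per granularity level, applied only when enabled."""
    components: Dict[str, Rewrite]

    def paraphrase(self, tokens: List[str], enabled: Set[str]) -> List[str]:
        out = list(tokens)
        for level in LEVELS:
            if level in enabled:
                out = self.components[level](out)
        return out


toy = ToyDecomposedParaphraser(
    components={
        "lexical": lexical_rewrite,
        "phrasal": phrasal_rewrite,
        "sentential": sentential_rewrite,
    }
)

if __name__ == "__main__":
    source = "how can i learn a lot of languages".split()
    for enabled in ({"lexical"}, {"phrasal"}, set(LEVELS)):
        print(sorted(enabled), "->", " ".join(toy.paraphrase(source, enabled)))
```

Enabling a single level changes only the material that level owns, which is the kind of intervention the controllability claim relies on. Whether learned components behave this cleanly is exactly what the paper has to show empirically.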
If this is right
- Paraphrase outputs can be controlled by intervening on individual granularity-specific components.
- The generation process becomes more interpretable because each component's contribution can be examined separately.
- In-domain paraphrase quality remains competitive with existing neural models.
- Unsupervised adaptation to a new domain produces significantly better results than non-decomposed baselines.
Where Pith is reading between the lines
- The modular structure may allow targeted retraining or editing of only the components that need adjustment when new paraphrase styles are required.
- Similar decomposition could be tested on other text-generation tasks where control at multiple scales is useful.
- If the disentanglement holds, the model could support fine-grained debugging by isolating failures to specific granularity levels.
Load-bearing premise
The separate encoders and decoders can be trained to keep their representations disentangled by granularity level without substantial leakage or interference between components.
What would settle it
An ablation or inspection experiment that finds the output of one granularity component strongly influences or correlates with outputs from the other components, or that shows no gain in domain-adaptation performance, would falsify the central claim.
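One way such an inspection could be set up is sketched below, under loud assumptions: each component is summarized by a single number per example (for instance, how many tokens it changed), and leakage is proxied by the Pearson correlation between two components' summaries. The summary statistic, the function names, and the 0.5 threshold are all hypothetical choices, not taken from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def leakage_suspected(a: Sequence[float], b: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Flag a component pair whose per-example summaries are strongly correlated.

    The threshold is an illustrative assumption, not a calibrated value.
    """
    return abs(pearson(a, b)) >= threshold
```

A low correlation on one summary statistic would not prove disentanglement. A high one would, however, be the kind of evidence described above as counting against the central claim.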
Original abstract
Paraphrasing exists at different granularity levels, such as lexical level, phrasal level and sentential level. This paper presents Decomposable Neural Paraphrase Generator (DNPG), a Transformer-based model that can learn and generate paraphrases of a sentence at different levels of granularity in a disentangled way. Specifically, the model is composed of multiple encoders and decoders with different structures, each of which corresponds to a specific granularity. The empirical study shows that the decomposition mechanism of DNPG makes paraphrase generation more interpretable and controllable. Based on DNPG, we further develop an unsupervised domain adaptation method for paraphrase generation. Experimental results show that the proposed model achieves competitive in-domain performance compared to the state-of-the-art neural models, and significantly better performance when adapting to a new domain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Decomposable Neural Paraphrase Generator (DNPG), a Transformer-based model with multiple encoders and decoders of differing structures, each tied to a granularity level (lexical, phrasal, sentential). It claims the decomposition yields disentangled representations that improve interpretability and controllability of paraphrase generation, while also enabling an unsupervised domain-adaptation method that achieves competitive in-domain results and significantly better out-of-domain performance than prior neural models.
Significance. If the disentanglement is realized and empirically verified, the architecture would supply a concrete mechanism for level-specific control in paraphrase generation and a practical route to domain adaptation without parallel data. The explicit multi-component design is a clear architectural contribution worth testing against standard sequence-to-sequence baselines.
Major comments (2)
- [Model architecture] Model architecture section: the description states that the encoders/decoders have different structures and are trained jointly on standard paraphrase objectives, yet supplies no auxiliary loss, orthogonality penalty, routing gate, or information-bottleneck term that would enforce separation of the granularity-specific representations. Without such a mechanism, joint training on identical sentence pairs leaves open collapse or leakage across components, directly threatening both the interpretability/controllability claim and the domain-adaptation gains attributed to decomposition.
- [Experiments] Experimental section: the abstract asserts 'significantly better performance when adapting to a new domain' and 'competitive in-domain performance,' but the manuscript must report concrete metrics (BLEU, iBLEU, or human scores), the exact baselines, the adaptation datasets, and at least one ablation that isolates the contribution of the decomposed components versus a monolithic Transformer. Absent these, the central empirical claims cannot be evaluated.
Minor comments (2)
- [Model architecture] Clarify the precise input/output interfaces between the granularity-specific modules and the shared components; the current description leaves the information flow ambiguous.
- [Experiments] Add a figure or table that visualizes example outputs at each granularity level to substantiate the interpretability claim.
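The first major comment names an orthogonality penalty as one mechanism that could enforce separation. A minimal sketch of one common form follows: the squared Frobenius norm of the cross-product between two components' representation matrices. The manuscript does not use this term; the function name and the plain-list matrix layout (rows are examples, columns are features) are assumptions made for illustration.

```python
from typing import List, Sequence


def orthogonality_penalty(a: Sequence[Sequence[float]],
                          b: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm of A^T B for two n-row representation matrices.

    The value is zero when every feature column of `a` is orthogonal to every
    feature column of `b` across the batch, and grows as the columns align.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("both matrices need the same non-zero number of rows")
    dim_a, dim_b = len(a[0]), len(b[0])
    total = 0.0
    for i in range(dim_a):
        for j in range(dim_b):
            # Entry (i, j) of A^T B: dot product of column i of A and column j of B.
            dot = sum(a[k][i] * b[k][j] for k in range(n))
            total += dot * dot
    return total
```

In a real training setup a term like this would be added to the paraphrase loss with a weighting coefficient, computed on framework tensors rather than Python lists.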
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. Below we respond point by point to the major comments.
- Referee: [Model architecture] Model architecture section: the description states that the encoders/decoders have different structures and are trained jointly on standard paraphrase objectives, yet supplies no auxiliary loss, orthogonality penalty, routing gate, or information-bottleneck term that would enforce separation of the granularity-specific representations. Without such a mechanism, joint training on identical sentence pairs leaves open collapse or leakage across components, directly threatening both the interpretability/controllability claim and the domain-adaptation gains attributed to decomposition.
  Authors: The architecture assigns encoders and decoders with explicitly different structures to each granularity level (lexical, phrasal, sentential) and trains them jointly. We argue that these structural differences, rather than an auxiliary loss, are the primary mechanism for encouraging separation. That said, we acknowledge the possibility of leakage under pure joint training and will add a dedicated paragraph in the model section discussing this design choice together with an ablation that measures cross-component information flow.
  Revision: partial
- Referee: [Experiments] Experimental section: the abstract asserts 'significantly better performance when adapting to a new domain' and 'competitive in-domain performance,' but the manuscript must report concrete metrics (BLEU, iBLEU, or human scores), the exact baselines, the adaptation datasets, and at least one ablation that isolates the contribution of the decomposed components versus a monolithic Transformer. Absent these, the central empirical claims cannot be evaluated.
  Authors: Section 4 already lists the concrete BLEU/iBLEU scores, the full set of baselines (including standard Transformer seq2seq), the in-domain and out-of-domain datasets, and the unsupervised adaptation protocol. To directly isolate the decomposition, we will insert a new ablation table that replaces the multi-component DNPG with a single monolithic Transformer of comparable capacity while keeping all other training details fixed.
  Revision: yes
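For readers unfamiliar with iBLEU, one of the metrics named in this exchange, the sketch below shows how it is usually combined from two BLEU scores: similarity to the reference is rewarded and similarity to the source sentence is penalized, so that copying the input does not score well. The function name is hypothetical, the BLEU scores are assumed to be precomputed elsewhere, and the default `alpha` of 0.9 is an assumption rather than the value used in the paper.

```python
def ibleu(bleu_vs_reference: float, bleu_vs_source: float,
          alpha: float = 0.9) -> float:
    """iBLEU = alpha * BLEU(candidate, reference) - (1 - alpha) * BLEU(candidate, source).

    Both BLEU inputs must be on the same scale (for example 0 to 1).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * bleu_vs_reference - (1.0 - alpha) * bleu_vs_source
```

With `alpha = 1.0` the score reduces to plain BLEU against the reference. Smaller values of `alpha` penalize outputs that stay close to the input more heavily.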
Circularity Check
No circularity; empirical architecture evaluated on standard objectives
The paper introduces DNPG as a Transformer variant with multiple encoders/decoders tied to lexical/phrasal/sentential granularity, claiming disentangled generation and improved domain adaptation. All load-bearing claims rest on experimental results rather than any derivation, equation, or self-referential definition. No fitted parameters are renamed as predictions, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled in. The architecture is presented as a proposal whose benefits are measured directly against baselines; the absence of explicit disentanglement losses is a modeling choice, not a circular reduction of the claimed outcome to its inputs.