Conflict as an Inverse of Attention in Sequence Relationship

Rajarshee Mitra

arxiv: 1906.08593 · v2 · pith:O7E4AA4Tnew · submitted 2019-06-20 · 💻 cs.CL

Conflict as an Inverse of Attention in Sequence Relationship

Rajarshee Mitra This is my paper

Pith reviewed 2026-05-25 19:40 UTC · model grok-4.3

classification 💻 cs.CL

keywords attentionconflict modelsequence relationshipnatural language understandingrepulsioncontrastiveneural networksNLU

0 comments

The pith

A conflict model that measures repulsion between sequences can complement attention to improve NLU performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Attention models sequence relationships by comparing similarities but performs poorly when sequences have no match or are contrastive. The paper proposes a conflict model that instead emphasizes repulsion between sequences, structured similarly to attention. Combining the two approaches leads to better results on natural language understanding tasks. This matters because language often involves both alignment and opposition, which pure attention may miss.

Core claim

Attention by its composition works best only when there is a match somewhere between two sequences. It does not adapt well to cases of no similarity or contrastive relationships. A conflict model similar to attention but focused on repulsion, when used with attention, boosts overall performance.

What carries the argument

The conflict model, which computes how well two sequences repel each other rather than their similarity.

If this is right

Attention alone is insufficient for contrastive sequence pairs.
Repulsion measurement can be added to existing attention mechanisms.
Combined attention-conflict models show performance gains on NLU tasks.
The approach handles cases where sequences have no similarity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid models could be applied to tasks requiring detection of contradictions.
Future work might explore repulsion in other sequence domains like vision or speech.
Parameter sharing or joint training between attention and conflict could be optimized.

Load-bearing premise

That the repulsion-focused conflict model can be combined with attention to produce measurable gains on real natural language understanding tasks.

What would settle it

Running the proposed conflict model with attention on standard benchmarks and finding no improvement over attention alone.

Figures

Figures reproduced from arXiv: 1906.08593 by Rajarshee Mitra.

**Figure 2.** Figure 2: Conflict Heatmaps 4 Limits of using only Attention Attention operates by using dot product or sometimes addition followed by linear projection to a scalar which models the similarity between two vectors. Subsequently, softmax is applied which gives high probabilities to most matching word representations. This assumes that there is some highly matched word pairs already existing and high scores will be as… view at source ↗

**Figure 3.** Figure 3: Generic Model containing interaction layer. We use atten [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 5.** Figure 5: Training loss curve for Task 2 model in [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗

**Figure 4.** Figure 4: Training loss curve for Task 1 9.2 Task 2: Ranking questions in Bing’s People Also Ask People Also Ask is a feature in Bing search result page where related questions are recommended to the user. User may click on a question to view the answer. Clicking is a positive feedback that shows user’s interest in the question. We use this click logs to build a question classifier using the same 1 https://data.quor… view at source ↗

read the original abstract

Attention is a very efficient way to model the relationship between two sequences by comparing how similar two intermediate representations are. Initially demonstrated in NMT, it is a standard in all NLU tasks today when efficient interaction between sequences is considered. However, we show that attention, by virtue of its composition, works best only when it is given that there is a match somewhere between two sequences. It does not very well adapt to cases when there is no similarity between two sequences or if the relationship is contrastive. We propose an Conflict model which is very similar to how attention works but which emphasizes mostly on how well two sequences repel each other and finally empirically show how this method in conjunction with attention can boost the overall performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper names a Conflict model as an attention inverse for contrastive sequences but the performance-boost claim has no experiments, datasets, or equations behind it.

read the letter

The main takeaway is that this work flags a possible limitation in how attention handles cases with no similarity or outright contrast between sequences, then proposes a separate Conflict mechanism that emphasizes repulsion. It claims the two together improve NLU results. That framing is the only real novelty on offer; the rest reads like a restatement of existing dissimilarity or contrastive ideas without a clear structural difference shown.

Referee Report

1 major / 0 minor

Summary. The manuscript argues that standard attention mechanisms in sequence modeling are effective primarily when sequences share similarities but perform poorly on contrastive or dissimilar relationships. It introduces a Conflict model that mirrors attention's structure but emphasizes repulsion between sequences instead, and asserts that combining Conflict with attention yields performance gains on NLU tasks.

Significance. If the empirical claim holds, a repulsion-focused complement to attention could address a genuine limitation in modeling contrastive relationships, with potential value for tasks like natural language inference. The conceptual framing of Conflict as an inverse to attention is a clear contribution, but the complete absence of any equations, model definitions, datasets, baselines, or results prevents assessment of whether the idea delivers measurable or reproducible gains.

major comments (1)

[Abstract] Abstract: the central claim that 'this method in conjunction with attention can boost the overall performance' is asserted without any description of tasks, datasets, baselines, controls, statistical tests, or quantitative results, rendering the sole support for the model's practical utility unevaluable.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the feedback. The point raised about the abstract is valid, and we address it directly below with a commitment to revise.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'this method in conjunction with attention can boost the overall performance' is asserted without any description of tasks, datasets, baselines, controls, statistical tests, or quantitative results, rendering the sole support for the model's practical utility unevaluable.

Authors: We agree with this assessment. The abstract currently states the performance claim at a high level without the supporting details needed for evaluation. In the revised manuscript we will expand the abstract to briefly reference the NLU tasks, datasets, baselines, and observed quantitative gains, while ensuring the main text supplies the model equations, definitions, experimental controls, and results. revision: yes

Circularity Check

0 steps flagged

No circularity: proposal lacks any derivation chain or fitted parameters

full rationale

The abstract and description present a conceptual analogy (Conflict as repulsion-focused counterpart to attention) followed by an asserted empirical claim. No equations, parameter-fitting steps, self-citations, or uniqueness theorems are supplied that would allow any reduction of outputs to inputs by construction. The performance-boost assertion is simply unsupported rather than internally circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The abstract introduces the Conflict model as a new construct without citing prior formalization; no free parameters, axioms, or invented entities beyond the model name itself are stated.

invented entities (1)

Conflict model no independent evidence
purpose: To score repulsion between sequences as an inverse to attention
Introduced in the abstract as the core proposal; no independent evidence or falsifiable prediction supplied.

pith-pipeline@v0.9.0 · 5634 in / 1160 out tokens · 18356 ms · 2026-05-25T19:40:36.325296+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

[1]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR , abs/1409.0473, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[2]

Deep CTR prediction in display advertising

Junxuan Chen, Baigui Sun, Hao Li, Hongtao Lu, and Xian - Sheng Hua. Deep CTR prediction in display advertising. In Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, Amsterdam, The Netherlands, October 15-19, 2016 , pages 811--820, 2016

work page 2016
[3]

Attention-fused deep matching network for natural language inference

Chaoqun Duan, Lei Cui, Xinchi Chen, Furu Wei, Conghui Zhu, and Tiejun Zhao. Attention-fused deep matching network for natural language inference. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. , pages 4033--4040, 2018

work page 2018
[4]

Stochastic Answer Networks for Natural Language Inference

Xiaodong Liu, Kevin Duh, and Jianfeng Gao. Stochastic answer networks for natural language inference. CoRR , abs/1804.07888, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[5]

Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015 , pages 1412--1421, 2015

work page 2015
[6]

Discriminative word alignment via alignment matrix modeling

Jan Niehues and Stephan Vogel. Discriminative word alignment via alignment matrix modeling. In Proceedings of the Third Workshop on Statistical Machine Translation, WMT@ACL 2008, Columbus, Ohio, USA, June 19, 2008 , pages 18--25, 2008

work page 2008
[7]

Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, ...

work page 2018
[8]

Arafat Sultan, Steven Bethard, and Tamara Sumner

Md. Arafat Sultan, Steven Bethard, and Tamara Sumner. Dls @ cu: Sentence similarity from word alignment and semantic vector composition. In Daniel M. Cer, David Jurgens, Preslav Nakov, and Torsten Zesch, editors, Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2015, Denver, Colorado, USA, June 4-5, 2015 , pages 148-...

work page 2015
[9]

a ckstr \

Gaurav Singh Tomar, Thyago Duque, Oscar T \" a ckstr \" o m, Jakob Uszkoreit, and Dipanjan Das. Neural paraphrase identification of questions with noisy pretraining. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, Copenhagen, Denmark, September 7, 2017 , pages 142--147, 2017

work page 2017
[10]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA , pages 6000--6010, 2017

work page 2017
[11]

Bilateral multi-perspective matching for natural language sentences

Zhiguo Wang, Wael Hamza, and Radu Florian. Bilateral multi-perspective matching for natural language sentences. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017 , pages 4144--4150, 2017

work page 2017
[12]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page

[1] [1]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR , abs/1409.0473, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[2] [2]

Deep CTR prediction in display advertising

Junxuan Chen, Baigui Sun, Hao Li, Hongtao Lu, and Xian - Sheng Hua. Deep CTR prediction in display advertising. In Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, Amsterdam, The Netherlands, October 15-19, 2016 , pages 811--820, 2016

work page 2016

[3] [3]

Attention-fused deep matching network for natural language inference

Chaoqun Duan, Lei Cui, Xinchi Chen, Furu Wei, Conghui Zhu, and Tiejun Zhao. Attention-fused deep matching network for natural language inference. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. , pages 4033--4040, 2018

work page 2018

[4] [4]

Stochastic Answer Networks for Natural Language Inference

Xiaodong Liu, Kevin Duh, and Jianfeng Gao. Stochastic answer networks for natural language inference. CoRR , abs/1804.07888, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[5] [5]

Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015 , pages 1412--1421, 2015

work page 2015

[6] [6]

Discriminative word alignment via alignment matrix modeling

Jan Niehues and Stephan Vogel. Discriminative word alignment via alignment matrix modeling. In Proceedings of the Third Workshop on Statistical Machine Translation, WMT@ACL 2008, Columbus, Ohio, USA, June 19, 2008 , pages 18--25, 2008

work page 2008

[7] [7]

Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, ...

work page 2018

[8] [8]

Arafat Sultan, Steven Bethard, and Tamara Sumner

Md. Arafat Sultan, Steven Bethard, and Tamara Sumner. Dls @ cu: Sentence similarity from word alignment and semantic vector composition. In Daniel M. Cer, David Jurgens, Preslav Nakov, and Torsten Zesch, editors, Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2015, Denver, Colorado, USA, June 4-5, 2015 , pages 148-...

work page 2015

[9] [9]

a ckstr \

Gaurav Singh Tomar, Thyago Duque, Oscar T \" a ckstr \" o m, Jakob Uszkoreit, and Dipanjan Das. Neural paraphrase identification of questions with noisy pretraining. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, Copenhagen, Denmark, September 7, 2017 , pages 142--147, 2017

work page 2017

[10] [10]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA , pages 6000--6010, 2017

work page 2017

[11] [11]

Bilateral multi-perspective matching for natural language sentences

Zhiguo Wang, Wael Hamza, and Radu Florian. Bilateral multi-perspective matching for natural language sentences. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017 , pages 4144--4150, 2017

work page 2017

[12] [12]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page