Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network

Eyal Shnarch; Guy Moshkowich; Lena Dankin; Leshem Choshen; Martin Gleize; Noam Slonim; Ranit Aharonov

arxiv: 1907.08971 · v2 · pith:VBFQJWM4new · submitted 2019-07-21 · 💻 cs.LG · cs.CL · stat.ML

Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim

Pith reviewed 2026-05-24 18:38 UTC · model grok-4.3

classification 💻 cs.LG · cs.CL · stat.ML

keywords: convincingness · siamese network · evidence pairs · argument detection · persuasive evidence · neural comparison

The pith

A Siamese neural network outperforms baselines when selecting the more convincing evidence from pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new dataset of evidence pairs labeled for relative convincingness, created to be harder than prior collections. It proposes a Siamese neural network that processes two pieces of evidence through shared weights and learns to identify which one is stronger. The approach is shown to beat several baselines on both the new dataset and an earlier convincingness collection. The work matters because automated systems are increasingly expected to handle persuasive exchanges on complex topics. The authors also examine what forms of argumentative strength their method can detect.

Core claim

The central claim is that a Siamese neural network architecture can be trained to determine which of two evidence items is more convincing, and that this architecture achieves higher accuracy than several baselines when tested on both a prior convincingness dataset and the new IBM-EviConv collection of labeled pairs.

What carries the argument

The Siamese neural network architecture, which shares weights across two input branches to produce comparable representations of each evidence item for a convincingness decision.
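
The weight sharing can be sketched in a few lines. This is a toy illustration, not the paper's architecture: the encoder here is a hashed bag-of-words followed by a single linear layer, and the names `featurize` and `SiameseScorer`, as well as the two example sentences, are hypothetical. The structural point it shows is that both evidence items pass through the same parameters, and only the difference between their two scores drives the decision.

```python
import math
import random
import zlib


def featurize(text, dim=64):
    """Toy stand-in for an encoder input: a hashed bag-of-words vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is used instead of hash() so the buckets are stable across runs.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


class SiameseScorer:
    """Two branches, one weight vector: each item is scored by the same `w`."""

    def __init__(self, dim=64, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]

    def score(self, text):
        # The shared branch: identical parameters for either side of the pair.
        x = featurize(text, self.dim)
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def prob_first_wins(self, first, second):
        # Sigmoid of the score difference: P(first is the more convincing item).
        diff = self.score(first) - self.score(second)
        return 1.0 / (1.0 + math.exp(-diff))

    def train_step(self, first, second, label, lr=0.1):
        """One gradient step on the log-loss; label is 1.0 if `first` wins, else 0.0."""
        p = self.prob_first_wins(first, second)
        x_first = featurize(first, self.dim)
        x_second = featurize(second, self.dim)
        grad = p - label  # derivative of the log-loss w.r.t. the score difference
        for i in range(self.dim):
            self.w[i] -= lr * grad * (x_first[i] - x_second[i])


# Invented example pair, for illustration only (not taken from any dataset).
stronger = "a controlled study of 500 patients found a 30 percent reduction"
weaker = "some people say it probably works"

model = SiameseScorer()
for _ in range(50):
    model.train_step(stronger, weaker, label=1.0)
print(model.prob_first_wins(stronger, weaker))
```

Because both branches share `w`, swapping the two inputs mirrors the probability around 0.5, so the model cannot prefer an item merely for its position in the pair.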

If this is right

Automated systems can more reliably surface stronger evidence during discussions or debates.
The new labeled pairs supply a benchmark for measuring progress on relative convincingness.
The network can identify multiple kinds of argumentative value beyond surface features.
Performance gains on the harder dataset suggest the method handles subtle differences in evidence strength.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same comparison structure could be applied to rank longer arguments or entire debate turns rather than isolated evidence.
Training on the dataset might reduce selection of weak evidence in question-answering systems.
Domain-specific retraining could test whether the learned notion of convincingness transfers across topics.

Load-bearing premise

Human labels on which evidence is more convincing capture a stable notion that does not depend heavily on missing context or individual differences.

What would settle it

A new collection of evidence pairs where the Siamese model shows no accuracy advantage over the baselines, or where repeated human labeling of the same pairs produces low agreement.
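
The second condition, low agreement under repeated labeling, is usually checked with a chance-corrected agreement statistic. A minimal sketch using Cohen's kappa for two annotators follows; the choice of kappa and the function name `cohens_kappa` are assumptions made here for illustration, and nothing in this summary states which agreement measure, if any, the paper uses.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same evidence pairs."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Fraction of pairs on which the two annotators chose the same item.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A kappa near 0 on re-labeled pairs would mean the annotators agree no more often than chance, which is the outcome that would undercut the load-bearing premise.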

Original abstract

With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments. Machines capable of responding and interacting with humans in helpful ways have become ubiquitous. We now expect them to discuss with us the more delicate questions in our world, and they should do so armed with effective arguments. But what makes an argument more persuasive? What will convince you? In this paper, we present a new data set, IBM-EviConv, of pairs of evidence labeled for convincingness, designed to be more challenging than existing alternatives. We also propose a Siamese neural network architecture shown to outperform several baselines on both a prior convincingness data set and our own. Finally, we provide insights into our experimental results and the various kinds of argumentative value our method is capable of detecting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the IBM-EviConv dataset of evidence pairs labeled for convincingness (designed to be more challenging than prior sets) and proposes a Siamese neural network architecture that outperforms several baselines on both IBM-EviConv and an existing convincingness dataset. It additionally provides insights into the experimental results and the argumentative values detected by the model.

Significance. The introduction of a new labeled dataset focused on evidence convincingness is a positive contribution to computational argumentation and argument mining. If the empirical outperformance claim is shown to be robust (with statistical validation and reliable labels), the work would meaningfully advance the field by moving beyond argument detection toward selection of more persuasive evidence for improved human-AI dialogue systems.

major comments (2)

[Abstract] The claim that the Siamese network 'outperforms several baselines' on IBM-EviConv and a prior dataset is stated without any accompanying dataset sizes, statistical tests, error bars, or experimental controls. This leaves the central empirical claim weakly supported.
[IBM-EviConv dataset description] No inter-annotator agreement figures, annotation protocol details, or analysis of how broader argumentative context or individual annotator bias was controlled are provided. This is load-bearing for the central claim, as superior held-out performance does not demonstrate learning of convincingness if the labels primarily reflect annotator idiosyncrasies or missing context.

minor comments (1)

[Abstract] The abstract could briefly note the scale of IBM-EviConv (number of pairs) to help readers assess the scope of the new resource.
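
One way to supply the statistical validation asked for in the first major comment is a paired test over the test-set predictions of two systems. The sketch below uses an exact McNemar test. That choice is made here for illustration and is not something the paper is stated to report; `gold`, `pred_a`, and `pred_b` are hypothetical lists holding the true label and each system's predicted label per evidence pair.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same test pairs."""
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("all three label lists must have the same length")

    # Only discordant pairs matter: exactly one of the two systems is correct.
    only_a = sum(a == g and b != g for g, a, b in zip(gold, pred_a, pred_b))
    only_b = sum(b == g and a != g for g, a, b in zip(gold, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never disagree in correctness

    # Under the null hypothesis, each discordant pair favors either system with p = 0.5.
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

A small p-value indicates that the accuracy gap between the two systems is unlikely to come from chance disagreements alone; it says nothing about whether the labels themselves are reliable.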

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address the major comments point by point below and indicate planned revisions where appropriate.

Point-by-point responses

Referee: [Abstract] The claim that the Siamese network 'outperforms several baselines' on IBM-EviConv and a prior dataset is stated without any accompanying dataset sizes, statistical tests, error bars, or experimental controls. This leaves the central empirical claim weakly supported.

Authors: We agree the abstract is concise and omits quantitative details. The manuscript body reports dataset sizes, baseline comparisons, and performance metrics in the experimental evaluation. We will revise the abstract to incorporate key figures such as dataset scale and performance deltas to strengthen the central claim.
Revision: yes

Referee: [IBM-EviConv dataset description] No inter-annotator agreement figures, annotation protocol details, or analysis of how broader argumentative context or individual annotator bias was controlled are provided. This is load-bearing for the central claim, as superior held-out performance does not demonstrate learning of convincingness if the labels primarily reflect annotator idiosyncrasies or missing context.

Authors: We concur that explicit reporting of annotation quality metrics is necessary. The current manuscript describes the dataset but omits inter-annotator agreement, full protocol, and bias controls. We will revise the dataset section to include these elements along with any available measures for context and annotator effects.
Revision: yes

Circularity Check

0 steps flagged

No circularity: standard empirical ML evaluation on new dataset

Full rationale

The paper introduces IBM-EviConv (pairs of evidence labeled for convincingness) and trains a Siamese network to predict which is more convincing, reporting outperformance vs. baselines on this set and a prior one. This is supervised learning with held-out evaluation; no equations, predictions, or claims reduce by construction to fitted inputs, self-citations, or renamed patterns. The central result is an empirical comparison whose validity rests on the (external) quality of the human labels rather than any definitional loop inside the method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the validity of human convincingness labels and standard supervised learning assumptions; no additional free parameters, axioms, or invented entities are introduced beyond typical neural network training.
