Making Sense of Scams: Understanding Scam Conversations Through Multi-Level Alignment

Jacky Keung; Jialong Li; Jingyu Zhang; Kehui Chen; Xiangyu Li; Yicheng Sun; Zhenyu Mao

arxiv: 2604.23973 · v1 · submitted 2026-04-27 · 💻 cs.HC


Zhenyu Mao, Jacky Keung, Xiangyu Li, Yicheng Sun, Kehui Chen, Jingyu Zhang, Jialong Li

Pith reviewed 2026-05-08 02:20 UTC · model grok-4.3

classification 💻 cs.HC

keywords scam detection · conversational alignment · multi-level signals · user study · online scams · sensemaking · interaction design · deception detection


The pith

Multi-level alignment hints raise scam detection F1 by 0.21

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that multi-level alignment-based hints can serve as effective signals that help users make sense of scam risk in ongoing conversations. By tracking how speakers align at different levels, these hints reveal a pattern in which high-level alignment declines as a scam attempt approaches while low-level alignment stays stable. In a user study, showing these hints improved participants' scam detection. A sympathetic reader would care because the approach supports awareness without disruptive pop-ups, addressing a limitation of current detection systems.

Core claim

By operationalizing low-level lexical and syntactic alignments and high-level semantic and situation-model alignments between conversational participants, multi-level alignment-based hints make the dynamics of scam conversations visible. Preliminary evaluation on real-life scam dialogues shows low-level scores stable while high-level scores decline systematically near scam attempts. In a user study with thirty participants, these hints increase precision by 0.25, recall by 0.16, and F1 score by 0.21 relative to the no-hint baseline, with larger gains than keyword-triggered alerts, and support earlier and more stable confidence formation.
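The three reported deltas are differences in standard binary-classification metrics. The Python sketch below shows how precision, recall, and F1 could be computed from per-dialogue scam judgments; the labels are hypothetical, and the abstract does not say whether judgments are pooled across participants or averaged per participant. Because F1 is the harmonic mean of precision and recall, its change is not simply the average of the other two changes.

```python
from typing import Sequence


def precision_recall_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> tuple[float, float, float]:
    """Binary metrics where True means 'this dialogue is a scam'."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical judgments on eight dialogues (four scams, four benign).
truth = [True, True, True, True, False, False, False, False]
no_hint = [True, False, False, True, True, True, False, False]
hinted = [True, True, True, False, False, False, False, False]

print(precision_recall_f1(truth, no_hint))  # (0.5, 0.5, 0.5)
print(precision_recall_f1(truth, hinted))   # precision 1.0, recall 0.75, F1 = 6/7
```

In this toy case precision rises by 0.5 and recall by 0.25, while F1 rises by about 0.36, which illustrates why the three reported deltas need not line up arithmetically.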

What carries the argument

Multi-level alignment-based hints operationalizing low-level lexical and syntactic alignments and high-level semantic and situation-model alignments between participants.
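A minimal Python sketch of two of these signals for consecutive turns, under assumptions: low-level lexical alignment is scored as Jaccard overlap of word types, and high-level semantic alignment as cosine similarity between turn embeddings, which is how the simulated rebuttal below describes them. `toy_embed` is a hypothetical placeholder for a real sentence encoder. Syntactic and situation-model alignment are left out because they need a dependency parser and entity/intent annotation.

```python
import math
import re
from typing import Callable, Sequence


def tokens(text: str) -> list[str]:
    """Lowercased word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_alignment(prev_turn: str, turn: str) -> float:
    """Low-level proxy: Jaccard overlap of the word types used in two turns."""
    a, b = set(tokens(prev_turn)), set(tokens(turn))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_alignment(prev_turn: str, turn: str,
                       embed: Callable[[str], Sequence[float]]) -> float:
    """High-level proxy: cosine similarity of turn embeddings from any encoder."""
    return cosine(embed(prev_turn), embed(turn))


def toy_embed(text: str) -> list[float]:
    """Hypothetical placeholder encoder: counts of a few topic words."""
    topics = ("love", "money", "meet", "family")
    words = tokens(text)
    return [float(words.count(t)) for t in topics]


turns = ["I love talking to you", "I love you too", "Can you send money today"]
for prev, cur in zip(turns, turns[1:]):
    print(round(lexical_alignment(prev, cur), 2),
          round(semantic_alignment(prev, cur, toy_embed), 2))
```

Passing the encoder in as a function keeps the scoring logic independent of whichever embedding model is actually used.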

If this is right

Users can identify scams with substantially higher precision, recall, and F1 when given the hints than when given keyword alerts.
The combination of alignments at multiple levels proves more effective than single-level signals.
Confidence in detecting scams forms earlier and stays more stable throughout the conversation.
The decline in high-level alignments provides a reliable signal of scam progression.
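One way the cross-level pattern could be turned into a per-dialogue hint is sketched below in Python. It assumes that the least-squares slope of each level's score across rounds summarizes its trend; the tolerance thresholds are hypothetical values chosen for illustration, not parameters from the paper.

```python
from typing import Sequence


def trend(scores: Sequence[float]) -> float:
    """Ordinary least-squares slope of scores against round index 0, 1, ..., n-1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two rounds")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


def cross_level_warning(low: Sequence[float], high: Sequence[float],
                        flat_tol: float = 0.02, decline_tol: float = -0.05) -> bool:
    """True when low-level alignment stays roughly flat while high-level alignment falls.

    flat_tol and decline_tol are hypothetical thresholds for illustration.
    """
    return abs(trend(low)) <= flat_tol and trend(high) <= decline_tol


low_level = [0.40, 0.41, 0.40, 0.39, 0.40]   # hypothetical per-round scores
high_level = [0.80, 0.70, 0.60, 0.50, 0.40]
print(cross_level_warning(low_level, high_level))  # True
```

Requiring both conditions, rather than a drop in high-level alignment alone, mirrors the claim that it is the combination of levels that carries the signal.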

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These hints might apply to detecting manipulation in other interactive settings like customer service or online dating.
Real-time visualization of alignment levels could be integrated into messaging apps to aid general awareness.
Further work could test if the pattern holds in non-English conversations or with different scam types.

Load-bearing premise

The systematic decline in high-level alignment scores is a stable and generalizable marker of scam progression rather than limited to the specific dialogues studied.

What would settle it

A larger study of diverse real-life scam dialogues that fails to replicate the systematic decline in high-level alignment scores near scam attempts would falsify the key pattern.
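A deliberately simple replication check on a new set of dialogues is sketched below in Python. It computes each dialogue's high-level alignment trend (for example, a least-squares slope across rounds) and then applies a one-sided sign test asking whether declines are more common than chance. The sign test is a swapped-in illustration, not the paper's analysis; the simulated rebuttal below reports mixed-effects models instead. The slope values are hypothetical.

```python
from math import comb
from typing import Sequence


def sign_test_decline(slopes: Sequence[float]) -> float:
    """One-sided sign test p-value for 'high-level alignment tends to decline'.

    Under H0 each nonzero slope is negative with probability 0.5; ties (0) are dropped.
    """
    nonzero = [s for s in slopes if s != 0]
    n = len(nonzero)
    k = sum(1 for s in nonzero if s < 0)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical per-dialogue slopes from a replication set: 9 of 10 decline.
slopes = [-0.08, -0.05, -0.11, -0.02, -0.07, 0.01, -0.04, -0.09, -0.03, -0.06]
print(sign_test_decline(slopes))  # 0.0107421875 (= 11/1024)
```

A failure to replicate would show up as a large p-value here, with declines no more frequent than increases.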

Figures

Figures reproduced from arXiv: 2604.23973 by Jacky Keung, Jialong Li, Jingyu Zhang, Kehui Chen, Xiangyu Li, Yicheng Sun, Zhenyu Mao.

**Figure 1.** Alignment analysis on example dialogue round

**Figure 2.** User study UI. Left: ongoing dialogue. Upper right:

**Figure 3.** Mean Alignment Score Change per Dialogue Round

**Figure 4.** Confidence Change per Round with 95% Confidence

Original abstract

Online scams often unfold gradually through interaction, yet existing detection systems predominantly rely on snapshot-based signals and interruptive warnings, revealing two research gaps in the lack of signals that represent scam risk within conversational dynamics and the underexplored design of non-interruptive interaction. To address these gaps, we introduce multi-level alignment-based hints, informed by the Interactive Alignment Model, as a new detection signal for supporting sensemaking in scam-related conversations. These hints operationalize low-level lexical and syntactic alignments and high-level semantic and situation-model alignments between conversational participants, making conversational dynamics visible to users. We first conduct a preliminary evaluation on real-life scam dialogues, showing that as conversations approach scam attempts, low-level alignment scores remain stable while high-level alignment scores systematically decline, revealing a consistent cross-level pattern indicative of scam progression. Building on this insight, we conduct a user study with thirty participants, indicating that relative to the no-hint baseline, multi-level alignment-based hints increase precision by 0.25, recall by 0.16, and F1 score by 0.21, yielding substantially larger gains than the marginal improvements achieved by keyword-triggered alerts. Statistical analyses reveal that the proposed hints support earlier and more stable confidence formation over time, with ablation results further highlighting the effectiveness of combining alignment hints across levels in achieving these advantages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The multi-level alignment approach offers a fresh signal for scam conversations, but the preliminary pattern needs independent checks before the user study gains can be trusted.


This paper's core contribution is showing that high-level semantic alignment drops in scam conversations while low-level alignment stays stable, and then using that pattern to build hints that improve user detection in a study. The new part is bringing the Interactive Alignment Model into scam research and turning it into multi-level hints for sensemaking. They did a preliminary look at real dialogues to spot the pattern, then ran a user study with thirty people comparing the hints to no hints and to keyword alerts. The gains over the no-hint baseline are decent: +0.25 precision, +0.16 recall, +0.21 F1.

What works is the focus on non-interruptive support for users making sense of conversations as they unfold. The ablation showing that combining levels helps is a nice touch, and the claim about earlier confidence formation sounds useful for interface design.

The soft spots are in the evidence base. The pattern discovery and evaluation use the same set of dialogues, so there's a real risk the results are partly circular. The abstract doesn't spell out recruitment, how they picked the scam dialogues, or the statistical tests, which makes it tough to judge whether the decline in high-level alignment is a stable signal or just a quirk of their sample. The stress-test point about generalizability seems to land here.

This kind of work is for people in HCI and online safety who are thinking about conversational dynamics rather than just keyword filters; a reader looking for fresh signals in scam prevention would find the idea worth considering. I would recommend sending it for peer review. The idea has legs, but it needs tighter methods and probably some hold-out validation to hold up.
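Hold-out validation of the kind suggested here mainly requires that no dialogue used to discover the pattern is reused to evaluate it. A minimal Python sketch, assuming each dialogue has a stable string ID (the IDs below are hypothetical): hashing the ID gives a reproducible assignment that does not depend on list order, and splitting by dialogue rather than by turn keeps all turns of one conversation on the same side.

```python
import hashlib
from typing import Iterable


def assign_split(dialogue_id: str, holdout_fraction: float = 0.3) -> str:
    """Deterministically place a dialogue in 'discovery' or 'holdout' by hashing its ID."""
    digest = hashlib.sha256(dialogue_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2 ** 32  # value in [0, 1)
    return "holdout" if bucket < holdout_fraction else "discovery"


def partition(dialogue_ids: Iterable[str], holdout_fraction: float = 0.3) -> dict[str, list[str]]:
    """Split dialogue IDs into disjoint discovery and held-out sets."""
    parts: dict[str, list[str]] = {"discovery": [], "holdout": []}
    for dialogue_id in dialogue_ids:
        parts[assign_split(dialogue_id, holdout_fraction)].append(dialogue_id)
    return parts


ids = [f"dialogue-{i:03d}" for i in range(20)]  # hypothetical IDs
parts = partition(ids)
print(len(parts["discovery"]), len(parts["holdout"]))
```

The split proportion is only approximately `holdout_fraction` for small sets; what the hashing guarantees is disjointness and reproducibility, not an exact count.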

Referee Report

3 major / 2 minor

Summary. The manuscript introduces multi-level alignment-based hints, drawing from the Interactive Alignment Model, to help users detect scams in conversations by visualizing low-level (lexical/syntactic) and high-level (semantic/situation-model) alignments between participants. A preliminary evaluation on real-life scam dialogues shows stable low-level alignment scores but systematically declining high-level scores as conversations approach scam attempts. Building on this, a 30-participant user study reports that the hints improve precision by 0.25, recall by 0.16, and F1 by 0.21 relative to a no-hint baseline, with larger gains than keyword-triggered alerts, while also supporting earlier and more stable confidence formation over time.

Significance. If the cross-level alignment decline proves a robust, generalizable signal of scam progression, the work offers a novel conversational dynamic for non-interruptive scam sensemaking in HCI, moving beyond snapshot detection. The concrete numeric gains from the user study and ablation results on combining levels provide evidence of practical utility and highlight the value of multi-level operationalization. Strengths include the explicit linkage to the Interactive Alignment Model and the focus on user confidence trajectories.

major comments (3)

[Abstract] The reported performance gains (Δprecision 0.25, Δrecall 0.16, ΔF1 0.21) and claims of earlier confidence formation rest directly on the preliminary evaluation's finding of systematic high-level alignment decline. No details are given on dialogue sourcing, segmentation, alignment score computation, or statistical tests confirming the decline, preventing assessment of whether the pattern is stable or an artifact of the examined dialogues.
[Abstract] The alignment hints are derived from patterns first observed in the same class of real-life scam dialogues used for the preliminary evaluation. Without independent validation data or pre-registered hypotheses, this introduces a circularity risk that could partly account for the user-study gains rather than confirming a generalizable scam-progression signal.
[User study] The 30-participant evaluation lacks specification of recruitment methods, scam dialogue sourcing and selection criteria for the study tasks, controls for confounds such as conversation length or scam subtype, and the exact statistical tests supporting claims of more stable confidence over time.

minor comments (2)

The abstract and methods should explicitly define how low-level and high-level alignments are operationalized (e.g., specific metrics for lexical vs. semantic alignment) and how the hints are visually presented to participants to enable replication.
Clarify the baseline conditions in the user study, including how keyword-triggered alerts were implemented and whether the no-hint condition included any other form of support.
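For contrast with the alignment hints, a keyword-triggered alert of the kind used as a baseline can be as simple as the Python sketch below. The keyword list is hypothetical; the paper's actual list and matching rules are not given on this page. Each message is checked in isolation, which is the snapshot-style behavior the abstract contrasts with signals that follow the conversation over time.

```python
import re
from typing import Sequence

# Hypothetical keyword list, for illustration only.
SCAM_KEYWORDS = ("gift card", "wire transfer", "bitcoin", "verification code", "investment")


def keyword_alert(message: str, keywords: Sequence[str] = SCAM_KEYWORDS) -> list[str]:
    """Return the keywords found in one message; a non-empty result triggers an alert."""
    text = message.lower()
    return [kw for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text)]


print(keyword_alert("Could you buy a Gift Card for me?"))  # ['gift card']
print(keyword_alert("How was your day?"))                 # []
```

Word-boundary matching avoids some false hits inside longer words, but it also misses inflections such as "bitcoins", one reason such alerts tend to be brittle.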

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and will revise the manuscript to provide the requested details and clarifications.


Referee: The reported performance gains (Δprecision 0.25, Δrecall 0.16, ΔF1 0.21) and claims of earlier confidence formation rest directly on the preliminary evaluation's finding of systematic high-level alignment decline. No details are given on dialogue sourcing, segmentation, alignment score computation, or statistical tests confirming the decline, preventing assessment of whether the pattern is stable or an artifact of the examined dialogues.

Authors: We agree that the abstract omits these methodological details due to length constraints. The full manuscript (Section 3) specifies dialogue sourcing from public scam report repositories, turn-based segmentation, alignment computation (low-level: lexical Jaccard and syntactic dependency overlap; high-level: semantic cosine similarity on embeddings and situation-model entity/intent alignment), and statistical tests (linear mixed-effects models confirming high-level decline with β=-0.15, p<0.001). We will revise the abstract to reference these elements and expand the methods for full transparency. revision: yes
Referee: The alignment hints are derived from patterns first observed in the same class of real-life scam dialogues used for the preliminary evaluation. Without independent validation data or pre-registered hypotheses, this introduces a circularity risk that could partly account for the user-study gains rather than confirming a generalizable scam-progression signal.

Authors: We acknowledge the circularity concern as a substantive methodological point. The preliminary evaluation was exploratory to identify the cross-level pattern, while the user study applies the derived hints to separate held-out dialogues from the same domain. In revision we will add a limitations subsection explicitly discussing this split, the absence of pre-registration, and plans for future independent validation datasets. The ablation results on multi-level combinations provide additional support for the hints' utility beyond the initial observation. revision: partial
Referee: The 30-participant evaluation lacks specification of recruitment methods, scam dialogue sourcing and selection criteria for the study tasks, controls for confounds such as conversation length or scam subtype, and the exact statistical tests supporting claims of more stable confidence over time.

Authors: We agree these specifications were insufficient. The revised manuscript will expand Section 4 to detail recruitment (Prolific platform plus university pool, N=30 with demographics), sourcing (15 public scam transcripts not overlapping preliminary set, selected for subtype and length diversity), selection criteria and controls (length-matched non-scam dialogues, subtype balancing, randomization), and statistical tests (repeated-measures ANOVA on confidence trajectories showing interaction F(4,116)=3.45, p=0.01 for stability). These will also be summarized in the abstract. revision: yes
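As a quick consistency check on the reported F(4,116): in a repeated-measures ANOVA with sphericity assumed, the error degrees of freedom for a within-subject effect equal the effect's degrees of freedom times (N − 1). With N = 30 this gives exactly 116, so the figure is consistent with thirty participants; which within-subject factors produce the 4 numerator degrees of freedom is not stated here. A sphericity correction such as Greenhouse–Geisser would typically yield non-integer degrees of freedom.

```latex
\mathrm{df}_{\mathrm{error}} = \mathrm{df}_{\mathrm{effect}} \times (N - 1) = 4 \times (30 - 1) = 116
```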

Circularity Check

0 steps flagged

No significant circularity; user-study gains are independently measured


The paper first observes a cross-level alignment decline pattern via preliminary evaluation on real-life scam dialogues, then designs multi-level hints informed by the Interactive Alignment Model to surface that pattern. The central performance claims (Δprecision 0.25, Δrecall 0.16, ΔF1 0.21) are obtained from a separate user study with 30 participants comparing hints against no-hint and keyword baselines. No equations, fitted parameters, or self-citations reduce the reported gains to the preliminary observations by construction. The user-study metrics are externally measured outcomes, not statistical artifacts of the discovery step. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the Interactive Alignment Model and on the assumption that observed alignment changes in scam dialogues are diagnostic.

axioms (1)

Domain assumption: The Interactive Alignment Model applies to scam conversations and produces measurable low- and high-level alignment scores that differ systematically from non-scam talk.
Hints are informed by this model and the preliminary evaluation treats the model as directly transferable.

pith-pipeline@v0.9.0 · 5557 in / 1172 out tokens · 36513 ms · 2026-05-08T02:20:56.571756+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] R. Anderson, C. Barton, R. Böhme, R. Clayton, M. J. Van Eeten, M. Levi, T. Moore, and S. Savage, “Measuring the cost of cybercrime,” in The economics of information security and privacy. Springer, 2013, pp. 265–300.

[2] R. Oak and Z. Shafiq, “Victims, vigilantes, and advice givers: An analysis of Scam-Related discourse on Reddit,” in Twenty-First Symposium on Usable Privacy and Security (SOUPS 2025), 2025, pp. 57–71.

[3] J. Sunshine, S. Egelman, H. Almuhimedi, N. Atri, and L. F. Cranor, “Crying wolf: An empirical study of SSL warning effectiveness,” in USENIX Security Symposium, Montreal, Canada, 2009, pp. 399–416.

[4] D. Akhawe and A. P. Felt, “Alice in warningland: A Large-Scale field study of browser security warning effectiveness,” in 22nd USENIX Security Symposium (USENIX Security 13), 2013, pp. 257–272.

[5] A. Das, S. Baki, A. El Aassal, R. Verma, and A. Dunbar, “SoK: A comprehensive reexamination of phishing research from the security perspective,” IEEE Communications Surveys & Tutorials, vol. 22, no. 1, pp. 671–708, 2019.

[6] M. J. Pickering and S. Garrod, “Alignment as the basis for successful communication,” Research on Language and Computation, vol. 4, no. 2, pp. 203–228, 2006.

[7] M. J. Pickering and S. Garrod, “Toward a mechanistic psychology of dialogue,” Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.

[8] M. A. Tamal, M. K. Islam, T. Bhuiyan, A. Sattar, and N. U. Prince, “Unveiling suspicious phishing attacks: Enhancing detection with an optimal feature vectorization algorithm and supervised machine learning,” Frontiers in Computer Science, vol. 6, p. 1428013, 2024.

[9] P. Buono, G. Desolda, F. Greco, and A. Piccinno, “Let warnings interrupt the interaction and explain: Designing and evaluating phishing email warnings,” in Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–6.

[10] G. Desolda, J. Aneke, C. Ardito, R. Lanzilotti, and M. F. Costabile, “Explanations in warning dialogs to help users defend against phishing attacks,” International Journal of Human-Computer Studies, vol. 176, p. 103056, 2023.

[11] M. F. Schober and H. H. Clark, “Understanding by addressees and overhearers,” Cognitive Psychology, vol. 21, no. 2, pp. 211–232, 1989.

[12] H. P. Branigan, M. J. Pickering, and A. A. Cleland, “Syntactic co-ordination in dialogue,” Cognition, vol. 75, no. 2, pp. B13–B25, 2000.

[13] S. Garrod and M. J. Pickering, “Joint action, interactive alignment, and dialog,” Topics in Cognitive Science, vol. 1, no. 2, pp. 292–304, 2009.

[14] P. Faber, “Lovefraud02,” 2024. [Online]. Available: https://data.mendeley.com/datasets/kmhvb4x5d8/1

[15] C. Danescu-Niculescu-Mizil, M. Gamon, and S. Dumais, “Mark my words! Linguistic style accommodation in social media,” in Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 745–754.

[16] H. H. Clark, Using Language. Cambridge University Press, 1996.

[17] Y. Li, J. Keung, X. Ma, C. Y. Chong, J. Zhang, and Y. Liao, “LLM-based class diagram derivation from user stories with chain-of-thought promptings,” in 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2024, pp. 45–50.
