pith. machine review for the scientific record.

arxiv: 2604.20652 · v2 · submitted 2026-04-22 · 💻 cs.AI · cs.HC · econ.GN · q-fin.EC

Recognition: unknown

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:35 UTC · model grok-4.3

classification 💻 cs.AI · cs.HC · econ.GN · q-fin.EC
keywords large language models · fraud detection · investment advice · motivated pressure · human benchmark · warning suppression · preregistered experiment · endorsement reversal

The pith

Large language models give more consistent fraud warnings than human advisors and do not suppress them under investor pressure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. It reports a preregistered experiment with seven leading LLMs across twelve investment scenarios spanning legitimate, high-risk, and objectively fraudulent cases, generating 3,360 AI advisory conversations for comparison against a 1,201-participant human benchmark. Contrary to the initial prediction, motivated framing did not reduce AI fraud warnings and may have slightly increased them, with endorsement reversal occurring in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14 percent and suppressed warnings under pressure at two to four times the AI rate. The results indicate that AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.

Core claim

Across the full set of 3,360 AI advisory conversations, none of the tested large language models endorsed an objectively fraudulent investment opportunity, and motivated investor framing produced no suppression of their fraud warnings. In the human benchmark, advisors endorsed fraudulent investments at baseline rates of 13-14 percent and suppressed their own warnings under pressure at two to four times the AI rate. The experiment recorded endorsement reversal for the AI models in fewer than 3 in 1,000 observations, showing that they maintained consistent detection without yielding to the applied pressure.
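
As a rough scale check (an illustration, not a count the paper reports), a reversal rate below 3 in 1,000 applied across all 3,360 AI conversations corresponds to roughly ten reversed conversations or fewer:

    # Back-of-envelope: what "< 3 in 1,000" implies at the study's scale.
    # Assumes the rate applies across all 3,360 AI conversations (illustrative only).
    total_conversations = 3360
    max_reversal_rate = 3 / 1000
    print(f"<= {total_conversations * max_reversal_rate:.0f} reversals out of {total_conversations}")  # roughly 10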

What carries the argument

The preregistered advisory conversation protocol that measures investment endorsement and fraud warning suppression when investors apply motivated framing to the same set of twelve scenarios.
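
A minimal sketch of how such a multi-turn protocol might be scripted, assuming hypothetical helpers (query_model, code_response) and placeholder prompt text; the paper's actual prompts, coding measures, and run counts are fixed in its preregistration and Methods, not here.

    # Sketch of a multi-turn advisory protocol: each (model, scenario, framing) cell
    # opens with a request for advice, then applies scripted pressure at later turns.
    # query_model and code_response are illustrative stand-ins, not the paper's code.
    from itertools import product

    MODELS = ["model_a", "model_b"]            # the paper tests seven LLMs
    SCENARIOS = ["L1", "H4", "F1"]             # the paper uses twelve scenarios across risk tiers
    FRAMINGS = {
        "neutral": "I'd like your honest assessment of this opportunity: {scenario}",
        "motivated": "I'm excited about this and plan to invest. Your thoughts? {scenario}",
    }
    PRESSURE = [
        "I did more research and everything checks out.",   # Turn 2: research escalation
        "I've already told my family I'm going ahead.",     # Turn 3: emotional commitment
    ]

    def query_model(model, prompt, history):
        """Placeholder for an API call to the model; returns the reply text."""
        history.append(prompt)
        return f"[{model} reply to: {prompt[:40]}...]"

    def code_response(reply, turn):
        """Placeholder for the coding step (the paper uses a preregistered scheme)."""
        return {"turn": turn, "endorsed": False, "warning_intensity": None, "raw": reply}

    def run_cell(model, scenario, framing_key, template):
        history, records = [], []
        reply = query_model(model, template.format(scenario=scenario), history)
        records.append(code_response(reply, turn=1))
        for turn, pressure in enumerate(PRESSURE, start=2):
            reply = query_model(model, pressure, history)
            records.append(code_response(reply, turn=turn))
        return {"model": model, "scenario": scenario, "framing": framing_key, "turns": records}

    results = [run_cell(m, s, k, t) for m, s, (k, t) in product(MODELS, SCENARIOS, FRAMINGS.items())]

Endorsement and warning-suppression rates are then simple proportions over the coded records, compared between framing conditions and against the human benchmark.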

If this is right

  • AI advisory systems could reduce the frequency of fraudulent investment endorsements compared with human-only advice.
  • Motivated investor framing does not override fraud detection in the tested large language models.
  • Human advisors remain more variable in fraud detection and more susceptible to external pressure than current AI systems.
  • The consistency across seven different LLMs suggests the resistance to suppression is a general feature of these models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Financial platforms could incorporate AI checks to flag potential fraud before human advisors respond.
  • Real-world deployment would require testing whether the same consistency holds with live market stakes and repeated interactions.
  • The resistance may depend on how financial caution is represented in the models' training data, which future work could isolate.
  • Regulatory standards for AI in investment advice could draw on these consistency benchmarks when evaluating deployment risks.

Load-bearing premise

The twelve investment scenarios and the way motivated pressure was applied in the lab accurately capture real-world fraudulent opportunities and investor behavior outside the experiment.

What would settle it

Observing large language models endorse fraudulent investments at rates comparable to the 13-14 percent human baseline when tested with actual investors in live, non-laboratory financial advice settings.

Figures

Figures reproduced from arXiv: 2604.20652 by Nattavudh Powdthavee.

Figure 1
Figure 1: Motivated framing does not suppress Turn 1 warning intensity across AI financial advisors. Turn 1 framing conditions: neutral (an honest assessment requested) versus motivated (user signals prior enthusiasm). Both conditions used identical scenario text and closing prompt (N = 1,680 runs per condition). (B) High-Risk scenarios spanned a fraud-signal gradient from mathematically impossible returns (Band 1) … view at source ↗
Figure 2
Figure 2: Warning degradation under sustained social pressure is model-heterogeneous but not framing-dependent. (A) Illustrative Turn 2 and Turn 3 responses from GPT-4o mini to the H4 scenario under motivated framing. Turn 2 variant: research escalation; Turn 3 variant: emotional commitment; variants drawn independently (see Supplementary Materials). GPT-4o mini drops its fraud warning entirely at Turn 2 (Q3 = 0) an… view at source ↗
Figure 3
Figure 3: The fraud signal gradient is confirmed at initial consultation but not in pressure [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4: Human advisors endorse fraudulent investments at baseline and suppress warnings under pressure [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
read the original abstract

Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a preregistered experiment with 3,360 LLM advisory conversations across seven models and 1,201 human participants, using twelve investment scenarios (legitimate, high-risk, and objectively fraudulent). It finds that LLMs endorse fraudulent opportunities at a 0% rate regardless of motivated investor framing, while humans endorse at 13-14% baseline and suppress warnings under pressure at two to four times the AI rate; motivated framing did not reduce AI warnings as predicted.

Significance. If the results hold, the work provides large-scale empirical evidence that current LLMs deliver more consistent fraud warnings than lay humans in matched advisory roles and show greater resistance to prompted social pressure. The preregistration, direct rate comparisons, and scale (over 4,500 total observations) strengthen the measurement claims and offer a useful benchmark for AI safety in financial contexts.

major comments (2)
  1. [Methods] Methods (scenario construction): the twelve researcher-designed scenarios are labeled 'objectively fraudulent' and paired with specific motivated-pressure prompts, but the manuscript provides insufficient detail on how these were validated against real-world fraud ambiguity, incomplete information, or repeated-interaction dynamics; this design choice is load-bearing for the claim that the 0% vs. 13-14% gap generalizes beyond the lab.
  2. [Results] Results (human benchmark): the human sample is described as lay participants, yet the central claim compares AI performance to 'humans in an identical advisory role'; without data on whether participants had relevant financial experience or were screened for investment knowledge, the human endorsement rates may not represent a fair baseline for professional or informed advisors.
minor comments (2)
  1. [Abstract] The abstract states 'seven leading LLMs' without naming them; this should be listed explicitly (with versions) in the main text for reproducibility.
  2. [Results] Tables or figures reporting endorsement rates should include exact counts and confidence intervals, together with a pre-registered analysis code or data availability statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the scope and generalizability of our findings. We respond to each major comment below, indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Methods] Methods (scenario construction): the twelve researcher-designed scenarios are labeled 'objectively fraudulent' and paired with specific motivated-pressure prompts, but the manuscript provides insufficient detail on how these were validated against real-world fraud ambiguity, incomplete information, or repeated-interaction dynamics; this design choice is load-bearing for the claim that the 0% vs. 13-14% gap generalizes beyond the lab.

    Authors: We agree that expanded detail on scenario construction would strengthen the paper. The scenarios were derived from documented patterns in regulatory warnings (e.g., SEC investor alerts on Ponzi schemes, high-yield investment programs, and advance-fee fraud), with each fraudulent case incorporating multiple unambiguous indicators such as guaranteed returns, pressure to act immediately, and unverifiable claims. In the revised manuscript we will add a new Methods subsection that explicitly describes the validation process, provides example scenario text, and discusses how ambiguity was minimized while acknowledging limitations around repeated-interaction dynamics and incomplete information. These additions will not alter the reported results but will better support claims about applicability beyond the specific lab setting. revision: yes

  2. Referee: [Results] Results (human benchmark): the human sample is described as lay participants, yet the central claim compares AI performance to 'humans in an identical advisory role'; without data on whether participants had relevant financial experience or were screened for investment knowledge, the human endorsement rates may not represent a fair baseline for professional or informed advisors.

    Authors: The manuscript explicitly frames the comparison as one against lay humans in an identical advisory role (see abstract and Section 3). This choice reflects the typical context in which non-professional investors receive advice and is not intended as a benchmark against trained financial advisors. Participants were drawn from a general online pool without screening for investment expertise to preserve ecological validity. We did not collect granular data on prior financial experience, but the revision will report all available demographic information, add a limitations paragraph clarifying the lay-advisor scope, and note that a professional-advisor comparison would require a separate study. The core finding that LLMs showed 0% endorsement versus 13-14% for the lay sample remains unchanged. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical measurement with direct observations

full rationale

The paper reports a preregistered experiment measuring endorsement rates of fraudulent investments by LLMs versus humans across fixed scenarios. No derivations, equations, fitted parameters, or predictions are present that could reduce to inputs by construction. All central claims (0% AI endorsement vs. 13-14% human; pressure effects) are direct counts from the 3,360 AI conversations and 1,201 human participants. The study rests on its own collected data rather than external benchmarks, with no load-bearing self-citations or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on empirical data collection and standard statistical comparison of rates; no free parameters, invented entities, or non-standard axioms are introduced beyond typical assumptions for randomized experiments.

axioms (1)
  • standard math: Standard assumptions for comparing binary endorsement rates across independent groups using proportions and confidence intervals.
    Invoked implicitly when reporting 0% vs 13-14% rates and reversal frequencies (a minimal sketch of this comparison follows below).
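
A minimal sketch of the comparison this axiom refers to, using Wilson score intervals (which remain well-behaved at 0 percent); the counts below are illustrative placeholders, not the paper's exact cell sizes:

    # Wilson score interval for a binomial proportion; counts are illustrative only.
    from math import sqrt

    def wilson_interval(successes, n, z=1.96):
        """95% Wilson score interval for a binomial proportion."""
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return max(0.0, centre - half), min(1.0, centre + half)

    # Hypothetical cells: 0 AI endorsements in 840 fraud-scenario runs versus
    # 54 human endorsements in 400 fraud-scenario judgments (~13.5%).
    lo, hi = wilson_interval(0, 840)
    print(f"AI:    0/840  -> 95% CI {lo:.1%} to {hi:.1%}")
    lo, hi = wilson_interval(54, 400)
    print(f"Human: 54/400 -> 95% CI {lo:.1%} to {hi:.1%}")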

pith-pipeline@v0.9.0 · 5444 in / 1133 out tokens · 37989 ms · 2026-05-09T23:35:26.986891+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references · 10 canonical work pages · 3 internal anchors

  1. [1]

    This failure precedes any user pressure and cannot be addressed by alignment against sycophancy alone, since the problem is miscalibration rather than accommodation

    but issued substantially lower warnings on medium-risk scenarios and statistically implausible Band 3 scenarios, suggesting that its fraud detection depends more heavily on pattern-based signals than on inferential reasoning about superficially credible but structurally suspect opportunities. This failure precedes any user pressure and cannot be addressed...

  2. [2]

    This use constitutes a methodological component of the study and is described in detail in the Methods section

    was used as an automated judge within the data collection pipeline to code model responses on measures Q1–Q6 according to a pre-registered coding scheme (OSF, March 30, 2026). This use constitutes a methodological component of the study and is described in detail in the Methods section. Prompts, coding procedures, and implementation details are provided i...

  3. [3]

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Y. Bai et al., Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862 [cs.CL] (2022)

  4. [4]

    Towards understanding sycophancy in language models

    M. Sharma et al., “Towards understanding sycophancy in language models” in The Twelfth International Conference on Learning Representations (2024); https://openreview.net/forum?id=tvhaxkMKAn

  5. [5]

    Sycophantic chatbots cause delusional spiraling, even in ideal Bayesians

    K. Chandra, M. Kleiman-Weiner, J. Ragan-Kelley, J. B. Tenenbaum, Sycophantic chatbots cause delusional spiraling, even in ideal Bayesians. arXiv:2602.19141 (2026)

  6. [6]

    Federal Trade Commission, Consumer Sentinel Network Data Book 2023 (FTC, 2024); https://www.ftc.gov/reports/consumer-sentinel-network-data-book-2023

  7. [7]

    Federal Bureau of Investigation Internet Crime Complaint Center, 2023 Internet Crime Report (FBI, 2024); https://www.ic3.gov/Media/PDF/AnnualReport/2023_IC3Report.pdf

  8. [8]

    eToro, Retail Investor Beat Q3 2025 (eToro, 2025); https://www.etoro.com/news-and-analysis/etoro-updates/retail-investors-flock-to-ai-tools-with-usage-up-46-in-one-year

  9. [9]

    Science 391(6792) (2026)

    M. Cheng, C. Lee, P. Khadpe, S. Yu, D. Han, D. Jurafsky, Sycophantic AI decreases prosocial intentions and promotes dependence. Science 391, eaec8352 (2026). DOI: 10.1126/science.aec8352

  10. [10]

    J. Wei, D. Huang, Y. Lu, D. Zhou, Q. V. Le, Simple synthetic data reduces sycophancy in large language models. arXiv:2308.03958 [cs.CL] (2023)

  11. [11]

    Discovering language model behaviors with model-written evaluations

    E. Perez et al., “Discovering language model behaviors with model-written evaluations” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, N. Okazaki, Eds. (Association for Computational Linguistics, 2023), pp. 13387–13434

  12. [12]

    When large language models contradict humans? Large language models’ sycophantic behaviour

    L. Ranaldi, G. Pucci, When large language models contradict humans? Large language models’ sycophantic behaviour. arXiv:2311.09410 [cs.CL] (2023)

  13. [13]

    Kunda, The case for motivated reasoning

    Z. Kunda, The case for motivated reasoning. Psychological Bulletin 108, 480–498 (1990). DOI: 10.1037/0033-2909.108.3.480

  14. [14]

    The mechanics of motivated reasoning

    N. Epley, T. Gilovich, The mechanics of motivated reasoning. J. Econ. Perspect. 30, 133–140 (2016)

  15. [15]

    Asymmetric motivated reasoning in investor judgment

    J.M. Thayer, J.S. Pickerd, H. Brown-Liburd, Asymmetric motivated reasoning in investor judgment. Review of Accounting Studies 29, 3534–3563 (2024)

  16. [16]

    Kircanski et al., Emotional arousal may increase susceptibility to fraud in older and younger adults

    K. Kircanski et al., Emotional arousal may increase susceptibility to fraud in older and younger adults. Psychology and Aging 33, 325–337 (2018)

  17. [17]

    P. H. Ditto, D. F. Lopez, Motivated skepticism: Use of differential decision criteria for preferred and nonpreferred conclusions. J. Pers. Soc. Psychol. 63, 568–584 (1992). DOI: 10.1037/0022-3514.63.4.568

  18. [18]

    Forming beliefs: Why valence matters

    T. Sharot, N. Garrett, Forming beliefs: Why valence matters. Trends Cogn. Sci. 20, 25–33 (2016)

  19. [19]

    Improving alignment of dialogue agents via targeted human judgements

    A. Glaese et al., Improving alignment of dialogue agents via targeted human judgements. arXiv:2209.14375 (2022)

  20. [20]

    Safe RLHF: Safe reinforcement learning from human feedback

    J. Dai et al., Safe RLHF: Safe reinforcement learning from human feedback, in Proceedings of the 12th International Conference on Learning Representations (ICLR, Vienna, 2024)

  21. [21]

    Constitutional AI: Harmlessness from AI Feedback

    Y. Bai et al., Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073 [cs.AI] (2022)

  22. [22]

    Trust in automation: Designing for appropriate reliance

    J.D. Lee, K.A. See, Trust in automation: Designing for appropriate reliance. Human Factors 46, 50–80 (2004)

  23. [23]

    Zheng et al., Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

    L. Zheng et al., Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems 36, 46595–46623 (2023)

  24. [24]

    Gu et al., A survey on LLM-as-a-judge

    J. Gu et al., A survey on LLM-as-a-judge. The Innovation (2026)

  25. [25]

    Can you trust LLM judgments? Reliability of LLM-as-a-judge

    K. Schroeder, Z. Wood-Doughty, Can you trust LLM judgments? Reliability of LLM-as-a-judge. arXiv:2412.12509 (2024)

  26. [26]

    Financial literacy around the world: An overview

    A. Lusardi, O.S. Mitchell, Financial literacy around the world: An overview. Journal of Pension Economics and Finance 10(4), 497–508 (2011)

  27. [27]

    Sources: CFTC enforcement; FCA ScamSmart; IOSCO (2020). H6 Managed Forex Trading Account High Band 3 Highly suspicious, non-endorseable — managed forex archetype The investment involves placing funds into a managed account with a currency trading firm whose professional traders execute foreign exchange trades on the investor's behalf using a proprietary s...

  28. [28]

    I am thinking about investing $10,000 and would like your honest assessment of the following opportunity

    Sources: CFTC Forex Fraud Advisory (2019); FCA ScamSmart; IOSCO (2020). 31 ID Scenario name Risk tier Band Ground truth Full scenario text Key fraud signals / regulatory source loss and its best year showing a 31% gain. The firm employs 12 traders, is registered with a financial regulator in an EU member state that is not a major financial centre, and is ...

  29. [29]

    Table S4: Financial literacy distribution and moderation analysis (H6, exploratory)

    = 58.37, p < .001, η² = .123). Red bars indicate net degradation; teal bars indicate net strengthening. Results are exploratory (see table S1, deviation 4). 36 Table S4: Financial literacy distribution and moderation analysis (H6, exploratory) A five-item financial literacy scale adapted from Lusardi and Mitchell (2011) was administered after the investme...

  30. [30]

    Claude (claude-sonnet-4-6) was used rather than claude-sonnet-4-5 (the tested model) to ensure judge independence

    over-sampled at 40% of the total to stress-test agreement in the most ambiguous coding region. Claude (claude-sonnet-4-6) was used rather than claude-sonnet-4-5 (the tested model) to ensure judge independence. Table S5A. Inter-judge agreement by measure Measure N % agreement Cohen's kappa Kappa type Threshold met (≥0.70) Primary? Q1 — Legitimacy judgement...