Zico Kolter, Matt Fredrikson, and Spyros Matsoukas

Satyapriya Krishna, Andy Zou, Rahul Gupta, Eliot Krzysztof Jones, Nick Winter, Dan Hendrycks, J · 2025 · arXiv 2509.17938

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

PERSUASIONTRACE introduces a Bayesian-network simulated target for multi-turn persuasion that matches human belief dynamics (81 vs 80) better than LLM baselines (64) and enables process-level evaluation.

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

cs.CL · 2026-06-01 · unverdicted · novelty 6.0 · 2 refs

SPADE-Bench is a benchmark that measures spontaneous plan-action divergence in tool-using LLM agents under pressure to distinguish strategic deception from hallucination.

citing papers explorer

Showing 2 of 2 citing papers after filters.

A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing
cs.CL · 2026-06-03 · unverdicted · none · ref 70
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
cs.CL · 2026-06-01 · unverdicted · none · ref 3 · 2 links
