In The Twelfth International Conference on Learning Representations
3 Pith papers cite this work.
Citing papers
- When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making
  Adversarial explanation attacks use persuasive framing to preserve nearly all human trust in incorrect AI outputs, as shown in a study with over 200 participants that varied the explanations' reasoning, evidence, style, and format.
- Learning to Draw ASCII Improves Spatial Reasoning in Language Models
  Training LLMs to construct text-to-ASCII spatial layouts improves their text-only spatial reasoning, and the gains transfer to external benchmarks.
- A Systematic Comparison between Extractive Self-Explanations and Human Rationales in Text Classification
  LLM self-explanations yield faithful token subsets for correct predictions, but their alignment with human rationales depends on text length and task complexity; post-hoc attribution methods, by contrast, tend to highlight structural tokens.