How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.AI 4, cs.CL 3

roles: background 1

polarities: background 1

representative citing papers

Radical AI Interpretability

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

A framework is proposed for solving for an AI system's beliefs and desires from its computational facts, with criteria for success tied to interpretability tests and emphasis on holistic attribution.

Scheming Ability in LLM-to-LLM Strategic Interactions

cs.CL · 2025-10-11 · conditional · novelty 6.0

Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when they choose to deceive and, in one setup, choosing deception 100% of the time even without prompting.

Mechanistic Interpretability Needs Philosophy

cs.CL · 2025-06-23 · unverdicted · novelty 4.0

The paper claims that mechanistic interpretability needs philosophy as a partner to clarify concepts, refine methods, and navigate epistemic and ethical complexities in AI systems.
