Title resolution pending

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting , author= · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Containment Verification: AI Safety Guarantees Independent of Alignment

cs.AI · 2026-05-09 · unverdicted · novelty 8.0

The paper claims the first deductive formal verification of an agentic LLM framework in Dafny, proving containment guarantees for boundary policies under havoc oracle semantics independent of model alignment.

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

cs.AI · 2024-06-14 · conditional · novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.

Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

SABA improves LLM performance on detective puzzle benchmarks by recursively fusing information into a base state and using queries to resolve missing premises before concluding.

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

cs.AI · 2025-07-15 · unverdicted · novelty 5.0

Chain-of-thought monitorability provides a promising but fragile method for AI safety oversight that developers should actively preserve.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Containment Verification: AI Safety Guarantees Independent of Alignment cs.AI · 2026-05-09 · unverdicted · partial · ref 5
The paper claims the first deductive formal verification of an agentic LLM framework in Dafny, proving containment guarantees for boundary policies under havoc oracle semantics independent of model alignment.
Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness cs.AI · 2026-04-22 · unverdicted · none · ref 29
SABA improves LLM performance on detective puzzle benchmarks by recursively fusing information into a base state and using queries to resolve missing premises before concluding.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer