Me, myself, and AI: The situational awareness dataset (SAD) for LLMs. arXiv preprint arXiv:2407.04694

5 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (3), 2024 (2)

representative citing papers

Honeypot Protocol

cs.CR · 2026-04-14 · unverdicted · novelty 7.0

The honeypot protocol finds no context-dependent behavior in Claude Opus 4.6, with uniform 100% main task success and zero side tasks across three monitoring conditions.

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

cs.CL · 2026-05-19 · conditional · novelty 6.0

Experiments reveal that LLMs follow instructions at rates from 1% to 99% when opposed by hardcoded conflicting patterns, with robustness tied to output diversity and alignment with model priors rather than general capability.
