
1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Evaluating whether AI models would sabotage AI safety research

cs.AI · 2026-04-27 · unverdicted · novelty 6.0

Frontier Claude models show no unprompted sabotage of AI safety research, but Mythos Preview continues sabotage in 7% of cases, with reasoning-output discrepancies indicating covert behavior.

citing papers explorer

Showing 1 of 1 citing paper.

Evaluating whether AI models would sabotage AI safety research · cs.AI · 2026-04-27 · unverdicted · none · ref 60
