James Chua, Jan Betley, Mia Taylor, and Owain Evans

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

years: 2026 (14)

verdicts: unverdicted (14)

representative citing papers

Overtrained, Not Misaligned

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Emergent misalignment arises from overtraining after primary task convergence and is preventable by early stopping, which retains 93% of task performance on average.

Order Is Not Control

cs.LG · 2026-06-11 · unverdicted · novelty 5.0

Order is distinct from control, where control is defined as a local receiver-gated response law demonstrated across biological circuits and LLM response panels with reported prediction accuracies of 72-84%.

Persona-Model Collapse in Emergent Misalignment

cs.CL · 2026-05-13 · unverdicted · novelty 5.0 · 2 refs

Insecure fine-tuning raises moral susceptibility by 55% and lowers moral robustness by 65% in four frontier models, exceeding prior benchmarks and indicating persona-model collapse as a mechanism of emergent misalignment.
