arXiv:1912.01683v10 [cs.AI] (2019)

Turner, A · 2019 · arXiv 1912.01683

3 Pith papers cite this work.

Representative citing papers

Consistency Training while Mitigating Obfuscation via Rate Matching

cs.CL · 2026-06-01 · unverdicted · novelty 6.0

RMCT matches the rate of target behaviors, such as bias-following, across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.

Reframing AGI Confrontation with Off Earth Autonomy

cs.CY · 2026-06-18 · unverdicted · novelty 4.0

An off-Earth autonomy pathway can reduce AGI confrontation incentives by making early cooperation preferable to power-seeking on Earth.

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence

cs.CY · 2026-05-29 · unverdicted · novelty 3.0

Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.
