Implicit meta-learning may lead language models to trust more reliable sources

Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, Tegan Maharaj, David Krueger · July · arXiv 2310.15047

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Consistency Training while Mitigating Obfuscation via Rate Matching

cs.CL · 2026-06-01 · unverdicted · novelty 6.0 · cites this work as ref 64

RMCT matches the rate of target behaviors, such as bias-following, across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.
