
Lin, et al., Mitigating the alignment tax of RLHF, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024), pp

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Post-training makes large language models less human-like

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.

citing papers explorer

Showing 1 of 1 citing paper.

  • Post-training makes large language models less human-like cs.CL · 2026-05-08 · unverdicted · none · ref 44
