Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI, November 2025

10 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (10)

verdicts: unverdicted (10)

roles: background (1)

polarities: unclear (1)

representative citing papers

Understanding Goal Generalisation in Sequential Reinforcement Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

An empirical analysis of over 100 sequential RL training pipelines across more than 250 out-of-distribution (OOD) environments finds that salient features drive generalization and that early goals persist. Latent policy gradients, which simulate latent-variable evolution, are used to predict OOD behavior from training history.

Probing Persona-Dependent Preferences in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Linear probes on residual-stream activations identify a shared preference vector in LLMs that tracks choices across prompts and causally steers decisions even for anti-correlated personas.

When Role-playing, Do Models Believe What They Say?

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

Different persona induction methods produce a spectrum of belief internalization: prompting, ICL, and SFT mainly alter outputs, whereas Emergent Misalignment produces large representational shifts and Open Character Training produces smaller ones that are clearest in larger models.
