Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI, November 2025

4 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (4)

representative citing papers

Understanding Goal Generalisation in Sequential Reinforcement Learning

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

An empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds that salient features drive generalization and that early goals persist, and uses latent policy gradients to simulate latent variable evolution and predict OOD behavior from training history.

Probing Persona-Dependent Preferences in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Linear probes on residual-stream activations identify a shared preference vector in LLMs that tracks choices across prompts and causally steers decisions even for anti-correlated personas.
