Michael McCloskey and Neal J Cohen
4 Pith papers cite this work.
citing papers explorer
- Tracing Persona Vectors Through LLM Pretraining
  Persona vectors form within the first 0.22% of LLM pretraining and remain effective for steering post-trained models, with continued refinement and transfer to other models.
- Early Data Exposure Improves Robustness to Subsequent Fine-Tuning
  Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.
- 20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone
  Data curation alone raises VLM accuracy by more than 11 points on average across many benchmarks while cutting required training compute by up to 87 times.
- Towards provable probabilistic safety for scalable embodied AI systems
  The paper proposes a paradigm of provable probabilistic safety to enable scalable, safe deployment of embodied AI in critical applications.