Language models transmit behavioural traits through hidden signals in data. Nature, 652:615–621

Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Sören Mindermann, Jacob Hilton, Samuel Marks, Owain Evans · 2026 · DOI 10.1038/s41586-026-10319-8

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

cs.AI · 2026-06-10 · unverdicted · novelty 6.0

LLM self-reports predict behavior selectively: TPB reaches human-level coherence within shared conversations but collapses across sessions for primed behaviors, unlike Big 5, with persona prompting stabilizing reports but not actions.

Advancing the State-of-the-Art in Empirical Privacy Auditing

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

Proposes high-temperature synthetic canaries and auxiliary-model auditing to improve empirical privacy measurement for LLM fine-tuning and synthetic data generation.

Asking Back: Interaction-Layer Antidistillation Watermarks

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

Interaction-layer antidistillation watermarks use system-prompt-induced behavioral markers like explicit follow-up questions that transfer to distilled student models at 45-89% relative fidelity and can be audited via black-box LLM-as-judge queries.

A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

cs.AI · 2026-06-02 · unverdicted · novelty 4.0

A learned linear activation bridge achieves high alignment (cosine ~0.97) between Pythia-160M and Pythia-410M states but produces no improvement in downstream multi-hop answering when injected into the receiver.

citing papers explorer

Showing 4 of 4 citing papers.

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior · cs.AI · 2026-06-10 · unverdicted · none · ref 9
Advancing the State-of-the-Art in Empirical Privacy Auditing · cs.LG · 2026-06-09 · unverdicted · none · ref 55
Asking Back: Interaction-Layer Antidistillation Watermarks · cs.CR · 2026-05-15 · unverdicted · none · ref 5
A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting · cs.AI · 2026-06-02 · unverdicted · none · ref 5
