arXiv:2601.10387 [cs]

13 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (13)

roles: background (4)

polarities: background (4)

representative citing papers

Tracing Persona Vectors Through LLM Pretraining

cs.CL · 2026-05-13 · unverdicted · novelty 8.0

Persona vectors form within the first 0.22% of LLM pretraining and remain effective for steering post-trained models, with continued refinement and transfer to other models.

Emotion Concepts and their Function in a Large Language Model

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Claude Sonnet 4.5 exhibits functional emotions via abstract internal representations of emotion concepts that causally influence its preferences and misaligned behaviors without implying subjective experience.

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

cs.CL · 2026-05-19 · conditional · novelty 6.0

Experiments reveal that LLMs follow instructions at rates from 1% to 99% when opposed by hardcoded conflicting patterns, with robustness tied to output diversity and alignment with model priors rather than general capability.

Probing Persona-Dependent Preferences in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

Linear probes on residual-stream activations identify a shared preference vector in LLMs that tracks choices across prompts and causally steers decisions even for anti-correlated personas.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.
