Controlling large language model agents with entropic activation steering. arXiv preprint arXiv:2406.00244, 2024

Nate Rahn, Pierluca D’Oro, Marc G Bellemare · 2024 · arXiv 2406.00244

1 Pith paper cites this work.

representative citing papers

Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection

cs.LG · 2026-05-27 · unverdicted · novelty 6.0 · cites this work as ref 39

Activation steering produces synthetic safety-violating data that improves downstream classifiers over prompting on most tested concepts when a harmonic mean of alignment, coherence, and diversity is optimized.
