arXiv preprint arXiv:2406.00244

· 2024 · arXiv 2406.00244

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Systematic experiments reveal that activation steering trades fluency for concept control and is less effective on instruction-tuned models, that prompting and SFT excel at concept injection but not removal, and that textual metrics correlate with LLM judges.

Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

Activation steering produces synthetic safety-violating data that improves downstream classifiers over prompting on most tested concepts when a harmonic mean of alignment, coherence, and diversity is optimized.

citing papers explorer

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study cs.CL · 2026-06-10 · unverdicted · none · ref 109
