Multi-property Steering of Large Language Models with Dynamic Activation Composition

Daniel Scalena, Gabriele Sarti, Malvina Nissim · 2024 · DOI 10.18653/v1/2024.blackboxnlp-1.34

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Predicting Future Behaviors in Reasoning Models Enables Better Steering

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

Probes predicting future behaviors from intermediate steps enable Future Probe Controlled Generation for steering large reasoning models with minimal quality degradation.

Adversarial Robustness of Activation Steering in Large Language Models

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

First systematic test shows activation steering robustness drops sharply (up to 64%) under adversarial input perturbations across multiple extraction methods, models, and personas.

citing papers explorer

Showing 2 of 2 citing papers.

Predicting Future Behaviors in Reasoning Models Enables Better Steering

cs.LG · 2026-06-09 · unverdicted · none · ref 16

Probes predicting future behaviors from intermediate steps enable Future Probe Controlled Generation for steering large reasoning models with minimal quality degradation.

Adversarial Robustness of Activation Steering in Large Language Models

cs.LG · 2026-06-05 · unverdicted · none · ref 35

First systematic test shows activation steering robustness drops sharply (up to 64%) under adversarial input perturbations across multiple extraction methods, models, and personas.
