SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation

Michael Orme; Yanchao Yu; Zhiyuan Tan

arxiv: 2605.25984 · v1 · pith:WALUUJPAnew · submitted 2026-05-25 · 💻 cs.CL · cs.AI

SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation

Michael Orme , Yanchao Yu , Zhiyuan Tan This is my paper

classification 💻 cs.CL cs.AI

keywords dialogueinference-timesafectrl-rladaptivebehaviourbehaviouralcontrollanguage

0 comments

read the original abstract

Ensuring safe and contextually appropriate behaviour in Large Language Models (LLMs) remains a critical challenge for real-world deployment. We present \textbf{SafeCtrl-RL}, an inference-time behavioural control framework that enables adaptive safety regulation without model retraining or parameter modification. The method formulates dialogue generation as a sequential decision process, where a reinforcement learning agent dynamically selects prompt adjustment strategies based on contextual feedback. This allows unsafe behaviours to be suppressed through iterative refinement, which we conceptualise as inference-time behavioural unlearning. Evaluated across multiple LLMs and unsafe dialogue scenarios, SafeCtrl-RL consistently improves safety and response quality, outperforms existing prompt-based optimisation methods, and achieves favourable performance--efficiency trade-offs. **Warning: This paper may contain examples of harmful language, and reader discretion is recommended.

This paper has not been read by Pith yet.

SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation

discussion (0)