Control-theoretic guardrails enable proactive correction of risky LLM agent actions in latent space, preventing catastrophes like collisions or bankruptcy while preserving task performance in simulated environments.
Measuring and mitigating unintended bias in text classification,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Control-theoretic guardrails enable proactive correction of risky LLM agent actions in latent space, preventing catastrophes like collisions or bankruptcy while preserving task performance in simulated environments.