CAP is a reinforcement-learning-driven prompt optimization framework that suppresses target knowledge in LLMs while preserving general capabilities, enabling reversible unlearning without any parameter updates.
3 Pith papers cite this work.
years
2026: 3
representative citing papers
VLM-AR3L learns absolute and relative reward models from VLM preference labels to improve RL on control, manipulation, and Minecraft tasks.
BlendIn replaces binary guidance acceptance with confidence-weighted distribution blending between base and guidance models, mitigating cascading failures in inference-time LLM alignment.
citing papers explorer
- CAP: Controllable Alignment Prompting for Unlearning in LLMs
  CAP is a reinforcement-learning-driven prompt optimization framework that suppresses target knowledge in LLMs while preserving general capabilities, enabling reversible unlearning without any parameter updates.
- VLM-AR3L: Vision-Language Models for Absolute and Relative Rewards in Reinforcement Learning
  VLM-AR3L learns absolute and relative reward models from VLM preference labels to improve RL on control, manipulation, and Minecraft tasks.
- To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending
  BlendIn replaces binary guidance acceptance with confidence-weighted distribution blending between base and guidance models, mitigating cascading failures in inference-time LLM alignment.