Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33:21464–21475
3 Pith papers cite this work.
Citing papers by year: 2026 (3). Citation roles: baseline (1). Citation polarities: baseline (1).
Citing papers
- Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
ActFocus resolves the action bottleneck in agentic RL by reweighting token gradients toward action tokens using observed reward variance and an energy-based uncertainty term, outperforming PPO and GRPO by up to 65 percentage points.
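The energy-based uncertainty term builds on the score introduced in the cited work, which defines the free energy of an input x under a classifier with logits f_i(x) as E(x) = -T * log sum_i exp(f_i(x) / T). This summary does not say how ActFocus turns the energy into a token-level gradient weight, so the sketch below covers only the energy itself, computed per sample from a batch of logits with a numerically stable log-sum-exp. It is a minimal NumPy sketch; the function name `energy_score` is hypothetical. Lower energy tends to indicate in-distribution inputs.

```python
import numpy as np


def energy_score(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Free energy E(x) = -T * log(sum_i exp(f_i(x) / T)) for each row of logits.

    Lower energy suggests an in-distribution input; -E(x) is often used as an
    "ID-ness" score so that higher means more in-distribution.
    """
    z = np.asarray(logits, dtype=float) / temperature
    # Stable log-sum-exp over the class axis: subtract the row max before exp.
    m = z.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse


if __name__ == "__main__":
    logits = np.array([[10.0, 0.0, 0.0],   # confident prediction
                       [1.0, 1.0, 1.0]])   # flat prediction
    print(energy_score(logits))  # confident row has the lower (more negative) energy
```

With T = 1, the confident row [10, 0, 0] gets an energy close to -10, while the flat row [1, 1, 1] gets -(1 + log 3), about -2.10. The confident input therefore scores as more in-distribution.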
- MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse
MahaVar augments the Mahalanobis OOD score with class-wise distance variance, which is theoretically higher for in-distribution samples under relaxed Neural Collapse geometry.
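The summary names two ingredients: the Mahalanobis score and the variance of a sample's distances to the class means. The intuition is that an ID feature sits close to one class mean and far from the others, which spreads its distances. An OOD feature tends to sit roughly equidistant from all class means, so its distances vary less. The sketch below computes both quantities. It does not reproduce MahaVar's actual combination rule, which this summary does not give. The tied covariance follows the common Mahalanobis-detector setup and is an assumption here, as are all function names.

```python
import numpy as np


def fit_class_gaussians(features, labels):
    """Class means plus one shared (tied) covariance; returns means and precision."""
    classes = np.unique(labels)  # sorted, so searchsorted maps labels to row indices
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    return means, np.linalg.pinv(cov)


def class_distances(x, means, precision):
    """Squared Mahalanobis distance from each row of x to every class mean, shape (N, K)."""
    diff = x[:, None, :] - means[None, :, :]  # (N, K, D)
    return np.einsum("nkd,de,nke->nk", diff, precision, diff)


def mahalanobis_score(dists):
    """Standard score: negative distance to the closest class mean (higher = more ID)."""
    return -dists.min(axis=1)


def distance_variance(dists):
    """Variance of the per-class distances (hypothetical helper; expected higher for ID)."""
    return dists.var(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[5.0, 0.0], [-2.5, 4.33], [-2.5, -4.33]])  # simplex-like means
    labels = np.repeat(np.arange(3), 200)
    feats = centers[labels] + 0.5 * rng.standard_normal((600, 2))
    means, prec = fit_class_gaussians(feats, labels)
    d = class_distances(np.array([[5.0, 0.0], [0.0, 0.0]]), means, prec)  # ID, then OOD
    print(mahalanobis_score(d), distance_variance(d))
```

In this toy setup, the point at a class mean has near-zero distance to its own class and large distances to the other two, so its variance is large. The point at the origin is about equally far from all three means, so its variance is close to zero.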
- TINS: Test-time ID-prototype-separated Negative Semantics Learning for OOD Detection
TINS improves OOD detection by learning negative semantics at test time with ID-prototype separation, cutting average FPR95 from 14.04% to 6.72% on the Four-OOD benchmark with ImageNet-1K.
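FPR95 is the fraction of OOD samples wrongly accepted as in-distribution when the decision threshold keeps 95% of ID samples (TPR = 95%). Lower is better. The sketch below assumes that higher scores mean "more in-distribution" and that a sample counts as ID when its score is at or above the threshold. Implementations differ in how they interpolate the threshold and handle ties, so the exact TPR may land slightly off 95%. The function name `fpr_at_95_tpr` is hypothetical.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR on OOD samples at the threshold that keeps about 95% of ID samples.

    Assumed convention: higher score = more in-distribution; accept as ID when
    score >= threshold.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # The 5th percentile of ID scores: roughly 95% of ID samples lie at or above it.
    threshold = np.percentile(id_scores, 5)
    # Fraction of OOD samples that would be (wrongly) accepted as ID.
    return float(np.mean(ood_scores >= threshold))


if __name__ == "__main__":
    id_scores = np.arange(100.0)          # ID scores 0..99
    ood_scores = np.arange(100.0) - 50.0  # OOD scores -50..49, overlapping the ID range
    print(fpr_at_95_tpr(id_scores, ood_scores))
```

For scale, the reported drop from 14.04% to 6.72% average FPR95 is 7.32 percentage points, a relative reduction of about 52%.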