GenAC introduces generative critics with chain-of-thought reasoning and in-context conditioning to improve value approximation and downstream RL performance in LLMs compared to value-based and value-free baselines.
Brite: Bootstrapping reinforced thinking process to enhance language model reasoning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.
citing papers explorer
-
Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning
GenAC introduces generative critics with chain-of-thought reasoning and in-context conditioning to improve value approximation and downstream RL performance in LLMs compared to value-based and value-free baselines.
-
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.