RL with KL penalties is better viewed as Bayesian inference

Korbak, Tomasz, Perez, Ethan, Buckley, Christopher · 2022 · DOI 10.18653/v1/2022.findings-emnlp.77

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

Agentic Monte Carlo enables RL-style optimization of black-box LLM agents by sampling from the optimal policy posterior using Sequential Monte Carlo.

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents · cs.LG · 2026-06-03 · unverdicted · none · ref 44