Stable-baselines3: Reliable reinforcement learning implementations

Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann · 2021

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Diffusion Models Are Real-Time Game Engines

cs.LG · 2024-08-27 · conditional · novelty 7.0

A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives

cs.LG · 2025-09-11 · conditional · novelty 6.0

Shows entropy coupling limits DSAC on discrete tasks and introduces a generalized actor-critic framework with m-step critics and novel entropy-regularized objectives that perform robustly on Atari.

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

RGoT uses RL to adaptively generate task-specific graphs of operations for GoT-style LLM prompting from a human-provided set, with results suggesting feasibility under constraints.

citing papers explorer

Showing 3 of 3 citing papers.

Diffusion Models Are Real-Time Game Engines
cs.LG · 2024-08-27 · conditional · none · ref 81

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
cs.LG · 2025-09-11 · conditional · none · ref 25

Reinforced Graph of Thoughts: RL-Driven Adaptive Prompting for LLMs
cs.LG · 2026-05-21 · unverdicted · none · ref 37
