Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8

Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann · 2021

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

quant-ph · 2026-05-12 · unverdicted · novelty 7.0

QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

Safe Deep Reinforcement Learning for Spacecraft Reorientation with Pointing Keep-Out Constraint

eess.SY · 2026-05-19 · unverdicted · novelty 4.0

DRL with CBF safety filter enables guaranteed-safe spacecraft reorientation under pointing keep-out constraints via custom state representation and reward design, validated in Monte Carlo simulations.

parallelcbf: A composable safety-filter and auditability framework for tensor-parallel reinforcement learning

cs.LG · 2026-05-15 · unverdicted · novelty 3.0

ParallelCBF is a composable framework that unifies tensor-parallel UAV environments, hard-gate CBF safety filters, sharded BC-to-RL pipelines, and operational auditability as first-class APIs for safe reinforcement learning.

citing papers explorer

Showing 4 of 4 citing papers.

QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

quant-ph · 2026-05-12 · unverdicted · none · ref 55

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

cs.LG · 2026-05-20 · unverdicted · none · ref 63

Safe Deep Reinforcement Learning for Spacecraft Reorientation with Pointing Keep-Out Constraint

eess.SY · 2026-05-19 · unverdicted · none · ref 26

parallelcbf: A composable safety-filter and auditability framework for tensor-parallel reinforcement learning

cs.LG · 2026-05-15 · unverdicted · none · ref 10
