pith. sign in

Reinforcement learning: An introduction

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

roles

background 1

polarities

background 1

representative citing papers

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

citing papers explorer

Showing 8 of 8 citing papers.