Reinforcement learning: An introduction

Richard S Sutton, Andrew G Barto · 2018

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

cs.LG · 2025-08-01 · unverdicted · novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.

Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study

cs.SE · 2025-02-18 · unverdicted · novelty 5.0

MORL generates more diverse requirement-violation scenarios while SORL produces higher-severity violations when testing interdependent requirements in an end-to-end AV controller.

Designing compact training sets for data-driven molecular property prediction

physics.data-an · 2019-06-25 · unverdicted · novelty 4.0

Combines D-optimality and diversity-maximizing selection in an epsilon-greedy loop to create compact training sets for sparse group additivity and kernel ridge regression models of molecular properties.

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · novelty 3.0

A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

citing papers explorer

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic · cs.LG · 2026-05-09 · unverdicted · none · ref 43
Automated Design of Agentic Systems · cs.AI · 2024-08-15 · conditional · none · ref 213
Mastering Diverse Domains through World Models · cs.AI · 2023-01-10 · unverdicted · none · ref 29
Process Reinforcement through Implicit Rewards · cs.LG · 2025-02-03 · conditional · none · ref 47
Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies · cs.LG · 2025-08-01 · unverdicted · none · ref 8
Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study · cs.SE · 2025-02-18 · unverdicted · none · ref 13
Designing compact training sets for data-driven molecular property prediction · physics.data-an · 2019-06-25 · unverdicted · none · ref 38
A Survey on the Memory Mechanism of Large Language Model based Agents · cs.AI · 2024-04-21 · accept · none · ref 90
