Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz · 2015

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

cs.LG · 2025-09-27 · unverdicted · novelty 7.0

ZeroSiam is an asymmetric architecture using a learnable predictor and stop-gradient that prevents collapse in test-time entropy minimization while also regularizing biased signals for improved performance.

Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions

cs.LG · 2025-12-24 · unverdicted · novelty 6.0

GLiBRL uses GLMs with learnable basis functions for exact Bayesian inference in deep BRL, derives a closed-form link between L2 task distances and kernel task similarity, and reports up to 1.8x gains over prior meta-RL on MuJoCo and MetaWorld.

citing papers explorer

Showing 2 of 2 citing papers.

ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

cs.LG · 2025-09-27 · unverdicted · none · ref 44

Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions

cs.LG · 2025-12-24 · unverdicted · none · ref 26
