Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM Journal on Control and Optimization, 58(6):3586–3612

Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Basar · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Interactive IRL is cast as bi-level optimization with an inner loop learning expert rewards and an outer loop learning interaction policies, solved by the convergent BISIRL algorithm.

citing papers explorer

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization cs.LG · 2026-05-01 · unverdicted · none · ref 45
