J-C relationship for OW(K, γ) formulations with non-negative rewards: Using Eq. 129, we have

$$C(s, \pi_{\cdot,\cdot}) = J(s, \pi_{\cdot,\cdot}) - \sum_{g' \in S} p_{\mathrm{goal}}(g')\, J(s, \pi_{g',\cdot}) \qquad (135)$$

Moreover, the non-negative reward assumption implies that, for every $g \in S$ and $g' \in S$, $J(s, g, \pi_{g',\cdot}) \geq 0$
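The excerpt gives Eq. 135 without the definitions it relies on, so the Python sketch below only illustrates the arithmetic at a single fixed state s. `control_term` and `goal_averaged_return` are hypothetical helpers. Two readings are assumptions, not the paper's stated definitions: J(s, π_{g′,·}) is taken as the p_goal-weighted average over evaluation goals g of J(s, g, π_{g′,·}), and J(s, π_{·,·}) is taken as the weighted "diagonal" where each goal uses its own conditioned policy. Under the first assumption, non-negative rewards make every J(s, π_{g′,·}) non-negative. The subtracted term in Eq. 135 is therefore non-negative, so C cannot exceed J.

```python
from typing import Sequence


def control_term(j_joint: float,
                 j_per_policy: Sequence[float],
                 p_goal: Sequence[float]) -> float:
    """Eq. (135): C(s, pi) = J(s, pi) - sum_{g'} p_goal(g') * J(s, pi_{g', .}).

    j_joint      -- J(s, pi_{.,.}) at a fixed state s
    j_per_policy -- j_per_policy[g'] = J(s, pi_{g', .}) for each goal g'
    p_goal       -- goal distribution p_goal(g'), same indexing
    """
    if len(j_per_policy) != len(p_goal):
        raise ValueError("j_per_policy and p_goal must have the same length")
    # Goal-weighted baseline: sum over g' of p_goal(g') * J(s, pi_{g', .}).
    baseline = sum(p * j for p, j in zip(p_goal, j_per_policy))
    return j_joint - baseline


def goal_averaged_return(j_table: Sequence[Sequence[float]],
                         p_goal: Sequence[float],
                         g_prime: int) -> float:
    """ASSUMPTION (not from the source): J(s, pi_{g', .}) is the
    p_goal-weighted average over evaluation goals g of J(s, g, pi_{g', .}),
    stored here as j_table[g][g_prime]."""
    return sum(p_goal[g] * j_table[g][g_prime] for g in range(len(p_goal)))


# Hypothetical two-goal example with non-negative returns.
# Rows index the evaluation goal g; columns index the goal g' the policy is conditioned on.
p = [0.5, 0.5]
table = [[1.0, 0.2],
         [0.0, 0.8]]
per_policy = [goal_averaged_return(table, p, gp) for gp in range(len(p))]
# ASSUMPTION: J(s, pi_{.,.}) is the p_goal-weighted diagonal of the table.
joint = sum(p[g] * table[g][g] for g in range(len(p)))
c = control_term(joint, per_policy, p)
```

In this toy table each conditioned policy averages 0.5 over goals, the joint return is 0.9, and C works out to 0.9 − 0.5 = 0.4. That value is below J, as the non-negativity argument predicts.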

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

GCRL and MISL are unified as control maximization, with three inequivalent GCRL formulations each matched to a MISL objective via bounds on goal-sensitivity.

citing papers explorer

Showing 1 of 1 citing paper.

Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization cs.LG · 2026-05-07 · unverdicted · none · ref 80