Contrastive learning as goal-conditioned reinforcement learning
5 Pith papers cite this work.
citing papers explorer
- Bellman Value Decomposition for Task Logic in Safe Optimal Control
  Bellman values for temporal logic tasks decompose into a graph of reach-avoid, avoid, and reach-avoid-loop equations solved by embedding the graph in a two-layer neural net (VDPPO) for safe high-dimensional control.
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
  VIP learns a visual embedding from human videos whose distance defines dense, smooth rewards for arbitrary goal-image robot tasks without task-specific fine-tuning.
- Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning
  Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
- MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
  MoMo conditions contrastive representations and prediction operators on user preferences via FiLM and low-rank modulation to enable continuous modulation of plan safety while preserving inference efficiency.
- Abstraction for Offline Goal-Conditioned Reinforcement Learning
  Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.