What regularized auto-encoders learn from the data-generating distribution
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Pareto Q-Learning with Reward Machines
  PQLRM integrates Pareto Q-Learning and Reward Machines to produce a sample-efficient multi-policy algorithm for non-Markovian RM rewards that converges faster than naive PQL and finds policies QRM cannot.
- The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics
  The paper links phase-transition behavior in continuous generative samplers to projection caustics in the data geometry and introduces the Critical Boundary Detector as a diagnostic tool.