PACE uses the squared L2 norm of policy parameter changes from a first-order approximation as an efficient proxy for environment value in UED, outperforming baselines with higher IQM and lower optimality gap on MiniGrid and Craftax OOD tests.
Jaxued: A simple and useable ued library in jax.arXiv preprint arXiv:2403.13091
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
citing papers explorer
-
PACE: Parameter Change for Unsupervised Environment Design
PACE uses the squared L2 norm of policy parameter changes from a first-order approximation as an efficient proxy for environment value in UED, outperforming baselines with higher IQM and lower optimality gap on MiniGrid and Craftax OOD tests.
- TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning