Trust region policy optimization
2 Pith papers cite this work.
fields: cs.LG (2)
years: 2025 (2)
verdicts: UNVERDICTED (2)
citing papers explorer
- ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse
  ZeroSiam is an asymmetric architecture that uses a learnable predictor and a stop-gradient to prevent collapse in test-time entropy minimization, while also regularizing biased signals to improve performance.
- Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
  GLiBRL uses generalised linear models (GLMs) with learnable basis functions for exact Bayesian inference in deep Bayesian RL (BRL), derives a closed-form link between L2 task distances and kernel task similarity, and reports gains of up to 1.8x over prior meta-RL methods on MuJoCo and MetaWorld.