2 Pith papers cite this work.
Citing papers
- Tensor-Efficient High-Dimensional Q-learning
  TEQL uses a low-rank tensor representation of the Q-function plus error-uncertainty-guided exploration to achieve better sample efficiency than low-rank matrix or deep RL baselines on classic control tasks under matched parameter budgets.
- The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths
  A multidisciplinary survey of the finite-horizon two-armed bandit with binary responses that unifies models, computationally evaluates designs for moderate versus small horizons, and debunks myths about Bayes-optimal solvability.