TreeDQN is a sample-efficient off-policy RL method for combinatorial optimization that uses tree MDPs, requires up to 10 times less training data than on-policy methods, and outperforms state-of-the-art methods on ML4CO tasks.
Reinforcement learning for variable selection in a branch and bound algorithm
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2023 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- TreeDQN: Sample-Efficient Off-Policy Reinforcement Learning for Combinatorial Optimization