Gordon, and Drew Bagnell
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
method 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
citing papers explorer
- Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
  BISON learns bilevel policies over symbolic world models to generalize long-horizon robotic planning beyond VLA and end-to-end baselines while remaining efficient even at 10,000-object scale.
- SOD: Step-wise On-policy Distillation for Small Language Model Agents
  SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.