Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
Citing papers
- TRAM: Test-Time Risk Adaptation with Mixture of Agents
  TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.
- Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
  A two-stage distillation plus reinforced fine-tuning approach produces a single humanoid locomotion controller that adapts across skills and irregular terrains.