PathVQA (He et al., 2020) shows TT-OPD at 45.3%, outperforming both base text (40.5%) and GRPO (41.5%)

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Healthcare AI GYM for Medical Agents

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

A new Gym environment for medical AI agents reveals collapse in multi-turn RL due to sparse rewards, addressed by Turn-level Truncated On-Policy Distillation yielding +3.9 pp gains on clinical benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

  • Healthcare AI GYM for Medical Agents cs.LG · 2026-05-01 · unverdicted · none · ref 1