PathVQA (He et al., 2020) shows TT-OPD at 45.3%, outperforming both base text (40.5%) and GRPO (41.5%)

On VQA-RAD (Lau et al · 2018

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

A new Gym environment for medical AI agents reveals collapse in multi-turn RL due to sparse rewards, addressed by Turn-level Truncated On-Policy Distillation yielding +3.9 pp gains on clinical benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Healthcare AI GYM for Medical Agents cs.LG · 2026-05-01 · unverdicted · none · ref 1

