Outcome-level RL with binary or composite rewards improves compositional generalization over supervised fine-tuning by avoiding overfitting to frequent training patterns.
Nature , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Proposes possibility space, timing computation, and causal factum as a new framework for data-driven trajectory discovery and counterfactual timing deduction on EHR data from 3,276 breast cancer patients.
citing papers explorer
- Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization
  Outcome-level RL with binary or composite rewards improves compositional generalization over supervised fine-tuning by avoiding overfitting to frequent training patterns.
- To Use AI as Dice of Possibilities with Timing Computation
  Proposes possibility space, timing computation, and causal factum as a new framework for data-driven trajectory discovery and counterfactual timing deduction on EHR data from 3,276 breast cancer patients.