Selective reflection-tuning: Student-selected data recycling for LLM instruction-tuning

Li, M · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

Introduces IBPO, a counterfactual credit assignment method that turns sparse terminal rewards into process-level advantage estimates for more stable LLM reasoning training.

citing papers explorer

Showing 1 of 1 citing paper.

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths
cs.LG · 2026-04-20 · unverdicted · none · ref 4
