pith.

Yes” as a measure of the similarity between the two candidate solutions. As shown by “HDPO (LLM-Div)

1 Pith paper cites this work. Polarity classification is still being indexed.


fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1


representative citing papers

Hint-Guided Diversified Policy Optimization for LLM Reasoning

cs.CL · 2026-06-02 · unverdicted · novelty 4.0

HDPO adds a propose-select-think stage to RLVR, in which the LLM generates diverse solution outlines as hints, selects the most reliable outline, and reasons from it; its experiments claim improved reasoning and solution diversity.

citing papers explorer


  • Hint-Guided Diversified Policy Optimization for LLM Reasoning cs.CL · 2026-06-02 · unverdicted · none · ref 6
