Steering language generation: Harnessing contrastive expert guidance and negative prompting. arXiv preprint arXiv:2308.07645, 2023.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

GCPO performs per-token credit assignment in discrete policy optimization by setting token advantages proportional to the difference in model predictions under positive versus negative prompts, outperforming GRPO and DAPO on text-to-image and chain-of-thought tasks.
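The summary above says only that token advantages are proportional to the gap between predictions under positive and negative prompts. As a rough illustration of that idea, and not the citing paper's actual formulation, the sketch below spreads one sequence-level advantage over tokens in proportion to a log-probability gap. The function name `token_credit`, the use of log-probabilities, the normalization by the sum of absolute gaps, and the uniform fallback are all assumptions made for this example.

```python
from typing import List


def token_credit(
    logp_pos: List[float],
    logp_neg: List[float],
    seq_advantage: float,
) -> List[float]:
    """Split a sequence-level advantage across tokens (hypothetical scheme).

    logp_pos / logp_neg: log-probability of each sampled token under the
    positive and the negative prompt. The weighting rule is an assumption,
    not taken from the paper.
    """
    if len(logp_pos) != len(logp_neg):
        raise ValueError("both inputs must cover the same tokens")
    if not logp_pos:
        return []

    # Per-token contrast: how much more the positive prompt favors the token.
    gaps = [p - n for p, n in zip(logp_pos, logp_neg)]
    total = sum(abs(g) for g in gaps)

    # If the two prompts agree on every token, fall back to uniform credit.
    if total == 0.0:
        return [seq_advantage / len(gaps)] * len(gaps)

    # Each token's advantage is proportional to its contrast.
    return [seq_advantage * g / total for g in gaps]


advantages = token_credit([-1.0, -2.0, -0.5], [-1.5, -2.0, -2.0], 2.0)
print(advantages)  # [0.5, 0.0, 1.5]
```

In this toy case the second token gets no credit because both prompts assign it the same log-probability, while the third token, where the prompts disagree most, gets the largest share.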

citing papers explorer

  • Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization cs.CV · 2026-05-28 · unverdicted · none · ref 20
