Steering language generation: Harnessing contrastive expert guidance and negative prompting. arXiv preprint arXiv:2308.07645, 2023

Charles O’Neill et al. · 2023 · arXiv 2308.07645

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

GCPO performs per-token credit assignment in discrete policy optimization by setting token advantages proportional to the difference in model predictions under positive versus negative prompts, outperforming GRPO and DAPO on text-to-image and chain-of-thought tasks.

citing papers explorer

Showing 1 of 1 citing paper.

Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization · cs.CV · 2026-05-28 · unverdicted · none · ref 20
