A.2.3 Group Sequence Policy Optimization

Group Sequence Policy Optimization (GSPO) (Zheng et al., 2025) addresses training instability in large Mixture-of-Experts (MoE) models

Token-Level Loss · 2025 · arXiv 5171.6590

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

FineStep adds step-level process rewards and credit assignment to tool-augmented Text-to-SQL, achieving 3.25% higher execution accuracy than GRPO on BIRD while cutting redundant tool calls.

citing papers explorer

Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

cs.CL · 2026-05-06 · unverdicted · none · ref 11

FineStep adds step-level process rewards and credit assignment to tool-augmented Text-to-SQL, achieving 3.25% higher execution accuracy than GRPO on BIRD while cutting redundant tool calls.