Pith Number

pith:JUO6GTVU

pith:2026:JUO6GTVURNRCRLSPWBNRUYT7E6
not attested · not anchored · not stored · refs resolved

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Bowen Jin, Gagan Mundada, Jiawei Han, Jingbo Shang, Julian McAuley, Junda Wu, Ritwik Sinha, Rohan Surana, Sizhe Zhou, Tong Yu, Xintong Li, Yizhu Jiao

F-GRPO lets one LLM jointly generate candidates and rank them by factorizing policy optimization into separate phases with distinct advantages.

arxiv:2605.12995 v1 · 2026-05-13 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JUO6GTVURNRCRLSPWBNRUYT7E6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

F-GRPO improves top-ranked performance over GRPO and decoupled baselines, outperforms supervised alternatives, and remains competitive with strong zero-shot rerankers, with no architectural changes at inference time.

C2 weakest assumption

The phase-specific credit assignment problem can be resolved by applying separate group-relative advantages to generation and ranking inside a two-phase sequence-level objective while sharing a single LLM backbone.

C3 one line summary

F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
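The factorization described in the claims above can be illustrated with a small sketch. It is not the paper's implementation: the function names, the sample layout, and both reward definitions (set-overlap coverage, reciprocal rank of the first relevant item) are assumptions chosen for illustration. The only part taken from standard GRPO is the within-group normalization of rewards. The sketch shows that two sampled sequences with the same candidate set receive the same generation advantage while their ranking advantages can differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def coverage_reward(candidates, relevant):
    """Order-invariant reward (assumed form): share of relevant items generated."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(candidates) & relevant_set) / len(relevant_set)


def utility_reward(ranking, relevant):
    """Position-aware reward (assumed form): reciprocal rank of first relevant item."""
    relevant_set = set(relevant)
    for position, item in enumerate(ranking, start=1):
        if item in relevant_set:
            return 1.0 / position
    return 0.0


def factorized_advantages(samples, relevant):
    """Compute one advantage per sample for each phase, normalized separately."""
    gen_rewards = [coverage_reward(s["candidates"], relevant) for s in samples]
    rank_rewards = [utility_reward(s["ranking"], relevant) for s in samples]
    return (
        group_relative_advantages(gen_rewards),
        group_relative_advantages(rank_rewards),
    )


# Hypothetical group of three sampled sequences for one query.
samples = [
    {"candidates": ["a", "b", "c"], "ranking": ["a", "c", "b"]},
    {"candidates": ["c", "b", "a"], "ranking": ["b", "c", "a"]},
    {"candidates": ["b", "d", "e"], "ranking": ["d", "b", "e"]},
]
relevant = ["a"]

gen_adv, rank_adv = factorized_advantages(samples, relevant)
```

In a training loop, the generation advantage would presumably weight the log-probabilities of the candidate tokens and the ranking advantage those of the ranking tokens; how the two terms are combined in the actual objective is not stated on this page.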

References

76 extracted · 76 resolved · 7 Pith anchors

[1] Ellis, Brian Whitman, and Paul Lamere 2011
[2] Autoregressive search engines: Generating substrings as document identifiers 2022
[3] Generative slate recommendation with reinforcement learning 2023 · doi:10.1145/3539597.3570412
[4] Chang, Claire Cardie, Kianté Brantley, and Thorsten Joachim 2024
[5] https://doi.org/10.18653/v1/2022.naacl-main.194 2022 · doi:10.18653/v1/2022.naacl-main.194

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-18T03:09:00.497587Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4d1de34eb48b6228ae4fb05b1a627f278c6373f11e9b08af016fb3ae7b8d1b5f

Aliases

arxiv: 2605.12995 · arxiv_version: 2605.12995v1 · doi: 10.48550/arxiv.2605.12995 · pith_short_12: JUO6GTVURNRC · pith_short_16: JUO6GTVURNRCRLSP · pith_short_8: JUO6GTVU
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JUO6GTVURNRCRLSPWBNRUYT7E6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4d1de34eb48b6228ae4fb05b1a627f278c6373f11e9b08af016fb3ae7b8d1b5f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d2216fd17bd931574c3fd6cab14a6aec7f5ce6e1e6086a96e8b28c009824c5db",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T04:52:33Z",
    "title_canon_sha256": "be1d84642db727b9fd0a96430a0139d17bd727ead19e033275c85103a0863336"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12995",
    "kind": "arxiv",
    "version": 1
  }
}
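
The canonicalization step in the verification command can also be run offline against a saved copy of the record. The sketch below mirrors the same `json.dumps` settings (sorted keys, compact separators, non-ASCII preserved) and hashes the UTF-8 bytes. The helper name `canonical_sha256` is an illustrative choice, not part of any Pith API. Compare the resulting digest with the canonical hash listed above.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d2216fd17bd931574c3fd6cab14a6aec7f5ce6e1e6086a96e8b28c009824c5db",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T04:52:33Z",
        "title_canon_sha256": "be1d84642db727b9fd0a96430a0139d17bd727ead19e033275c85103a0863336",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12995", "kind": "arxiv", "version": 1},
}

# Same content with top-level keys in a different order; sort_keys makes this irrelevant.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest depends only on the record's content, not on the order in which a mirror happens to store the fields.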