{"bundle_type":"pith_open_graph_bundle","bundle_version":"1.0","pith_number":"pith:2026:CZ346JSLYI4GRFJN5WL7J3VUT7","short_pith_number":"pith:CZ346JSL","canonical_record":{"source":{"id":"2605.15417","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:02:07Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"a4ee575b2dcc713ff1fa49a4956eb5aa306519dc23e85057571a540450c0f516","abstract_canon_sha256":"4078c284f01852446425428635bb59b3cb2f88ba3d6ac510c0ffc3b13bfa024b"},"schema_version":"1.0"},"canonical_sha256":"1677cf264bc23868952ded97f4eeb49feb0af8ad34fe8caf3495e4d0b799a852","source":{"kind":"arxiv","id":"2605.15417","version":1},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2605.15417","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"arxiv_version","alias_value":"2605.15417v1","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.15417","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_12","alias_value":"CZ346JSLYI4G","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_16","alias_value":"CZ346JSLYI4GRFJN","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_8","alias_value":"CZ346JSL","created_at":"2026-05-20T00:00:57Z"}],"events":[{"event_type":"record_created","subject_pith_number":"pith:2026:CZ346JSLYI4GRFJN5WL7J3VUT7","target":"record","payload":{"canonical_record":{"source":{"id":"2605.15417","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:02:07Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"a4ee575b2dcc713ff1fa49a4956eb5aa306519dc23e85057571a540450c0f516","abstract_canon_sha256":"4078c284f01852446425428635bb59b3cb2f88ba3d6ac510c0ffc3b13bfa024b"},"schema_version":"1.0"},"canonical_sha256":"1677cf264bc23868952ded97f4eeb49feb0af8ad34fe8caf3495e4d0b799a852","receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-20T00:00:57.505925Z","signature_b64":"LUWROAu0oCA5neRrflDvP5AaenyihW7Al3Rr1cQmCsOR+VlWpuTdoIJqU1pFVkrIWiVw7OkQM0JWzLeHkbJzCQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"1677cf264bc23868952ded97f4eeb49feb0af8ad34fe8caf3495e4d0b799a852","last_reissued_at":"2026-05-20T00:00:57.505206Z","signature_status":"signed_v1","first_computed_at":"2026-05-20T00:00:57.505206Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"source_kind":"arxiv","source_id":"2605.15417","source_version":1,"attestation_state":"computed"},"signer":{"signer_id":"pith.science","signer_type":"pith_registry","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"created_at":"2026-05-20T00:00:57Z","supersedes":[],"prev_event":null,"signature":{"signature_status":"signed_v1","algorithm":"ed25519","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signature_b64":"B6DGfp24i+hXuGmSaM1LU1/9xfpzCuMbbcbsJcKPJIWUQE2U8IHOcLWrF+vjdI/hoQA/0KokW6sAb2fG6GRbDg==","signed_message":"open_graph_event_sha256_bytes","signed_at":"2026-05-21T23:38:28.478771Z"},"content_sha256":"964a41abdae3876a40e678142b69ae0b66f9a7842f112bbb9e32bcb5228fa99f","schema_version":"1.0","event_id":"sha256:964a41abdae3876a40e678142b69ae0b66f9a7842f112bbb9e32bcb5228fa99f"},{"event_type":"graph_snapshot","subject_pith_number":"pith:2026:CZ346JSLYI4GRFJN5WL7J3VUT7","target":"graph","payload":{"graph_snapshot":{"paper":{"title":"$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Jake Fawkes, Jason Hartford","submitted_at":"2026-05-14T21:02:07Z","abstract_excerpt":"In GFlowNets and variational inference, it has been shown that the mean square error between target and model log probabilities is an effective, low variance, surrogate loss for training generative models.\n  This loss has the property that when evaluated \\emph{on-policy} its gradients correspond to those of the KL divergence, while \\emph{off-policy} it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction can be extended to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are that of the correspondin"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"This construction can be extended to the whole family of f-divergences, leading to a family of losses whose on-policy gradients are that of the corresponding f-divergence, but retain the same global minimizer off-policy. Specifically, the on-policy gradients lead to a one to one correspondence between translation invariant loss functions on the target and model log probabilities, and f-divergences.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The paper assumes that the surrogate losses remain valid and share the same global minimizer when evaluated off-policy, which rests on the translation-invariance property of the loss functions on log probabilities; if this invariance does not hold for the chosen f-divergence or if the correspondence is not one-to-one, the off-policy guarantee fails.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The work defines a family of surrogate losses for generative models and LLMs whose on-policy gradients match those of any f-divergence while sharing the same off-policy minimizer as the original MSE loss.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a1479b9d9681b4650de00afff61de45ff3360e2356bbcaa9cad374944b746833"},"source":{"id":"2605.15417","kind":"arxiv","version":1},"verdict":{"id":"d20875d1-e822-47e6-9970-31734748c0fa","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T15:55:56.227045Z","strongest_claim":"This construction can be extended to the whole family of f-divergences, leading to a family of losses whose on-policy gradients are that of the corresponding f-divergence, but retain the same global minimizer off-policy. Specifically, the on-policy gradients lead to a one to one correspondence between translation invariant loss functions on the target and model log probabilities, and f-divergences.","one_line_summary":"The work defines a family of surrogate losses for generative models and LLMs whose on-policy gradients match those of any f-divergence while sharing the same off-policy minimizer as the original MSE loss.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The paper assumes that the surrogate losses remain valid and share the same global minimizer when evaluated off-policy, which rests on the translation-invariance property of the loss functions on log probabilities; if this invariance does not hold for the chosen f-divergence or if the correspondence is not one-to-one, the off-policy guarantee fails.","pith_extraction_headline":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy."},"integrity":{"clean":false,"summary":{"advisory":1,"critical":0,"by_detector":{"doi_compliance":{"total":1,"advisory":1,"critical":0,"informational":0}},"informational":0},"endpoint":"/pith/2605.15417/integrity.json","findings":[{"note":"DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin) was visible in the surrounding text but could not be confirmed against doi.org as printed.","detector":"doi_compliance","severity":"advisory","ref_index":1,"audited_at":"2026-05-19T16:04:40.928097Z","detected_doi":"10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin","finding_type":"recoverable_identifier","verdict_class":"incontrovertible","detected_arxiv_id":null}],"available":true,"detectors_run":[{"name":"cited_work_retraction","ran_at":"2026-05-19T16:23:36.281901Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T16:04:40.928097Z","status":"completed","version":"1.0.0","findings_count":1},{"name":"doi_title_agreement","ran_at":"2026-05-19T16:01:17.993422Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"citation_quote_validity","ran_at":"2026-05-19T15:50:44.017272Z","status":"skipped","version":"0.1.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T14:21:54.145804Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T13:33:22.706558Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"04277d73cb1be7b205bef075a229fcef623a88257bbec8995386398f031de8d5"},"references":{"count":13,"sample":[{"doi":"10.64434/tml","year":2020,"title":"PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. ","work_id":"47dbfacb-ba1e-4993-a8cb-0a77adc5933c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"If µ=p θ, the expected auto-differentiated gradients match the f-divergence gradient: ∇θDf(pθ∥p⋆) = Epθ[∇θLf(∆θ(y))]. A.1.1. PROOF OFPART1: CONVEXITY ANDGLOBALMINIMIZER Let the scalar loss function wi","work_id":"eb2a86ab-7d72-4d3c-bd47-36f929ce8379","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Gradient of the Surrogate Loss:The gradient of the loss Lf with respect to the backward parameters ϕ, estimated on-policy, is: ∇ϕJon =E τ∼π F [∇ϕLf(logu)] =E τ∼π F [(f ′(u)−f ′(1))∇ϕ(−logπ B)] =E τ∼π ","work_id":"932a69a2-1764-4218-913e-d2aff6131936","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Gradient of the Candidate Divergence:Consider the generic divergence Dg(πF ∥πB) = R πB(τ)g πF (τ) πB(τ) dτ. Differentiating with respect toϕ: ∇ϕDg = Z (∇ϕπB ·g(u) +π Bg′(u)∇ϕu)dτ Using the identity∇ ϕ","work_id":"0a5b5dc3-2bf3-42fb-8d04-746c7c8050f6","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"•Gradient Weight: wDG i = ∆i −E B[∆(y)]","work_id":"3189b515-8d9a-4e0c-8e6a-52a33ea74bf3","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":13,"snapshot_sha256":"57e599fca8de7a78ba5d662c179e8a2ac0bab9aa08e79cd7b8bb180c48616780","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"261567f7f99af7f1cc1179f1036a6addfa08b82d7c7fd7df4e64fe62bb2a4df1"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"verdict_id":"d20875d1-e822-47e6-9970-31734748c0fa"},"signer":{"signer_id":"pith.science","signer_type":"pith_registry","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"created_at":"2026-05-20T00:00:57Z","supersedes":[],"prev_event":null,"signature":{"signature_status":"signed_v1","algorithm":"ed25519","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signature_b64":"U5UDDGq0voQyJdSZyip46cPUzL5VRMH/WSUadZHM2S9mltpA4RPNTb33iNM+k7+KVLYv10pLeOIgDizpuAjZCw==","signed_message":"open_graph_event_sha256_bytes","signed_at":"2026-05-21T23:38:28.479622Z"},"content_sha256":"abb6c2c00f7e50cd8605eebb6940bd84f61a7a7c8631ff2349a8c9e222e89244","schema_version":"1.0","event_id":"sha256:abb6c2c00f7e50cd8605eebb6940bd84f61a7a7c8631ff2349a8c9e222e89244"},{"event_type":"integrity_finding","subject_pith_number":"pith:2026:CZ346JSLYI4GRFJN5WL7J3VUT7","target":"integrity","payload":{"note":"DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin) was visible in the surrounding text but could not be confirmed against doi.org as printed.","snippet":"PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. Han, M. Jiang, Y . Song, S. Ermon, and M","arxiv_id":"2605.15417","detector":"doi_compliance","evidence":{"ref_index":1,"verdict_class":"incontrovertible","resolved_title":null,"printed_excerpt":"PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. Han, M. Jiang, Y . Song, S. Ermon, and M","reconstructed_doi":"10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin"},"severity":"advisory","ref_index":1,"audited_at":"2026-05-19T16:04:40.928097Z","event_type":"pith.integrity.v1","detected_doi":"10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin","detector_url":"https://pith.science/pith-integrity-protocol#doi_compliance","external_url":null,"finding_type":"recoverable_identifier","evidence_hash":"5a13006a2f6d5c28e1dfdcc7ab7f57a034069dcd47a7cefd9dad25aa874012a7","paper_version":1,"verdict_class":"incontrovertible","resolved_title":null,"detector_version":"1.0.0","detected_arxiv_id":null,"integrity_event_id":1962,"payload_sha256":"d8adf5ab8ed1a177781c214e7297f2c6d953da025be17f6569e0b2084c7c3ba2","signature_b64":"7TtIA2SsaYFs3omWhZHJX3Y8e/rYmHo0rEm4vgfq9kgr1bgYP1Il35ACFXKuDWLEh67Nji0NTQVW+iB92zMwCg==","signing_key_id":"pith-v1-2026-05"},"signer":{"signer_id":"pith.science","signer_type":"pith_registry","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"created_at":"2026-05-19T16:07:12Z","supersedes":[],"prev_event":null,"signature":{"signature_status":"signed_v1","algorithm":"ed25519","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signature_b64":"ZlN+YICeqqG429uRUkCiGk3zLEhsfqEEViI62/apUkl1pV9HdW448DJPqx5WQsVYeO3sLd8DyQiv89jpqZS6Bw==","signed_message":"open_graph_event_sha256_bytes","signed_at":"2026-05-21T23:38:28.480643Z"},"content_sha256":"61fe4ae50b0c161749401d611c7a6230a00396a067b106bdbae3f6220851aa98","schema_version":"1.0","event_id":"sha256:61fe4ae50b0c161749401d611c7a6230a00396a067b106bdbae3f6220851aa98"}],"timestamp_proofs":[],"mirror_hints":[{"mirror_type":"https","name":"Pith Resolver","base_url":"https://pith.science","bundle_url":"https://pith.science/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/bundle.json","state_url":"https://pith.science/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/state.json","well_known_bundle_url":"https://pith.science/.well-known/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/bundle.json","status":"primary"}],"public_keys":[{"key_id":"pith-v1-2026-05","algorithm":"ed25519","format":"raw","public_key_b64":"stVStoiQhXFxp4s2pdzPNoqVNBMojDU/fJ2db5S3CbM=","public_key_hex":"b2d552b68890857171a78b36a5dccf368a953413288c353f7c9d9d6f94b709b3","fingerprint_sha256_b32_first128bits":"RVFV5Z2OI2J3ZUO7ERDEBCYNKS","fingerprint_sha256_hex":"8d4b5ee74e4693bcd1df2446408b0d54","rotates_at":null,"url":"https://pith.science/pith-signing-key.json","notes":"Pith uses this Ed25519 key to sign canonical record SHA-256 digests. Verify with: ed25519_verify(public_key, message=canonical_sha256_bytes, signature=base64decode(signature_b64))."}],"merge_version":"pith-open-graph-merge-v1","built_at":"2026-05-21T23:38:28Z","links":{"resolver":"https://pith.science/pith/CZ346JSLYI4GRFJN5WL7J3VUT7","bundle":"https://pith.science/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/bundle.json","state":"https://pith.science/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/state.json","well_known_bundle":"https://pith.science/.well-known/pith/CZ346JSLYI4GRFJN5WL7J3VUT7/bundle.json"},"state":{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:CZ346JSLYI4GRFJN5WL7J3VUT7","merge_version":"pith-open-graph-merge-v1","event_count":3,"valid_event_count":3,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"4078c284f01852446425428635bb59b3cb2f88ba3d6ac510c0ffc3b13bfa024b","cross_cats_sorted":["cs.AI"],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:02:07Z","title_canon_sha256":"a4ee575b2dcc713ff1fa49a4956eb5aa306519dc23e85057571a540450c0f516"},"schema_version":"1.0","source":{"id":"2605.15417","kind":"arxiv","version":1}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2605.15417","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"arxiv_version","alias_value":"2605.15417v1","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.15417","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_12","alias_value":"CZ346JSLYI4G","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_16","alias_value":"CZ346JSLYI4GRFJN","created_at":"2026-05-20T00:00:57Z"},{"alias_kind":"pith_short_8","alias_value":"CZ346JSL","created_at":"2026-05-20T00:00:57Z"}],"graph_snapshots":[{"event_id":"sha256:abb6c2c00f7e50cd8605eebb6940bd84f61a7a7c8631ff2349a8c9e222e89244","target":"graph","created_at":"2026-05-20T00:00:57Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"This construction can be extended to the whole family of f-divergences, leading to a family of losses whose on-policy gradients are that of the corresponding f-divergence, but retain the same global minimizer off-policy. Specifically, the on-policy gradients lead to a one to one correspondence between translation invariant loss functions on the target and model log probabilities, and f-divergences."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"The paper assumes that the surrogate losses remain valid and share the same global minimizer when evaluated off-policy, which rests on the translation-invariance property of the loss functions on log probabilities; if this invariance does not hold for the chosen f-divergence or if the correspondence is not one-to-one, the off-policy guarantee fails."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"The work defines a family of surrogate losses for generative models and LLMs whose on-policy gradients match those of any f-divergence while sharing the same off-policy minimizer as the original MSE loss."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy."}],"snapshot_sha256":"a1479b9d9681b4650de00afff61de45ff3360e2356bbcaa9cad374944b746833"},"formal_canon":{"evidence_count":2,"snapshot_sha256":"261567f7f99af7f1cc1179f1036a6addfa08b82d7c7fd7df4e64fe62bb2a4df1"},"integrity":{"available":true,"clean":false,"detectors_run":[{"findings_count":0,"name":"cited_work_retraction","ran_at":"2026-05-19T16:23:36.281901Z","status":"completed","version":"1.0.0"},{"findings_count":1,"name":"doi_compliance","ran_at":"2026-05-19T16:04:40.928097Z","status":"completed","version":"1.0.0"},{"findings_count":0,"name":"doi_title_agreement","ran_at":"2026-05-19T16:01:17.993422Z","status":"completed","version":"1.0.0"},{"findings_count":0,"name":"citation_quote_validity","ran_at":"2026-05-19T15:50:44.017272Z","status":"skipped","version":"0.1.0"},{"findings_count":0,"name":"claim_evidence","ran_at":"2026-05-19T14:21:54.145804Z","status":"completed","version":"1.0.0"},{"findings_count":0,"name":"ai_meta_artifact","ran_at":"2026-05-19T13:33:22.706558Z","status":"skipped","version":"1.0.0"}],"endpoint":"/pith/2605.15417/integrity.json","findings":[{"audited_at":"2026-05-19T16:04:40.928097Z","detected_arxiv_id":null,"detected_doi":"10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin","detector":"doi_compliance","finding_type":"recoverable_identifier","note":"DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin) was visible in the surrounding text but could not be confirmed against doi.org as printed.","ref_index":1,"severity":"advisory","verdict_class":"incontrovertible"}],"snapshot_sha256":"04277d73cb1be7b205bef075a229fcef623a88257bbec8995386398f031de8d5","summary":{"advisory":1,"by_detector":{"doi_compliance":{"advisory":1,"critical":0,"informational":0,"total":1}},"critical":0,"informational":0}},"paper":{"abstract_excerpt":"In GFlowNets and variational inference, it has been shown that the mean square error between target and model log probabilities is an effective, low variance, surrogate loss for training generative models.\n  This loss has the property that when evaluated \\emph{on-policy} its gradients correspond to those of the KL divergence, while \\emph{off-policy} it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction can be extended to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are that of the correspondin","authors_text":"Jake Fawkes, Jason Hartford","cross_cats":["cs.AI"],"headline":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy.","license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:02:07Z","title":"$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data"},"references":{"count":13,"internal_anchors":0,"resolved_work":13,"sample":[{"cited_arxiv_id":"","doi":"10.64434/tml","is_internal_anchor":false,"ref_index":1,"title":"PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. ","work_id":"47dbfacb-ba1e-4993-a8cb-0a77adc5933c","year":2020},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":2,"title":"If µ=p θ, the expected auto-differentiated gradients match the f-divergence gradient: ∇θDf(pθ∥p⋆) = Epθ[∇θLf(∆θ(y))]. A.1.1. PROOF OFPART1: CONVEXITY ANDGLOBALMINIMIZER Let the scalar loss function wi","work_id":"eb2a86ab-7d72-4d3c-bd47-36f929ce8379","year":null},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":3,"title":"Gradient of the Surrogate Loss:The gradient of the loss Lf with respect to the backward parameters ϕ, estimated on-policy, is: ∇ϕJon =E τ∼π F [∇ϕLf(logu)] =E τ∼π F [(f ′(u)−f ′(1))∇ϕ(−logπ B)] =E τ∼π ","work_id":"932a69a2-1764-4218-913e-d2aff6131936","year":null},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":4,"title":"Gradient of the Candidate Divergence:Consider the generic divergence Dg(πF ∥πB) = R πB(τ)g πF (τ) πB(τ) dτ. Differentiating with respect toϕ: ∇ϕDg = Z (∇ϕπB ·g(u) +π Bg′(u)∇ϕu)dτ Using the identity∇ ϕ","work_id":"0a5b5dc3-2bf3-42fb-8d04-746c7c8050f6","year":null},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":5,"title":"•Gradient Weight: wDG i = ∆i −E B[∆(y)]","work_id":"3189b515-8d9a-4e0c-8e6a-52a33ea74bf3","year":null}],"snapshot_sha256":"57e599fca8de7a78ba5d662c179e8a2ac0bab9aa08e79cd7b8bb180c48616780"},"source":{"id":"2605.15417","kind":"arxiv","version":1},"verdict":{"created_at":"2026-05-19T15:55:56.227045Z","id":"d20875d1-e822-47e6-9970-31734748c0fa","model_set":{"reader":"grok-4.3"},"one_line_summary":"The work defines a family of surrogate losses for generative models and LLMs whose on-policy gradients match those of any f-divergence while sharing the same off-policy minimizer as the original MSE loss.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"A family of losses lets generative models match any f-divergence on-policy while keeping the same global minimizer off-policy.","strongest_claim":"This construction can be extended to the whole family of f-divergences, leading to a family of losses whose on-policy gradients are that of the corresponding f-divergence, but retain the same global minimizer off-policy. Specifically, the on-policy gradients lead to a one to one correspondence between translation invariant loss functions on the target and model log probabilities, and f-divergences.","weakest_assumption":"The paper assumes that the surrogate losses remain valid and share the same global minimizer when evaluated off-policy, which rests on the translation-invariance property of the loss functions on log probabilities; if this invariance does not hold for the chosen f-divergence or if the correspondence is not one-to-one, the off-policy guarantee fails."}},"verdict_id":"d20875d1-e822-47e6-9970-31734748c0fa"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:964a41abdae3876a40e678142b69ae0b66f9a7842f112bbb9e32bcb5228fa99f","target":"record","created_at":"2026-05-20T00:00:57Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"4078c284f01852446425428635bb59b3cb2f88ba3d6ac510c0ffc3b13bfa024b","cross_cats_sorted":["cs.AI"],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:02:07Z","title_canon_sha256":"a4ee575b2dcc713ff1fa49a4956eb5aa306519dc23e85057571a540450c0f516"},"schema_version":"1.0","source":{"id":"2605.15417","kind":"arxiv","version":1}},"canonical_sha256":"1677cf264bc23868952ded97f4eeb49feb0af8ad34fe8caf3495e4d0b799a852","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"1677cf264bc23868952ded97f4eeb49feb0af8ad34fe8caf3495e4d0b799a852","first_computed_at":"2026-05-20T00:00:57.505206Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-05-20T00:00:57.505206Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"LUWROAu0oCA5neRrflDvP5AaenyihW7Al3Rr1cQmCsOR+VlWpuTdoIJqU1pFVkrIWiVw7OkQM0JWzLeHkbJzCQ==","signature_status":"signed_v1","signed_at":"2026-05-20T00:00:57.505925Z","signed_message":"canonical_sha256_bytes"},"source_id":"2605.15417","source_kind":"arxiv","source_version":1}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:61fe4ae50b0c161749401d611c7a6230a00396a067b106bdbae3f6220851aa98","sha256:964a41abdae3876a40e678142b69ae0b66f9a7842f112bbb9e32bcb5228fa99f","sha256:abb6c2c00f7e50cd8605eebb6940bd84f61a7a7c8631ff2349a8c9e222e89244"],"state_sha256":"c58e7d713bd9a3e4f5d010b86f47a3f107b2da331b6325e7153a6e2f8d145356"},"bundle_signature":{"signature_status":"signed_v1","algorithm":"ed25519","key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signature_b64":"6joxr5G11NxLENdPhHZIxXbMAIFN/noXBu+PuIqdiXZQgFXhflRD/cUxommu7gRSLUQQHZnzZzshnBObtGZKCA==","signed_message":"bundle_sha256_bytes","signed_at":"2026-05-21T23:38:28.485205Z","bundle_sha256":"8f1c0fa925aa1bc7e376818950791a1938e79e6dae605e9d7c37c9d61d42a3a9"}}