{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:TVTKIJ5SFEPIQA27ODCVPOUP2Y","short_pith_number":"pith:TVTKIJ5S","schema_version":"1.0","canonical_sha256":"9d66a427b2291e88035f70c557ba8fd62f3c5c1308ccb732e2ff94724e1b4d49","source":{"kind":"arxiv","id":"2511.22581","version":5},"attestation_state":"computed","paper":{"title":"High entropy leads to symmetry-equivariant policies in Dec-POMDPs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Sufficiently high entropy regularization in any Dec-POMDP makes policy gradient flow with tabular softmax converge to the same symmetry-equivariant joint policy from every initialization.","cross_cats":["cs.MA"],"primary_cat":"cs.LG","authors_text":"Andreas Bulling, Constantin Ruhdorfer, Jakob Foerster, Johannes Forkel, Michael Beukman","submitted_at":"2025-11-27T16:13:27Z","abstract_excerpt":"We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Over"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2511.22581","kind":"arxiv","version":5},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2025-11-27T16:13:27Z","cross_cats_sorted":["cs.MA"],"title_canon_sha256":"c2a96431595a94e9af6a69640f799e7d9b707b5ec410a680d7dc97b49647592e","abstract_canon_sha256":"819c6b1c9354a7f25db8dc7ab33bd055709d415f1534be8346080dd82694cc12"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-06-08T01:03:51.787702Z","signature_b64":"zd/dO4S3x6oGnbZwoZuNZBCwhJFxWeViqLGuO97wfJhkPagt2o60D21wsuukDyHfb3v8kQsWkzrh8WMpkBABAg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"9d66a427b2291e88035f70c557ba8fd62f3c5c1308ccb732e2ff94724e1b4d49","last_reissued_at":"2026-06-08T01:03:51.786624Z","signature_status":"signed_v1","first_computed_at":"2026-06-08T01:03:51.786624Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"High entropy leads to symmetry-equivariant policies in Dec-POMDPs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Sufficiently high entropy regularization in any Dec-POMDP makes policy gradient flow with tabular softmax converge to the same symmetry-equivariant joint policy from every initialization.","cross_cats":["cs.MA"],"primary_cat":"cs.LG","authors_text":"Andreas Bulling, Constantin Ruhdorfer, Jakob Foerster, Johannes Forkel, Michael Beukman","submitted_at":"2025-11-27T16:13:27Z","abstract_excerpt":"We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP. In particular, policies coming from different initializations will be fully compatible, in that their cross-play returns are equal to their self-play returns. Through extensive evaluation of independent PPO, arguably the standard baseline deep multi-agent policy gradient algorithm, in the Hanabi, Over"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that entropy regularization is 'sufficiently high' to force convergence to the unique equivariant policy under tabular softmax parametrization in arbitrary Dec-POMDPs.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"High entropy regularization guarantees convergence to symmetry-equivariant policies in Dec-POMDPs, making cross-play returns match self-play returns.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Sufficiently high entropy regularization in any Dec-POMDP makes policy gradient flow with tabular softmax converge to the same symmetry-equivariant joint policy from every initialization.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a3d2bbd05f18ee82846ad516b0cd0b177ac98ddb6971b4e3b2ea03a16b59d9b7"},"source":{"id":"2511.22581","kind":"arxiv","version":5},"verdict":{"id":"4fc64729-6705-4235-9b01-1c731c61d2c5","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T03:48:27.835788Z","strongest_claim":"We prove that in any Dec-POMDP, sufficiently high entropy regularization ensures that the policy gradient flow with tabular softmax parametrization always converges, for any initialization, to the same joint policy, and that this joint policy is equivariant w.r.t. all symmetries of the Dec-POMDP.","one_line_summary":"High entropy regularization guarantees convergence to symmetry-equivariant policies in Dec-POMDPs, making cross-play returns match self-play returns.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that entropy regularization is 'sufficiently high' to force convergence to the unique equivariant policy under tabular softmax parametrization in arbitrary Dec-POMDPs.","pith_extraction_headline":"Sufficiently high entropy regularization in any Dec-POMDP makes policy gradient flow with tabular softmax converge to the same symmetry-equivariant joint policy from every initialization."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2511.22581/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"171c3368d47694db0f90bf6b61ab3998361e69851e6596670bc5409f555ecf0b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2511.22581","created_at":"2026-06-08T01:03:51.786731+00:00"},{"alias_kind":"arxiv_version","alias_value":"2511.22581v5","created_at":"2026-06-08T01:03:51.786731+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2511.22581","created_at":"2026-06-08T01:03:51.786731+00:00"},{"alias_kind":"pith_short_12","alias_value":"TVTKIJ5SFEPI","created_at":"2026-06-08T01:03:51.786731+00:00"},{"alias_kind":"pith_short_16","alias_value":"TVTKIJ5SFEPIQA27","created_at":"2026-06-08T01:03:51.786731+00:00"},{"alias_kind":"pith_short_8","alias_value":"TVTKIJ5S","created_at":"2026-06-08T01:03:51.786731+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":1,"internal_anchor_count":1,"sample":[{"citing_arxiv_id":"2606.26463","citing_title":"Finding the Time to Think: Learning Planning Budgets in Real-Time RL","ref_index":10,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y","json":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y.json","graph_json":"https://pith.science/api/pith-number/TVTKIJ5SFEPIQA27ODCVPOUP2Y/graph.json","events_json":"https://pith.science/api/pith-number/TVTKIJ5SFEPIQA27ODCVPOUP2Y/events.json","paper":"https://pith.science/paper/TVTKIJ5S"},"agent_actions":{"view_html":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y","download_json":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y.json","view_paper":"https://pith.science/paper/TVTKIJ5S","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2511.22581&json=true","fetch_graph":"https://pith.science/api/pith-number/TVTKIJ5SFEPIQA27ODCVPOUP2Y/graph.json","fetch_events":"https://pith.science/api/pith-number/TVTKIJ5SFEPIQA27ODCVPOUP2Y/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y/action/timestamp_anchor","attest_storage":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y/action/storage_attestation","attest_author":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y/action/author_attestation","sign_citation":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y/action/citation_signature","submit_replication":"https://pith.science/pith/TVTKIJ5SFEPIQA27ODCVPOUP2Y/action/replication_record"}},"created_at":"2026-06-08T01:03:51.786731+00:00","updated_at":"2026-06-08T01:03:51.786731+00:00"}