{"paper":{"title":"Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Agents share a linear subspace for collaboration while keeping personalized policies, yielding finite-time convergence rates that scale linearly with the number of agents under single-timescale Markovian updates.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Leo Muxing Wang, Lili Su, Pengkun Yang","submitted_at":"2026-05-14T06:10:31Z","abstract_excerpt":"Despite the popularity of the actor-critic method and the practical needs of collaborative policy training, existing works typically either overlook environmental heterogeneity or give up personalization altogether by training a single shared policy across all agents. We consider a federated actor-critic framework in which agents share a common linear subspace representation while maintaining personalized local policy components, and agents iteratively estimate the common subspace, local critic heads, and local policies (i.e., actors). Under canonical single-timescale updates with Markovian sa"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Under canonical single-timescale updates with Markovian sampling, we establish finite-time convergence via a novel joint linear approximation framework. Specifically, we show that the critic error converges to zero at the rate of Õ(1/((1−γ)⁴√(TK))), and the policy gradient norm converges to zero at the rate of Õ(1/((1−γ)⁶√(TK))), ... 
These results demonstrate linear speedup with respect to the number of agents K, despite heterogeneous Markovian trajectories under distinct transition kernels and coupled learning dynamics.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a single common linear subspace is expressive enough to capture the shared structure across all agents' heterogeneous environments while the remaining personalization can be handled by local heads, and that the perturbation analysis for projected subspace updates and the conditional mixing arguments for heterogeneous Markovian noise remain valid under the coupled policy-critic dynamics.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A federated actor-critic framework lets agents share a linear subspace representation for policies while maintaining personalized local actors and critics, achieving critic error and policy gradient convergence rates of order 1 over square root of TK with linear speedup in K agents under environment","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Agents share a linear subspace for collaboration while keeping personalized policies, yielding finite-time convergence rates that scale linearly with the number of agents under single-timescale Markovian updates.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c1b17bf5b414ea2327c9ed47aa44eb3cd67b3501f630d970732601ae3b80ee57"},"source":{"id":"2605.14423","kind":"arxiv","version":1},"verdict":{"id":"77948ec8-95bd-4a52-96cf-5c98724be0da","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:03:44.452727Z","strongest_claim":"Under canonical single-timescale updates with Markovian 
sampling, we establish finite-time convergence via a novel joint linear approximation framework. Specifically, we show that the critic error converges to zero at the rate of Õ(1/((1−γ)⁴√(TK))), and the policy gradient norm converges to zero at the rate of Õ(1/((1−γ)⁶√(TK))), ... These results demonstrate linear speedup with respect to the number of agents K, despite heterogeneous Markovian trajectories under distinct transition kernels and coupled learning dynamics.","one_line_summary":"A federated actor-critic framework lets agents share a linear subspace representation for policies while maintaining personalized local actors and critics, achieving critic error and policy gradient convergence rates of order 1 over square root of TK with linear speedup in K agents under environment","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a single common linear subspace is expressive enough to capture the shared structure across all agents' heterogeneous environments while the remaining personalization can be handled by local heads, and that the perturbation analysis for projected subspace updates and the conditional mixing arguments for heterogeneous Markovian noise remain valid under the coupled policy-critic dynamics.","pith_extraction_headline":"Agents share a linear subspace for collaboration while keeping personalized policies, yielding finite-time convergence rates that scale linearly with the number of agents under single-timescale Markovian updates."},"references":{"count":61,"sample":[{"doi":"","year":2005,"title":"R. K. Ando, T. Zhang, and P. Bartlett. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6(11), 2005","work_id":"c6dd3559-46f8-45d0-81fa-5759eda86d58","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2013,"title":"Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013","work_id":"202bce67-d4e5-4b91-9bdc-f0892fdc499f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"J. Bhandari, D. Russo, and R. Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on Learning Theory, pages 1691–1692. PMLR, 2018","work_id":"32ebb3e7-ccea-4b9a-8509-ffd7e0be992e","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1997,"title":"R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997","work_id":"a8653e19-4c38-424e-9302-6febaf0d1ada","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"T. Chen, Y. Sun, and W. Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Advances in Neural Information Processing Systems, 34:25294–25307, 2021","work_id":"9b6ead6a-6637-44f2-9f32-3398cd110886","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":61,"snapshot_sha256":"e560542a1cdabfe0bdd2f1d4d1d74a75bdbe55056c01eb45e25b35a9f006355b","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}