Pith Number

pith:THM2OTVT

pith:2026:THM2OTVTUV4C33HYFVOBJMYKZU

Status: not attested · not anchored · not stored · refs resolved

Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic

Leo Muxing Wang, Lili Su, Pengkun Yang

Agents share a linear subspace for collaboration while keeping personalized policies. Under single-timescale Markovian updates, this yields finite-time convergence rates with linear speedup in the number of agents.

arxiv:2605.14423 v1 · 2026-05-14 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{THM2OTVTUV4C33HYFVOBJMYKZU}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
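The page does not describe the merge algorithm itself. The Python below is a purely hypothetical sketch of what "deterministic" gives a mirror. It folds a set of events in a total order that depends only on the events (timestamp, then event id), so any arrival order, or a duplicated copy of an event, yields the same state. The `Event` fields and the last-writer-wins rule are assumptions, not the bundle's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not shown on this page.
    event_id: str   # unique id, e.g. a digest of the signed payload (assumption)
    timestamp: str  # fixed-width ISO-8601 UTC string, so string order = time order
    field: str      # which key of the state this event sets
    value: Any


def merge(events: Iterable[Event]) -> Dict[str, Any]:
    """Fold events in one fixed total order: (timestamp, event_id).

    The order depends only on the events themselves, so every mirror holding
    the same set of events computes the same state. Later events in that order
    overwrite earlier ones (last-writer-wins, an assumed rule).
    """
    state: Dict[str, Any] = {}
    seen = set()
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:  # skip duplicate copies of the same event
            continue
        seen.add(ev.event_id)
        state[ev.field] = ev.value
    return state


# Illustrative events only (hypothetical fields and values).
events = [
    Event("e2", "2026-05-17T23:39:07Z", "x", 1),
    Event("e1", "2026-05-17T23:39:07Z", "y", 2),
    Event("e3", "2026-05-18T08:00:00Z", "x", 3),
]
print(merge(events))
print(merge(list(reversed(events)) + events[:1]))  # reordered, with a duplicate
```

Tie-breaking on `event_id` matters. Without it, two events with the same timestamp could be applied in different orders on different mirrors.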

Claims

C1 (strongest claim)

Under canonical single-timescale updates with Markovian sampling, we establish finite-time convergence via a novel joint linear approximation framework. Specifically, we show that the critic error converges to zero at the rate of $\tilde{O}\left(\frac{1}{(1-\gamma)^4\sqrt{TK}}\right)$, and the policy gradient norm converges to zero at the rate of $\tilde{O}\left(\frac{1}{(1-\gamma)^6\sqrt{TK}}\right)$, ... These results demonstrate linear speedup with respect to the number of agents $K$, despite heterogeneous Markovian trajectories under distinct transition kernels and coupled learning dynamics.
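The sketch below shows what "linear speedup in $K$" means for these rates. It treats $T$ as the number of iterations per agent, which is an assumption because the excerpt does not define $T$. It drops the constants and logarithmic factors hidden in $\tilde{O}$. The function names are illustrative only.

```python
import math


def critic_bound(T: int, K: int, gamma: float) -> float:
    """Order of the critic-error rate, constants and log factors dropped:
    1 / ((1 - gamma)**4 * sqrt(T * K))."""
    return 1.0 / ((1.0 - gamma) ** 4 * math.sqrt(T * K))


def actor_bound(T: int, K: int, gamma: float) -> float:
    """Order of the policy-gradient-norm rate: 1 / ((1 - gamma)**6 * sqrt(T * K))."""
    return 1.0 / ((1.0 - gamma) ** 6 * math.sqrt(T * K))


def iterations_to_reach(eps: float, K: int, gamma: float, power: int) -> float:
    """Solve 1 / ((1 - gamma)**power * sqrt(T * K)) = eps for T."""
    return 1.0 / (K * (eps * (1.0 - gamma) ** power) ** 2)


gamma = 0.9
# K agents running T iterations each match one agent running K * T iterations.
print(critic_bound(1_000, 10, gamma), critic_bound(10_000, 1, gamma))
# The per-agent iterations needed for a target accuracy shrink by a factor of K.
for K in (1, 10, 100):
    print(K, iterations_to_reach(0.1, K, gamma, power=6))
```

The bound depends on $T$ and $K$ only through the product $TK$. For that reason, adding agents substitutes one-for-one for per-agent iterations. The larger power of $1/(1-\gamma)$ in the actor bound makes it the looser of the two for any $\gamma \in (0, 1)$.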

C2 (weakest assumption)

That a single common linear subspace is expressive enough to capture the shared structure across all agents' heterogeneous environments while the remaining personalization can be handled by local heads, and that the perturbation analysis for projected subspace updates and the conditional mixing arguments for heterogeneous Markovian noise remain valid under the coupled policy-critic dynamics.
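The sketch below illustrates the structure this assumption refers to: a common linear subspace plus personalized heads. Each agent $i$ holds parameters $\theta_i = B w_i$, where $B$ is a shared $d \times r$ basis with orthonormal columns and $w_i$ is the agent's local head. A server step averages the agents' local copies of $B$ and re-orthonormalizes them.

This is a hypothetical illustration, not the paper's algorithm; the page names a "projected subspace update" but does not spell it out. The helper names `orthonormalize`, `aggregate_subspace` and `personalized_params` are invented for the example.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(cols: List[Vector]) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal columns spanning the same space."""
    basis: List[Vector] = []
    for v in cols:
        w = list(v)
        for b in basis:
            c = dot(w, b)  # project the partially reduced vector, not the original
            w = [x - c * y for x, y in zip(w, b)]
        n = math.sqrt(dot(w, w))
        basis.append([x / n for x in w])
    return basis


def aggregate_subspace(local_bases: List[List[Vector]]) -> List[Vector]:
    """Server step (assumed): average each basis column over the K agents,
    then re-orthonormalize so the shared basis B keeps orthonormal columns."""
    K = len(local_bases)
    r, d = len(local_bases[0]), len(local_bases[0][0])
    avg = [[sum(B[j][k] for B in local_bases) / K for k in range(d)] for j in range(r)]
    return orthonormalize(avg)


def personalized_params(B: List[Vector], head: Vector) -> Vector:
    """theta_i = B w_i: parameters in the shared span, local coordinates w_i."""
    d = len(B[0])
    return [sum(w * col[k] for w, col in zip(head, B)) for k in range(d)]


random.seed(0)
d, r, K = 8, 2, 4  # feature dim, subspace rank, number of agents (illustrative)
shared = orthonormalize([[random.gauss(0, 1) for _ in range(d)] for _ in range(r)])
# Each agent perturbs the broadcast basis locally (simulated here as small noise).
local_bases = [[[x + 0.01 * random.gauss(0, 1) for x in col] for col in shared] for _ in range(K)]
B = aggregate_subspace(local_bases)
heads = [[random.gauss(0, 1) for _ in range(r)] for _ in range(K)]  # personalized w_i
thetas = [personalized_params(B, w) for w in heads]
```

Naive column averaging only makes sense because every agent starts from the same broadcast basis. Independently learned bases would first need to be aligned, since a subspace has many equivalent bases. A critic could be structured the same way, with its own local head.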

C3 (one-line summary)

A federated actor-critic framework lets agents share a linear subspace representation for policies while maintaining personalized local actors and critics. It achieves critic-error and policy-gradient convergence rates of order $1/\sqrt{TK}$, with linear speedup in the number of agents $K$, under environment …

References

61 extracted · 61 resolved · 1 Pith anchor

[1] R. K. Ando, T. Zhang, and P. Bartlett. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6(11), 2005.

[2] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[3] J. Bhandari, D. Russo, and R. Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on Learning Theory, pages 1691–1692. PMLR, 2018.

[4] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.

[5] T. Chen, Y. Sun, and W. Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Advances in Neural Information Processing Systems, 34:25294–25307, 2021.

Receipt and verification

First computed	2026-05-17T23:39:07.224863Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

99d9a74eb3a5782decf82d5c14b30acd273b770fff7075175d2fc7f17f6d2c35

Aliases

arxiv: 2605.14423 · arxiv_version: 2605.14423v1 · doi: 10.48550/arxiv.2605.14423 · pith_short_12: THM2OTVTUV4C · pith_short_16: THM2OTVTUV4C33HY · pith_short_8: THM2OTVT
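The Pith Number and its short aliases appear to be derived from the canonical hash. Encoding the first 16 bytes (128 bits) of the hash above in unpadded RFC 4648 base32 reproduces `THM2OTVTUV4C33HYFVOBJMYKZU` exactly. Each `pith_short_N` alias is simply the first N characters of that string. This rule is inferred from the values on this page, not taken from the schema documentation, so the sketch below treats it as an assumption.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "99d9a74eb3a5782decf82d5c14b30acd273b770fff7075175d2fc7f17f6d2c35"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed rule (inferred, not documented here): unpadded RFC 4648 base32
    of the first 16 bytes (128 bits) of the SHA-256 canonical hash."""
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_aliases(pith: str) -> dict:
    """pith_short_N is the first N characters of the full Pith Number."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # THM2OTVTUV4C33HYFVOBJMYKZU
print(short_aliases(pith))
```

Sixteen bytes encode to 26 base32 characters plus six `=` padding characters, which matches the 26-character length of the Pith Number once the padding is stripped.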

Verify this Pith Number yourself

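# Fetch the JSON-LD record, keep only canonical_record, re-serialize it with sorted keys,
# compact separators and raw UTF-8, then print the SHA-256 digest of those bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/THM2OTVTUV4C33HYFVOBJMYKZU \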
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 99d9a74eb3a5782decf82d5c14b30acd273b770fff7075175d2fc7f17f6d2c35

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7f3d656ea630b8a1489a217c6e70ae74211d324fe11b756ff46e72939594c50c",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T06:10:31Z",
    "title_canon_sha256": "ea38118b998e53d9d07f82342507701f103b0b2d618b67e6ca3392c021006205"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14423",
    "kind": "arxiv",
    "version": 1
  }
}
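The hashing step of the verify command can also be run offline on the record shown above, without `curl` or `jq`. The sketch below mirrors the command's Python step: sorted keys, `(',', ':')` separators, `ensure_ascii=False`, UTF-8 bytes, then SHA-256. The function names are illustrative. The printed digest is meant to be compared by hand with the canonical hash on this page; it is not asserted here.

```python
import hashlib
import json

# The canonical record shown above, copied as a Python dict.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "7f3d656ea630b8a1489a217c6e70ae74211d324fe11b756ff46e72939594c50c",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T06:10:31Z",
        "title_canon_sha256": "ea38118b998e53d9d07f82342507701f103b0b2d618b67e6ca3392c021006205",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14423", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Same serialization as the verify command: sorted keys, no whitespace, raw UTF-8."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_hash(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


print(canonical_hash(RECORD))  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the hash independent of how a server happens to order or pretty-print the JSON. Any two serializations of the same record produce identical bytes.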