Journal of Machine Learning Research

5 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (5)

roles: method (1)

polarities: use method (1)

representative citing papers

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.

citing papers explorer

  • Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems
    eess.SY · 2026-05-13 · unverdicted · none · ref 108

    SubMAPG uses a new Partition Multilinear Extension to derive unbiased policy gradients from submodular difference rewards, delivering 1/2-approximation and sublinear dynamic regret for online distributed task allocation in open multi-agent systems.

  • Randomness is sometimes necessary for coordination
    cs.AI · 2026-05-07 · conditional · none · ref 14

    Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

  • NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
    cs.LG · 2026-05-01 · unverdicted · none · ref 38

    NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.

  • TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination
    cs.LG · 2026-05-01 · unverdicted · none · ref 30

    TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.

  • ERPPO: Entropy Regularization-based Proximal Policy Optimization
    cs.LG · 2026-05-13 · unverdicted · none · ref 8

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.