Neural Networks
4 Pith papers cite this work.
citing papers explorer
- Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems
  SubMAPG uses a new Partition Multilinear Extension to derive unbiased policy gradients from submodular difference rewards, delivering 1/2-approximation and sublinear dynamic regret for online distributed task allocation in open multi-agent systems.
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
- Differentiable Chemistry in PINNs for Solving Parameterized and Stiff Reaction Systems
  A differentiable chemistry solver is added to PINNs along with parameterized network architecture and stiffness-tailored residual weighting to solve initial/boundary value problems, inverse parameter identification, and parameterized PDEs for hydrogen combustion.
- Query-efficient model evaluation using cached responses