Geometry of Drifting MDPs with Path-Integral Stability Certificates
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.
DPA-GRPO trains a generator-verifier pair via group-relative policy optimization on paired counterfactual actions, improving structured output accuracy on TaxCalcBench over zero-shot and generator-only baselines.
citing papers explorer
- Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry
  MSRL represents trajectory segments as PSD matrices to prove additive composition properties and bootstrap value functions for better transfer, reaching 0.73 AUC versus 0.57-0.65 for baselines.
- Metric-Gradient Projection for Stable Multi-Agent Policy Learning
  HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.
- Interactive Critique-Revision Training for Reliable Structured LLM Generation
  DPA-GRPO trains a generator-verifier pair via group-relative policy optimization on paired counterfactual actions, improving structured output accuracy on TaxCalcBench over zero-shot and generator-only baselines.