Cochain perspectives on temporal-difference signals for learning beyond Markov dynamics. arXiv preprint arXiv:2602.06939, 2026b.
Four Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.LG (4)
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Citing papers
- Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry
  MSRL represents trajectory segments as positive semidefinite (PSD) matrices, which lets it prove additive composition properties and bootstrap value functions for better transfer; it reaches 0.73 AUC versus 0.57–0.65 for baselines.
- Metric-Gradient Projection for Stable Multi-Agent Policy Learning
  HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.
- Operator-Guided Invariance Learning for Continuous Reinforcement Learning
  VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.
- NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
  NonZero introduces an interaction score and a bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.
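The MSRL summary relies on the fact that positive semidefinite matrices are closed under addition. The sketch below illustrates that property under one hypothetical encoding, in which a segment is mapped to the Gram matrix of its step displacements by the illustrative helper `segment_matrix`. This is not MSRL's actual construction. It is only an example where concatenating two segments corresponds exactly to adding their matrices.

```python
import numpy as np

def segment_matrix(segment: np.ndarray) -> np.ndarray:
    """Hypothetical encoding (not MSRL's): sum of outer products of the
    per-step displacements in a trajectory segment of shape (T, d)."""
    deltas = np.diff(segment, axis=0)   # (T-1, d) step displacements
    return deltas.T @ deltas            # (d, d) Gram matrix, PSD by construction

def is_psd(m: np.ndarray, tol: float = 1e-10) -> bool:
    # A symmetric matrix is PSD iff all its eigenvalues are >= 0 (up to tol).
    return bool(np.all(np.linalg.eigvalsh(m) >= -tol))

rng = np.random.default_rng(0)
traj = rng.normal(size=(11, 3))   # 11 states in a 3-D state space
a, b = traj[:6], traj[5:]         # two segments that share state index 5
m_a, m_b, m_ab = segment_matrix(a), segment_matrix(b), segment_matrix(traj)

print(is_psd(m_a), is_psd(m_b))        # True True
print(np.allclose(m_a + m_b, m_ab))    # True: concatenation equals the sum
```

Because each segment's matrix is a sum over its own steps, and the two segments partition the steps of the full trajectory, the composed matrix is simply `m_a + m_b`, and it stays PSD.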
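For intuition about the projection HPML describes, consider the simplest case: a linear update field F(x) = A x under the Euclidean metric. Such a field is a gradient field exactly when A is symmetric. Symmetric and antisymmetric matrices are orthogonal under the Frobenius inner product, so the closest symmetric matrix to A is its symmetric part. The sketch below uses hypothetical helper names and a toy matrix to split A into that gradient component and a rotational remainder. HPML's metric-weighted, nonlinear setting is more general than this special case.

```python
import numpy as np

def hodge_split_linear(A: np.ndarray):
    """Split the linear field F(x) = A x into a gradient part S x and a
    rotational part K x, assuming the Euclidean metric (illustrative only)."""
    S = 0.5 * (A + A.T)   # symmetric: S x is the gradient of phi(x) = 0.5 x^T S x
    K = 0.5 * (A - A.T)   # antisymmetric: x^T K x = 0, a purely rotational flow
    return S, K

def potential(S: np.ndarray, x: np.ndarray) -> float:
    return 0.5 * float(x @ S @ x)

# Toy Jacobian of a joint multi-agent update field (hypothetical values).
A = np.array([[-2.0, 3.0],
              [-1.0, -1.0]])
S, K = hodge_split_linear(A)

# Check numerically (central differences) that S x is the gradient of the potential.
x = np.array([0.3, -0.7])
eps = 1e-6
num_grad = np.array([(potential(S, x + eps * e) - potential(S, x - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
print(np.allclose(num_grad, S @ x))     # True
print(np.isclose(np.sum(S * K), 0.0))   # True: the two parts are Frobenius-orthogonal
```

In this linear case, φ(x) = ½ xᵀ S x is the potential of the projected flow, and the discarded term K x is the part that cannot be written as any gradient.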
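The VPSD-RL summary pairs transition augmentation with consistency regularization. The sketch below shows those two ingredients under two assumptions. First, the symmetry group is already known (planar rotations in SO(2)), whereas VPSD-RL discovers it. Second, rewards are unchanged under the group action. Here `augment` maps a transition through a group element, and `consistency_penalty` measures how far a value function is from invariance. All function names are illustrative, not VPSD-RL's API.

```python
import numpy as np

def rotate(theta: float) -> np.ndarray:
    """Element of SO(2) acting on planar states (assumed known symmetry)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def augment(s, a, r, s_next, g):
    # Map a transition through the group element g. Keeping r unchanged is
    # only valid under the assumption that the task is invariant under g.
    return g @ s, g @ a, r, g @ s_next

def consistency_penalty(value_fn, states, g) -> float:
    # Mean squared gap between V(s) and V(g s); zero for an exactly invariant V.
    return float(np.mean([(value_fn(st) - value_fn(g @ st)) ** 2 for st in states]))

def v_invariant(st):   # depends only on the radius, so rotation-invariant
    return -float(np.linalg.norm(st))

def v_biased(st):      # depends on the x-coordinate, so not rotation-invariant
    return -float(st[0])

rng = np.random.default_rng(1)
states = rng.normal(size=(32, 2))
g = rotate(np.pi / 3)

aug = augment(states[0], np.array([1.0, 0.0]), 1.0, states[1], g)
print(consistency_penalty(v_invariant, states, g))      # close to 0 (float noise)
print(consistency_penalty(v_biased, states, g) > 0.0)   # True
```

In a training loop, augmented transitions would be added to the replay data, and the penalty would be added to the value loss with some weight. That weight is a design choice this sketch leaves open.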