Deep transfer Q-learning for offline non-stationary reinforcement learning. arXiv preprint arXiv:2501.04870
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: unclear (1)
citing papers explorer
- Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.
- Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection
DC-TNN decomposes tensors into low-rank core plus sparse refinement fed to coupled neural channels, yielding non-asymptotic risk bounds and the first distribution-free conformal procedure for selecting among tensor decompositions.
- Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.