Deep transfer Q-learning for offline non-stationary reinforcement learning. arXiv preprint arXiv:2501.04870

Jinhang Chai, Elynn Chen, Jianqing Fan · 2025 · arXiv 2501.04870

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

cs.AI · 2026-05-18 · unverdicted · novelty 8.0

Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

stat.ML · 2026-05-18 · unverdicted · novelty 6.0

DC-TNN decomposes tensors into low-rank core plus sparse refinement fed to coupled neural channels, yielding non-asymptotic risk bounds and the first distribution-free conformal procedure for selecting among tensor decompositions.

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.

citing papers explorer

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

cs.AI · 2026-05-18 · unverdicted · none · ref 7

Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

stat.ML · 2026-05-18 · unverdicted · none · ref 23

DC-TNN decomposes tensors into low-rank core plus sparse refinement fed to coupled neural channels, yielding non-asymptotic risk bounds and the first distribution-free conformal procedure for selecting among tensor decompositions.

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

cs.LG · 2026-04-30 · unverdicted · none · ref 1 · 2 links

Kernel smoothing enables accurate low-variance value and gradient estimates for policy optimization in LLM reasoning under tight sampling constraints per prompt.
