pith. sign in

Title resolution pending

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

years

2026 3

verdicts

UNVERDICTED 3

representative citing papers

Behavior-Consistent Deep Reinforcement Learning

cs.LG · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.

Actor-Critic Algorithm for Dynamic Expectile and CVaR

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

A model-free off-policy actor-critic algorithm is constructed for dynamic expectile and CVaR using a surrogate policy gradient without transition perturbation and elicitability-based value learning, with empirical outperformance in risk-averse domains.

citing papers explorer

Showing 3 of 3 citing papers.

  • Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise math.PR · 2026-05-20 · unverdicted · none · ref 45

    Establishes maximal concentration bounds for stochastic approximation under heavy-tailed Markovian noise, with tails ranging from sub-Gaussian to heavier than Weibull depending on step sizes and contractivity properties, plus a truncation argument for unbounded noise.

  • Behavior-Consistent Deep Reinforcement Learning cs.LG · 2026-05-20 · unverdicted · none · ref 293 · 2 links

    QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.

  • Actor-Critic Algorithm for Dynamic Expectile and CVaR cs.LG · 2026-05-08 · unverdicted · none · ref 33

    A model-free off-policy actor-critic algorithm is constructed for dynamic expectile and CVaR using a surrogate policy gradient without transition perturbation and elicitability-based value learning, with empirical outperformance in risk-averse domains.