John Wiley & Sons

Martin L Puterman · 2014

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

math.PR · 2026-01-13 · unverdicted · novelty 8.0

The paper proves the first optimal O(n^{-1/2}) Wasserstein-1 CLT rates for locally dependent sequences and geometrically ergodic Markov chains, plus new W_p rates for p ≥ 2 under mild moment conditions, with an application to U-statistics.

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

Toward Template-Free Explainability for Monte Carlo Tree Search

cs.HC · 2026-05-15 · unverdicted · novelty 5.0 · 2 refs

The framework uses LLMs to map natural-language questions about MCTS to explanations based on tree statistics such as visit counts and values, without hand-crafted formal logic.

Value Mirror Descent for Reinforcement Learning

math.OC · 2026-04-07 · unverdicted · novelty 5.0

Value mirror descent integrates mirror descent into value iteration for discounted MDPs, delivering near-optimal sample complexity of order |S||A|(1-γ)^{-3}ε^{-2} for general convex regularizers and bounded Bregman divergence between generated and optimal policies.

Optimal sequential decision-making for error propagation mitigation in digital twins

cs.LG · 2026-04-24 · unverdicted · novelty 4.0

Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.

Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures

math.OC · 2026-04-06 · unverdicted · novelty 4.0

State augmentation allows dynamic programming and sample complexity bounds for MDPs and optimal control under static risk measures including CVaR.

citing papers explorer

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains · math.PR · 2026-01-13 · unverdicted · none · ref 60
State-Centric Decision Process · cs.AI · 2026-05-12 · unverdicted · none · ref 31
Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs · cs.AI · 2026-05-09 · unverdicted · none · ref 24
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited · cs.LG · 2026-05-16 · unverdicted · none · ref 46
Toward Template-Free Explainability for Monte Carlo Tree Search · cs.HC · 2026-05-15 · unverdicted · none · ref 4 · 2 links
Value Mirror Descent for Reinforcement Learning · math.OC · 2026-04-07 · unverdicted · none · ref 24
Optimal sequential decision-making for error propagation mitigation in digital twins · cs.LG · 2026-04-24 · unverdicted · none · ref 12
Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures · math.OC · 2026-04-06 · unverdicted · none · ref 23
