Mixed citations

Wiley Series in Probability and Statistics, Wiley (1994)

Martin L. Puterman · 2005 · Wiley Series in Probability and Statistics · DOI 10.1002/9780470316887

Mixed citation behavior. Most common role is background (60%).

11 Pith papers citing it

4,524 external citations · Crossref

Background 60% of classified citations

citation-role summary

background 3 · method 1 · other 1

citation-polarity summary

background 3 · unclear 1 · use method 1

representative citing papers

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.

Fast Computation of Conditional Probabilities in MDPs and Markov Chain Families

cs.LO · 2026-05-12 · unverdicted · novelty 7.0

A new efficient algorithm computes optimal conditional reachability probabilities in MDPs without creating hard cyclic reductions, achieving linear time on acyclic cases and substantial speedups on benchmarks from Bayesian networks, probabilistic programs, and runtime monitoring.

Multi-Environment POMDPs with Finite-Horizon Objectives

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

The optimal value and policy computation problem for finite-horizon objectives in multi-environment POMDPs is PSPACE-complete, and a new algorithm solves it more efficiently than previous methods on classical benchmarks.

Probabilistic Hazard Analysis Framework with Stochastic Optimal Control for Deteriorating Civil Infrastructure Systems

eess.SY · 2026-04-24 · unverdicted · novelty 7.0

A life-cycle optimization framework for deteriorating infrastructure under hazards is formulated as an MDP with a Kronecker-factored tensor method that reduces computational complexity from exponential to linear while preserving exact dynamic programming solutions.

Scaling Observation-aware Planning in Uncertain Domains

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

A POMDP decomposition method scales solving of the Sensor Selection Problem and Positional Observability Problem by 3 and 5 orders of magnitude in instance size and runtime.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

The hidden risks of temporal resampling in clinical reinforcement learning

cs.LG · 2026-02-06 · conditional · novelty 6.0

Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

cs.LG · 2025-05-30 · conditional · novelty 6.0

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

Optimal strategies in the all-heads coin game

math.PR · 2026-04-24

citing papers explorer

Showing 11 of 11 citing papers.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching · cs.CL · 2026-05-12 · unverdicted · none · ref 181
TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
Fast Computation of Conditional Probabilities in MDPs and Markov Chain Families · cs.LO · 2026-05-12 · unverdicted · none · ref 29
A new efficient algorithm computes optimal conditional reachability probabilities in MDPs without creating hard cyclic reductions, achieving linear time on acyclic cases and substantial speedups on benchmarks from Bayesian networks, probabilistic programs, and runtime monitoring.
Multi-Environment POMDPs with Finite-Horizon Objectives · cs.AI · 2026-05-08 · unverdicted · none · ref 12
The optimal value and policy computation problem for finite-horizon objectives in multi-environment POMDPs is PSPACE-complete, and a new algorithm solves it more efficiently than previous methods on classical benchmarks.
Probabilistic Hazard Analysis Framework with Stochastic Optimal Control for Deteriorating Civil Infrastructure Systems · eess.SY · 2026-04-24 · unverdicted · none · ref 80
A life-cycle optimization framework for deteriorating infrastructure under hazards is formulated as an MDP with a Kronecker-factored tensor method that reduces computational complexity from exponential to linear while preserving exact dynamic programming solutions.
Scaling Observation-aware Planning in Uncertain Domains · cs.AI · 2026-05-21 · unverdicted · none · ref 24
A POMDP decomposition method scales solving of the Sensor Selection Problem and Positional Observability Problem by 3 and 5 orders of magnitude in instance size and runtime.
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities · cs.AI · 2026-05-07 · unverdicted · none · ref 35 · 2 links
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
The hidden risks of temporal resampling in clinical reinforcement learning · cs.LG · 2026-02-06 · conditional · none · ref 28
Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning · cs.LG · 2025-05-30 · conditional · none · ref 39
AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.
Benchmark Data Contamination of Large Language Models: A Survey · cs.CL · 2024-06-06 · unverdicted · none · ref 119
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers · math.OC · 2026-04-13 · unverdicted · none · ref 104
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Optimal strategies in the all-heads coin game · math.PR · 2026-04-24 · unreviewed · ref 6
