hub Mixed citations

Wiley Series in Probability and Statistics, Wiley (1994)

Martin L. Puterman · 2017 · Wiley Series in Probability and Statistics · DOI 10.1002/9780470316887

Mixed citation behavior. The most common role is background (60%).

18 Pith papers citing it

4,524 external citations · Crossref

Background 60% of classified citations

citation-role summary

background 3 · method 1 · other 1

citation-polarity summary

background 3 · unclear 1 · use method 1

representative citing papers

Heavy-Ball Q-Learning with Residual Weighting Correction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Corrected heavy-ball Q-learning with convergence and acceleration guarantees is derived via switched linear system and joint spectral radius analysis, extended to linear function approximation.

The Value Function Semi-Algebraic Set in Partially Observable Markov Decision Processes

math.OC · 2026-06-02 · unverdicted · novelty 7.0

Feasible value functions in POMDPs under memoryless policies form a semi-algebraic set defined by polynomial inequalities from the model parameters.

Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data

stat.ME · 2026-05-27 · unverdicted · novelty 7.0

Finite-horizon optimal e-value designs for adaptive single-arm binary trials are constructed via dynamic programming and shown to have competitive operating characteristics with automatic futility indication.

Fast Computation of Conditional Probabilities in MDPs and Markov Chain Families

cs.LO · 2026-05-12 · unverdicted · novelty 7.0

A new efficient algorithm computes optimal conditional reachability probabilities in MDPs without creating hard cyclic reductions, achieving linear time on acyclic cases and substantial speedups on benchmarks from Bayesian networks, probabilistic programs, and runtime monitoring.

Multi-Environment POMDPs with Finite-Horizon Objectives

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

The optimal value and policy computation problem for finite-horizon objectives in multi-environment POMDPs is PSPACE-complete, and a new algorithm solves it more efficiently than previous methods on classical benchmarks.

Probabilistic Hazard Analysis Framework with Stochastic Optimal Control for Deteriorating Civil Infrastructure Systems

eess.SY · 2026-04-24 · unverdicted · novelty 7.0

A life-cycle optimization framework for deteriorating infrastructure under hazards is formulated as an MDP with a Kronecker-factored tensor method that reduces computational complexity from exponential to linear while preserving exact dynamic programming solutions.

UMB: A Unified Markov Binary Format for Probabilistic Model Checking (extended version)

cs.LO · 2026-06-16 · unverdicted · novelty 6.0

UMB is a new binary file format for probabilistic systems that provides a unified, efficient alternative to tool-specific textual representations.

Consistent Distributed Cooperative Localization for Ultra Large-Scale Multi-agent Systems

eess.SY · 2026-06-03 · unverdicted · novelty 6.0

A new cooperative localization algorithm based on overlapping covariance intersection is fully distributed, provably recursively consistent, and scalable to ultra large-scale multi-agent systems without performance loss from ignored cross-correlations.

Scaling Observation-aware Planning in Uncertain Domains

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

A POMDP decomposition method scales solving of the Sensor Selection Problem and Positional Observability Problem by 3 and 5 orders of magnitude in instance size and runtime.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

The hidden risks of temporal resampling in clinical reinforcement learning

cs.LG · 2026-02-06 · conditional · novelty 6.0

Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

cs.LG · 2025-05-30 · conditional · novelty 6.0

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion

cs.LG · 2026-05-29 · unverdicted · novelty 4.0

Introduces a constrained max-min MORL algorithm with convergence analysis, validated in tabular settings and three simulated control domains.

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.

A Simple Hierarchical Causality Primer

cs.MA · 2026-06-01 · unverdicted · novelty 2.0

Presents a simple discrete primer on hierarchical causality that requires causation classes, aggregation operators, and discrete event-time maps to connect actor and agent levels.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

cs.CL · 2026-05-12

Optimal strategies in the all-heads coin game

math.PR · 2026-04-24

citing papers explorer

The hidden risks of temporal resampling in clinical reinforcement learning

cs.LG · 2026-02-06 · conditional · none · ref 28

Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

cs.LG · 2025-05-30 · conditional · none · ref 39

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.
