arXiv preprint arXiv:1908.05659
18 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation
  The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
- Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
  A distributionally robust safety filter reduces certification for nonlinear systems under arbitrary uncertainties to a one-dimensional switching-time search with Wasserstein-inflated sampling guarantees.
- Integrating Feature Correlation in Differential Privacy with Applications in DP-ERM
  CorrDP relaxes standard differential privacy by incorporating feature correlations, enabling distance-dependent noise in DP-ERM for better privacy-utility tradeoffs.
- Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
  Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs, using an epigraph form and bisection search that requires Õ(ε^{-4}) robust policy evaluations.
- Minimizing Upper Confidence Bounds: A Data-Driven Framework for Stochastic Programming
  Proposes the APUB optimization framework for stochastic programming, proves asymptotic correctness and consistency of the new bound, and develops bootstrap and L-shaped solvers for two-stage linear problems, with empirical tests on a product mix example.
- Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation
  CVaR-constrained TD3 policies for robot navigation show larger safety margins and higher post-training reachability verification rates than average-cost baselines across simulated scenarios and real-robot tests.
- Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization
  Expected regret equals covariance between costs and optimal decisions for linear and quadratic stochastic programs, with explicit bounds on the residual.
- Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems
  Learned primal and dual maps conditioned on population summaries enable reliable coordination across composition shifts in large multi-agent systems, cutting forecast error 16-19% and violations 20-51% in a supply-chain case study.
- Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
  RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
- Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching
  Q-MMR introduces recursive reweighting and moment matching for off-policy evaluation, delivering dimension-free error bounds under Q^π realizability alone.
- The Distributionally Robust Cyclic Inventory Routing Problem
  The authors create a distributionally robust formulation for the cyclic inventory routing problem that admits a deterministic reformulation via multi-point worst-case distributions and chance-constraint equivalents, solved by nested branch-and-price and tested on real automotive data.
- Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
  DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and a water-filling policy structure for the promptwise simplex model, plus a practical policy-gradient algorithm.
- Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms
  The authors introduce (ηx,ηy,δ,ε)-GSSP as a convergence criterion and develop projected gradient-free descent-ascent methods achieving non-asymptotic rates for nonsmooth nonconvex-concave minimax optimization without weak convexity assumptions.
- Distributionally Robust Stochastic MPC under Disturbance-Affine Feedback Policies
  Introduces a new disturbance-affine distributionally robust MPC framework for uncertain linear systems that is less conservative than tube-based approaches while guaranteeing recursive feasibility and stability.
- Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems
  A PAC learning-based DR-MPC framework interpolates between robust MPC and stochastic MPC for interactive trajectory planning under agent decision uncertainty.
- Assured autonomy: How operations research powers and orchestrates generative AI systems
  The authors develop a conceptual framework for assured autonomy in generative AI by using flow-based models for auditable generation and adversarial robustness for operational safety, repositioning operations research as a system architect.
- A Data-embedded Solution Paradigm for Nonconvex Probable Event Constrained Optimization
  PECO strengthens chance constraints by mandating feasibility for all high-probability events and is solved via a data-embedded deterministic program that works for nonlinear nonconvex instances when the size of the solution-determining data family can be estimated by machine learning.
- Target-based Distributionally Robust Minimum Spanning Tree Problem
  A target-based DRO model for MST under distributional uncertainty is solved exactly via Benders decomposition and a modified Prim algorithm.