Routledge, 2021

Eitan Altman · 2021

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

SURF derives weight sampling rules from the arc-length CDF of the scalarization path to uniformly traverse the Pareto front in multi-objective optimization.

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.

A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

A single preference-conditioned policy achieves unique and Lipschitz-continuous Pareto coverage in multi-objective MDPs via a new mirror-descent policy iteration algorithm with O(1/k) convergence.

Why Does Agentic Safety Fail to Generalize Across Tasks?

cs.LG · 2026-05-07 · conditional · novelty 6.0

Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.

Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

PROCO generates synthetic unsafe samples via model-based rollouts and LLM-grounded costs to enable safer policy learning from offline datasets containing few or no violations.

Cat-DPO: Category-Adaptive Safety Alignment

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Cat-DPO applies per-category adaptive safety margins during direct preference optimization to reduce variance in safety across harm categories.

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

cs.RO · 2025-03-05 · unverdicted · novelty 5.0

SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.

citing papers explorer

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

cs.LG · 2026-05-20 · unverdicted · none · ref 2

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · none · ref 39

A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets

cs.LG · 2026-05-09 · unverdicted · none · ref 29

Why Does Agentic Safety Fail to Generalize Across Tasks?

cs.LG · 2026-05-07 · conditional · none · ref 3

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · none · ref 4

Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data

cs.LG · 2026-05-02 · unverdicted · none · ref 3

Cat-DPO: Category-Adaptive Safety Alignment

cs.CL · 2026-04-19 · unverdicted · none · ref 37

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · none · ref 7

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

cs.RO · 2025-03-05 · unverdicted · none · ref 38
