SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

Borong Zhang; Jiaming Ji; Josef Dai; Yaodong Yang; Yingshan Lei; Yishuai Cai; Yuanpei Chen; Yuhao Zhang

arxiv: 2503.03480 · v4 · submitted 2025-03-05 · 💻 cs.RO · cs.AI


Pith reviewed 2026-05-23 01:24 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords vision language action models · safety alignment · constrained markov decision process · safe reinforcement learning · robot safety · mobile manipulation


The pith

Vision-language-action robot policies achieve strong safety alignment by eliciting unsafe behaviors and optimizing under constraints in a constrained Markov decision process (CMDP) framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that safety constraints can be built explicitly into vision-language-action (VLA) models for robots through an integrated safety approach (ISA). The approach models safety requirements, actively elicits diverse unsafe behaviors, constrains policies via safe reinforcement learning, and assures safety through targeted evaluations. If the claim holds, generalist robot policies could be deployed with significantly reduced risk of harm while maintaining their task performance. A sympathetic reader would care because VLAs promise versatile robot control but currently face extreme safety challenges in real-world settings.

Core claim

The paper claims that, by leveraging the constrained Markov decision process (CMDP) paradigm, the integrated safety approach optimizes VLAs from a min-max perspective against elicited safety risks. The resulting policies are reported to achieve effective safety-performance trade-offs, strong safety assurance for long-tail risks, and robust generalization to out-of-distribution perturbations, as demonstrated on long-horizon mobile manipulation tasks.
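For readers who want the claim in symbols, a constrained objective and its min-max relaxation can be written as below. This is the standard textbook form of a CMDP Lagrangian, offered as a sketch rather than copied from the paper: $J_r$ denotes expected task return, $J_{c_i}$ the expected cumulative cost of the $i$-th safety constraint, $b_i$ its cost threshold, and $\lambda_i$ the matching Lagrange multiplier. The paper's own treatment of the thresholds may differ.

```latex
\begin{aligned}
\max_{\theta}\; & J_r(\theta)
  \quad \text{s.t.} \quad J_{c_i}(\theta) \le b_i, \;\; i = 0, \dots, n \\
\Longrightarrow \quad
\min_{\theta}\, \max_{\lambda \ge 0}\; &
  \Big[ -J_r(\theta) + \sum_{i=0}^{n} \lambda_i \big( J_{c_i}(\theta) - b_i \big) \Big]
\end{aligned}
```

The inner maximization drives $\lambda_i$ upward whenever constraint $i$ is violated, which is what makes the outer minimization trade task return against safety cost.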

What carries the argument

The Integrated Safety Approach (ISA), which systematically models safety, elicits unsafe behaviors, applies constrained optimization in CMDP, and performs targeted safety evaluations.

If this is right

Reduces the cumulative cost of safety violations by 83.58% compared to the state-of-the-art method while increasing task success rate by 3.85%.
Mitigates long-tail risks and handles extreme failure scenarios.
Generalizes learned safety behaviors to various out-of-distribution perturbations.
Evaluated on long-horizon mobile manipulation tasks.
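The two headline percentages are changes measured against a baseline. A minimal sketch of that arithmetic follows, using made-up placeholder numbers: the raw cumulative-cost and success-rate values are not given on this page, and it is not stated here whether the +3.85% is a relative change or a difference in percentage points.

```python
def relative_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# Hypothetical numbers for illustration only; not taken from the paper.
baseline_cost, aligned_cost = 10.0, 2.0
print(relative_change(baseline_cost, aligned_cost))  # -80.0
```

A negative result means the aligned policy incurs less cost than the baseline.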

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may extend to other embodied AI systems beyond VLAs.
Future work could test the method in physical robot deployments with novel perturbations.
Emphasizes the importance of comprehensive unsafe behavior elicitation for real-world safety.

Load-bearing premise

The method assumes that the set of actively elicited unsafe behaviors sufficiently covers the safety risks that will appear in real-world deployment and out-of-distribution perturbations.

What would settle it

A test showing whether the aligned policy still incurs high safety violation costs when faced with unsafe scenarios not included in the active elicitation process.
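One way to frame such a test is a held-out comparison: keep one class of unsafe scenario out of elicitation, then compare the aligned policy's mean cumulative cost on held-out versus elicited categories. The sketch below assumes episode results are already logged as (category, cumulative cost) pairs. The function name, the category names, and the numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cost_by_split(episodes, held_out):
    """Mean cumulative cost on elicited vs. held-out risk categories.

    episodes: iterable of (category, cumulative_cost) pairs.
    held_out: set of category names that were excluded from elicitation.
    """
    buckets = defaultdict(list)
    for category, cost in episodes:
        split = "held_out" if category in held_out else "elicited"
        buckets[split].append(cost)
    return {split: mean(costs) for split, costs in buckets.items()}


# Hypothetical logged results, for illustration only.
episodes = [("corner", 1.0), ("corner", 3.0), ("blind_spot", 2.0), ("fragile", 6.0)]
result = cost_by_split(episodes, held_out={"fragile"})
print(result)  # {'elicited': 2.0, 'held_out': 6.0}
```

A large gap between the two means would suggest that the learned safety behavior depends on the specific risks that were elicited.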

Figures

Figures reproduced from arXiv: 2503.03480 by Borong Zhang, Jiaming Ji, Josef Dai, Yaodong Yang, Yingshan Lei, Yishuai Cai, Yuanpei Chen, Yuhao Zhang.

**Figure 1.** The Integrated Safety Approach (ISA) pipeline. Our proposed pipeline employs a multifaceted framework for the systematic safety alignment of vision-language-action (VLA) models. … challenges posed by the complex and unpredictable physical world [27]. Despite large-scale behavior cloning and careful alignment in existing VLAs [28, 29], the most advanced models have yet to explicitly define and integrate safety…

**Figure 2.** Upper: Conceptual diagrams of each safety-critical component. Lower: Corresponding photorealistic examples from our simulation environment. … we utilize a large-scale dataset of 150K diverse indoor scenes generated by ProcTHOR [70], alongside Objaverse [71], which provides an extensive library of 800K 3D assets. The simulation is conducted in the AI2THOR [72] simulator, which supports photo-realistic renderi…

**Figure 3.** Cumulative cost distribution analysis. Left: Distribution of cumulative cost across robot trajectories in the test set after fine-tuning with ISA and FLaRe. Middle: Cumulative cost distribution when the task succeeds. Right: Cumulative cost distribution when the task fails. … technique [73], Equation 2 is transformed into an unconstrained safe optimization problem: $\min_{\theta} \max_{\lambda \ge 0} \big[-J_r(\theta) + \sum_{i=0}^{n} \lambda_i J_{c_i}(\theta)\big]$, (…

**Figure 4.** Effectiveness of ISA across diverse VLA models and benchmarks. … (§ 5.2.2); (III) Which components within ISA critically impact its safety-performance balance? (§ 5.2.3) (IV) Do learned safety behaviors generalize to OOD scenarios and extreme failures? (§ 5.2.4) 5.1 Experimental Setup. Tasks, Environments and Training. Our primary experiments utilize Safety-CHORES. To contextualize the unique challenges pose…

**Figure 5.** Comparative performance of VLA models on multiple benchmarks. Left: Success rate (SR) of each model per benchmark. Right: Cumulative cost (CC) incurred by each model on these benchmarks. … demonstrates substantial safety improvements, achieving an average reduction in CC of 83.58% compared to the strongest task-focused RL baseline, FLaRe. This significant decrease is consistent across all tasks, as illustrated by per-room safety improvement…

**Figure 6.** ISA with fixed penalty coefficients. … Importance of Risk Elicitation. The importance of risk elicitation is demonstrated by an ablation study in …

**Figure 7.** Left: Ablation of the risk elicitation component. Middle: Ablation on cost thresholds $b_i$. Right: Safety in extreme failure scenarios. … ISA Generalizability to Different VLA Models. In …

**Figure 8.** Setup for sim-to-real validation. The physical platform consists of dual Realman RM75-6F arms equipped with PsiBot G0-R hands, perceived through an egocentric RealSense D455 camera. … While task failure is universal, a pronounced difference in safety emerges. In …

**Figure 9.** Logistic regression analysis of task success versus cumulative cost. Left: Logistic regression analysis of task success probability as a function of cumulative cost for the ISA model. The model maintains a relatively high probability of success across different cost levels, indicating its robustness in handling cost variations. Right: Logistic regression analysis of task success probability for the FLaRe b…

**Figure 10.** Mean cumulative cost distribution per room analysis. The mean cumulative cost is calculated as the average of all unsafe events across the entire evaluation set. Left: Mean cumulative cost distribution for the Safety-ObjNav task across different rooms. Middle: Mean cumulative cost distribution for the Safety-Pickup task across different rooms. Right: Mean cumulative cost for the Safety-Fetch task across d…

**Figure 11.** Qualitative comparison of ISA-aligned VLA and unaligned VLA behaviors. Left: Trajectory comparison for a representative task. The ISA-aligned VLA exhibits a smoother, more direct path, while the unaligned VLA shows erratic movements, collisions, and interaction with non-target areas. Right: Examples of unsafe behaviors exhibited by unaligned VLAs, corresponding to safety-critical components. … B.2 Behaviors…

**Figure 12.** Training dynamics of the ISA framework on the Safety-ObjNav task. Left: Task success rate over training steps. Middle: Average cumulative cost, which rapidly decreases and stabilizes below the predefined cost limit. Right: The value of the Lagrange multiplier, which dynamically adjusts to enforce the safety constraint. … represents a trajectory, and $\tau \sim \pi_\theta$ denotes the trajectory distribution dependent on $\pi_\theta$…

**Figure 13.** Visual examples of Out-of-Distribution (OOD) conditions applied in the simulation environment. Bottom: A scene under normal rendering conditions. Top-Left: Color OOD demonstrates significant hue and saturation changes to environmental surfaces like walls and floors. Top-Right: Lighting OOD showcases variations in brightness, color temperature, and shadowing. Middle-Left: Material OOD displays objects with…

**Figure 14.** Details of Material OOD. Material OOD applies material transformations to four categories of objects. Each subcategory has a preset set of material packages. For each object instance, materials are randomly sampled and combined from a predefined set of material packages specific to its category, leading to significant visual alterations as exemplified above.

read the original abstract

Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Thus, policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, reducing the cumulative cost of safety violations by 83.58% compared to the state-of-the-art method, while also maintaining task success rate (+3.85%). (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios. (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. The effectiveness is evaluated on long-horizon mobile manipulation tasks. Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
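The min-max idea in the abstract can be illustrated on a toy scalar problem. The sketch below runs gradient descent on a policy parameter and projected gradient ascent on a Lagrange multiplier. The reward and cost functions, step sizes, and cost limit are assumptions chosen so the answer is easy to check by hand. It is not the paper's training loop, which applies safe reinforcement learning to VLA policies.

```python
def primal_dual(steps=2000, lr_theta=0.05, lr_lambda=0.05, cost_limit=1.0):
    """Gradient descent-ascent on L(theta, lam) = -J_r(theta) + lam * (J_c(theta) - b).

    Toy objectives (assumptions, not from the paper):
      J_r(theta) = -(theta - 2)^2   reward, maximised at theta = 2
      J_c(theta) = theta            cost, constrained to J_c <= cost_limit
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        grad_theta = 2.0 * (theta - 2.0) + lam      # dL/dtheta
        grad_lam = theta - cost_limit               # dL/dlam = J_c(theta) - b
        theta -= lr_theta * grad_theta              # descend in theta
        lam = max(0.0, lam + lr_lambda * grad_lam)  # ascend in lam, keep lam >= 0
    return theta, lam


theta, lam = primal_dual()
print(round(theta, 3), round(lam, 3))
```

On this toy problem the unconstrained optimum (theta = 2) violates the cost limit, so the iterates should settle near theta = 1 with a multiplier near 2. The multiplier rises while the constraint is violated and stops growing once the cost sits at the limit.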

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies an established CMDP-plus-adversarial-elicitation pipeline to VLAs and releases a benchmark, but the safety claims hinge on whether the elicited unsafe set covers real deployment risks.


The main thing here is a systematic recipe for folding safety into VLAs: model requirements as CMDP costs, actively generate unsafe trajectories, then run constrained RL to minimize those costs while keeping task reward. They evaluate on long-horizon mobile manipulation and report an 83.58% drop in cumulative safety cost versus prior methods plus a small task-success gain. They also ship the data, models, and new benchmark environment, which is useful for anyone who wants to test similar ideas.

Referee Report

2 major / 0 minor

Summary. The paper proposes an Integrated Safety Approach (ISA) for Vision-Language-Action (VLA) models that models safety requirements, actively elicits diverse unsafe behaviors, constrains VLA policies via safe reinforcement learning in a constrained Markov decision process (CMDP) formulated as a min-max optimization, and evaluates safety through targeted assessments. On long-horizon mobile manipulation tasks, the approach is claimed to reduce cumulative safety violation costs by 83.58% relative to the state-of-the-art while increasing task success rate by 3.85%, with additional claims of mitigating long-tail risks and achieving robust out-of-distribution generalization. The work releases data, models, and a new benchmark environment.

Significance. If the coverage of elicited unsafe behaviors is shown to extend to real deployment distributions, the work would provide a concrete, reproducible framework for embedding explicit safety constraints into generalist robot policies, addressing a pressing deployment barrier for VLAs. The public release of data, models, and benchmark is a clear strength that supports follow-on research. The quantitative gains on the reported tasks are potentially impactful for the robotics community, but their interpretation is limited by the unverified central assumption.

major comments (2)

[Abstract] The central quantitative claims (83.58% reduction in cumulative safety violation cost and +3.85% task success) are reported without any information on the number of independent runs, statistical significance testing, variance, or the precise mathematical definition of the safety cost metric used in the CMDP formulation; this prevents verification of the claimed safety-performance trade-off.
[Abstract, Evaluation section] The safety-assurance and OOD-generalization claims rest on the assumption that the set of actively elicited unsafe behaviors is sufficiently representative of real-world deployment risks and out-of-distribution perturbations; however, the manuscript states that safety is assured “through targeted evaluations” on the same elicited risks, leaving the coverage assumption untested and making the reported metrics dependent on an unverified premise.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help improve the clarity and rigor of our work. We address each major comment below and will make corresponding revisions to the manuscript.


Referee: [Abstract] The central quantitative claims (83.58% reduction in cumulative safety violation cost and +3.85% task success) are reported without any information on the number of independent runs, statistical significance testing, variance, or the precise mathematical definition of the safety cost metric used in the CMDP formulation; this prevents verification of the claimed safety-performance trade-off.

Authors: We agree that these details are important for verifying the claims. The experiments in the manuscript were performed with 5 independent runs using different random seeds, and we report the mean values along with standard deviations in the evaluation section. The safety cost metric is defined as the cumulative sum of per-timestep costs in the CMDP, where the cost is 1 upon violation of any safety constraint (such as collisions or unsafe actions) and 0 otherwise. We will revise the abstract to include this information, e.g., '83.58% reduction (5 runs, mean ± std)'. We can also include statistical significance if space permits.

Revision: yes

Referee: [Abstract, Evaluation section] The safety-assurance and OOD-generalization claims rest on the assumption that the set of actively elicited unsafe behaviors is sufficiently representative of real-world deployment risks and out-of-distribution perturbations; however, the manuscript states that safety is assured “through targeted evaluations” on the same elicited risks, leaving the coverage assumption untested and making the reported metrics dependent on an unverified premise.

Authors: This is a fair point regarding the scope of our claims. The elicitation of unsafe behaviors is performed through the min-max optimization in the CMDP to actively discover diverse violation scenarios based on modeled safety requirements. We evaluate on both the elicited behaviors and additional OOD perturbations to test generalization. However, we cannot empirically verify coverage against all possible real-world distributions. We will revise the text to more explicitly state this assumption and add a discussion of limitations, emphasizing that the approach provides safety assurance within the scope of the elicited and tested risks.

Revision: partial
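The cost metric described in the first simulated response (a per-timestep cost of 1 on any violation, summed over the episode, then reported as mean ± std over seeds) can be sketched as follows. The function names and all numbers are hypothetical and are not the paper's data.

```python
from statistics import mean, stdev


def cumulative_cost(violations):
    """Sum of per-timestep binary costs: 1 if any constraint is violated, else 0."""
    return sum(1 if violated else 0 for violated in violations)


def summarize_runs(per_seed_values):
    """Mean and sample standard deviation across independent seeds."""
    return mean(per_seed_values), stdev(per_seed_values)


# Hypothetical episode: True marks a timestep with a safety violation.
episode = [False, True, False, True, True]
print(cumulative_cost(episode))  # 3

# Hypothetical per-seed mean costs for 5 seeds (not the paper's numbers).
m, s = summarize_runs([2.0, 3.0, 4.0, 5.0, 6.0])
print(m, round(s, 3))  # 4.0 1.581
```

With a binary per-step cost, the cumulative cost is simply the number of unsafe timesteps, so it measures how often a violation occurs, not how severe it is.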

Circularity Check

0 steps flagged

No circularity in derivation chain; empirical results independent of inputs.

full rationale

The paper's core derivation applies the standard CMDP min-max formulation to constrain VLAs after eliciting unsafe behaviors, then reports empirical metrics (83.58% cost reduction, +3.85% success) from evaluations on long-horizon tasks against SOTA baselines. These outcomes are measured on benchmark environments rather than being algebraically equivalent to the elicited set or optimization parameters by construction. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided derivation steps. The coverage assumption for elicited behaviors is an unverified modeling choice but does not reduce the reported results to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that elicited unsafe behaviors form a representative set for the CMDP constraints; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: Safety requirements can be modeled as additive costs in a CMDP whose violation cost is minimized jointly with task reward.
Invoked when the paper states it optimizes VLAs from a min-max perspective against elicited safety risks.



Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
cs.RO · 2026-05 · unverdicted · novelty 8.0

SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution i...

Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models
cs.RO · 2026-05 · unverdicted · novelty 7.0

GuardVLA embeds a stealthy backdoor watermark in VLAs via secret messages in visual data and uses a swap-and-detect mechanism for post-release ownership verification that preserves task performance.

Constrained Decoding for Safe Robot Navigation Foundation Models
cs.RO · 2025-09 · unverdicted · novelty 7.0

SafeDec uses constrained decoding to ensure autoregressive robot navigation foundation models generate actions that provably satisfy STL safety specifications under assumed dynamics.

Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
cs.RO · 2026-05 · unverdicted · novelty 6.0

Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.

RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
cs.CL · 2026-05 · unverdicted · novelty 6.0

RLearner-LLM's Hybrid-DPO fuses DeBERTa NLI and LLM verifier scores to deliver up to 6x higher NLI entailment than standard SFT while preserving answer coverage across academic domains.

RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
cs.CL · 2026-05 · unverdicted · novelty 6.0

RLearner-LLM achieves up to 6x gains in NLI entailment over standard fine-tuning by using an automated hybrid DPO pipeline that balances logic and fluency across multiple model sizes and domains.

RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization
cs.CL · 2026-05 · unverdicted · novelty 5.0

Hybrid-DPO combining NLI and verifier scores delivers up to 6x NLI improvement over SFT baselines across multiple LLMs and domains while preserving answer coverage and inference speed.

Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study
cs.LG · 2026-04 · unverdicted · novelty 5.0

Explicit geometry-based feasibility supervision added to diffusion VLA training leads to better physical reliability, task success, and faster learning with limited data in manipulation tasks.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 6 Pith papers · 31 internal anchors

[1]

Aligning cyber space with physical world: A comprehensive survey on embodied ai.arXiv preprint arXiv:2407.06886, 2024

Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guanbin Li, Wen Gao, and Liang Lin. Aligning cyber space with physical world: A comprehensive survey on embodied ai.arXiv preprint arXiv:2407.06886, 2024

work page arXiv 2024
[2]

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. Rt-1: Robotics transformer for real-world control at scale.arXiv preprint arXiv:2212.06817, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[3]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Abby O’Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Ab- hishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, et al. Open x-embodiment: Robotic learning datasets and rt-x models.arXiv preprint arXiv:2310.08864, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Octo: An Open-Source Generalist Robot Policy

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy.arXiv preprint arXiv:2405.12213, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint arXiv:2406.09246, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

A Generalist Agent

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et al. A generalist agent.arXiv preprint arXiv:2205.06175, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

A Survey on Vision-Language-Action Models for Embodied AI

Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, and Irwin King. A survey on vision- language-action models for embodied ai.arXiv preprint arXiv:2405.14093, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Challenges and applications of large language models.arXiv preprint arXiv:2307.10169, 2023

Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. Challenges and applications of large language models.arXiv preprint arXiv:2307.10169, 2023

work page arXiv 2023
[9]

AI Alignment: A Comprehensive Survey

Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, et al. Ai alignment: A comprehensive survey.arXiv preprint arXiv:2310.19852, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Openai o1 system card

OpenAI. Openai o1 system card. https://cdn.openai.com/o1-system-card-2024120 5.pdf, 2024

work page 2024
[12]

DeepSeek-V3 Technical Report

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

Language models resist alignment: Evidence from data compression

Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, and Yaodong Yang. Language models resist alignment: Evidence from data compression. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of the Association for Computa- ...

work page 2025
[14]

Shadows of intelligence: A comprehensive survey of ai deception

PKU-Alignment Group and Collaborators. Shadows of intelligence: A comprehensive survey of ai deception. https://deceptionsurvey.com/, 2025. Beta Version V2 (v1 updated on August 28, 2025; v2 updated on September 24, 2025). Preprint to appear

work page 2025
[15]

Reinforced Self-Training (ReST) for Language Modeling

Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, et al. Reinforced self-training (rest) for language modeling.arXiv preprint arXiv:2308.08998, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. Llama guard: Llm-based input-output safeguard for human-ai conversations.arXiv preprint arXiv:2312.06674, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations

Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, and Mahesh Pasupuleti. Llama guard 3 vision: Safeguarding human-ai image understanding conversations.arXiv preprint arXiv:2411.10414, 2024

work page arXiv 2024
[18]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

work page 2022
[19]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models.arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. Safe rlhf: Safe reinforcement learning from human feedback.arXiv preprint arXiv:2310.12773, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

Align anything: Training all-modality models to follow instructions with language feedback.arXiv preprint arXiv:2412.15838, 2024

Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, et al. Align anything: Training all-modality models to follow instructions with language feedback.arXiv preprint arXiv:2412.15838, 2024

work page arXiv 2024
[22]

Sequence to sequence reward modeling: Improving rlhf by language feedback

Jiayi Zhou, Jiaming Ji, Josef Dai, and Yaodong Yang. Sequence to sequence reward modeling: Improving rlhf by language feedback. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27765–27773, 2025

work page 2025
[23]

Aligner: Efficient alignment by learning to correct

Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Alex Qiu, Juntao Dai, and Yaodong Yang. Aligner: Efficient alignment by learning to correct. Advances in Neural Information Processing Systems, 37:90853–90890, 2024

work page 2024
[24]

Med-aligner empowers llm medical applications for complex medical scenarios.The Innovation, page 101002, 2025

Xiangbin Meng, Jia-ming Ji, Xiangyu Yan, Jun-tao Dai, Bo-yuan Chen, Guan Wang, Hua Xu, Jing-jia Wang, Xu-liang Wang, Da Liu, et al. Med-aligner empowers llm medical applications for complex medical scenarios.The Innovation, page 101002, 2025

work page 2025
[25]

Generative rlhf-v: Learning principles from multi-modal human preference.arXiv preprint arXiv:2505.18531, 2025

Jiayi Zhou, Jiaming Ji, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, and Yaodong Yang. Generative rlhf-v: Learning principles from multi-modal human preference.arXiv preprint arXiv:2505.18531, 2025

work page arXiv 2025
[26]

Intermt: Multi-turn interleaved preference alignment with human feedback.arXiv preprint arXiv:2505.23950, 2025

Boyuan Chen, Donghai Hong, Jiaming Ji, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, et al. Intermt: Multi-turn interleaved preference alignment with human feedback.arXiv preprint arXiv:2505.23950, 2025

work page arXiv 2025
[27]

Safety-critical advanced robots: A survey.Robotics and Autonomous Systems, 94:43–52, 2017

Jérémie Guiochet, Mathilde Machin, and Hélène Waeselynck. Safety-critical advanced robots: A survey.Robotics and Autonomous Systems, 94:43–52, 2017

work page 2017
[28]

Flare: Achieving masterful and adaptive robot policies with large-scale reinforcement learning fine-tuning.arXiv preprint arXiv:2409.16578, 2024

Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martín-Martín, Peter Stone, Kuo-Hao Zeng, and Kiana Ehsani. Flare: Achieving masterful and adaptive robot policies with large-scale reinforcement learning fine-tuning.arXiv preprint arXiv:2409.16578, 2024

work page arXiv 2024
[29]

Grape: Generalizing robot policy via preference alignment.arXiv preprint arXiv:2411.19309, 2024

Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, and Huaxiu Yao. Grape: Generalizing robot policy via preference alignment.arXiv preprint arXiv:2411.19309, 2024. 12

work page arXiv 2024
[30]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choro- manski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. Rt-2: Vision-language- action models transfer web knowledge to robotic control.arXiv preprint arXiv:2307.15818, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Rt-trajectory: Robotic task generalization via hindsight trajectory sketches,

Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, et al. Rt-trajectory: Robotic task generalization via hindsight trajectory sketches.arXiv preprint arXiv:2311.01977, 2023

[32]

Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, et al. Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 162...

[33]

Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, and Dorsa Sadigh. Rt-h: Action hierarchies using language. arXiv preprint arXiv:2403.01823, 2024

[34]

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. $\pi_0$: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024

[35]

Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. Rdt-1b: a diffusion foundation model for bimanual manipulation. arXiv preprint arXiv:2410.07864, 2024

[36]

Angeliki Zacharaki, Ioannis Kostavelis, Antonios Gasteratos, and Ioannis Dokas. Safety bounds in human robot interaction: A survey. Safety science, 127:104667, 2020

[37]

Gregory Falco, Ben Shneiderman, Julia Badger, Ryan Carrier, Anton Dahbura, David Danks, Martin Eling, Alwyn Goodloe, Jerry Gupta, Christopher Hart, et al. Governing ai safety through independent audits. Nature Machine Intelligence, 3(7):566–571, 2021

[38]

Eitan Altman. Constrained Markov decision processes. Routledge, 2021

[39]

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang. Omnisafe: An infrastructure for accelerating safe reinforcement learning research. Journal of Machine Learning Research, 25(285):1–6, 2024

[40]

Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, et al. A survey on vision-language-action models: An action tokenization perspective. arXiv preprint arXiv:2507.01925, 2025

[41]

Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, and Sergey Levine. Robotic control via embodied chain-of-thought reasoning. arXiv preprint arXiv:2407.08693, 2024

[42]

Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, and Sergey Levine. Fast: Efficient action tokenization for vision-language-action models. arXiv preprint arXiv:2501.09747, 2025

[43]

Zhijie Wang, Zhehua Zhou, Jiayang Song, Yuheng Huang, Zhan Shu, and Lei Ma. Towards testing and evaluating vision-language-action models for robotic manipulation: An empirical study. arXiv preprint arXiv:2409.12894, 2024

[44]

Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, and Hongyang Li. Learning to act anywhere with task-centric latent actions. arXiv preprint arXiv:2502.14420, 2025

[45]

Yifan Zhong, Xuchuan Huang, Ruochong Li, Ceyao Zhang, Zhang Chen, Tianrui Guan, Fanlian Zeng, Ka Num Lui, Yuyao Ye, Yitao Liang, et al. Dexgraspvla: A vision-language-action framework towards general dexterous grasping. arXiv preprint arXiv:2502.20900, 2025

[46]

Gemini Robotics Team, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Travis Armstrong, Ashwin Balakrishna, Robert Baruch, Maria Bauza, Michiel Blokzijl, et al. Gemini robotics: Bringing ai into the physical world. arXiv preprint arXiv:2503.20020, 2025

[47]

Lucy Xiaoyang Shi, Brian Ichter, Michael Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, et al. Hi robot: Open-ended instruction following with hierarchical vision-language-action models. arXiv preprint arXiv:2502.19417, 2025

[48]

Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, et al. Cot-vla: Visual chain-of-thought reasoning for vision-language-action models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1702–1713, 2025

[49]

Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, and Yi Xu. Vision-language-action model with open-world embodied reasoning from pretrained knowledge. arXiv preprint arXiv:2505.21906, 2025

[50]

Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, and Jianwei Yang. Tracevla: Visual trace prompting enhances spatial-temporal awareness for generalist robotic policies. arXiv preprint arXiv:2412.10345, 2024

[51]

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073, 2022

[52]

Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, and Yaodong Yang. Beavertails: Towards improved safety alignment of llm via a human-preference dataset. Advances in Neural Information Processing Systems, 36, 2024

[53]

Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, et al. Safe rlhf-v: Safe reinforcement learning from multi-modal human feedback. arXiv preprint arXiv:2503.17682, 2025

[54]

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022

[55]

Dan Hendrycks, Mantas Mazeika, and Thomas Woodside. An overview of catastrophic ai risks. arXiv preprint arXiv:2306.12001, 2023

[56]

Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theory and applications. arXiv preprint arXiv:2205.10330, 2022

[57]

Artificial Intelligence Act. Artificial intelligence act. Regulamento da União Europeia (UE), 1689, 2024

[58]

Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, et al. Molmoact: Action reasoning models that can reason in space. arXiv preprint arXiv:2508.07917, 2025

[59]

Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by pid lagrangian methods. In International Conference on Machine Learning, pages 9133–9143. PMLR, 2020

[60]

Juntao Dai, Jiaming Ji, Long Yang, Qian Zheng, and Gang Pan. Augmented proximal policy optimization for safe reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7288–7295, 2023

[61]

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. Ai safety gridworlds. arXiv preprint arXiv:1711.09883, 2017

[62]

Zhaocong Yuan, Adam W Hall, Siqi Zhou, Lukas Brunke, Melissa Greeff, Jacopo Panerati, and Angela P Schoellig. Safe-control-gym: A unified benchmark suite for safe learning-based control and reinforcement learning in robotics. IEEE Robotics and Automation Letters, 7(4):11142–11149, 2022

[63]

Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, and Yaodong Yang. Safety gymnasium: A unified safe reinforcement learning benchmark. Advances in Neural Information Processing Systems, 36:18964–18993, 2023

[64]

Tristan Tomilin, Meng Fang, and Mykola Pechenizkiy. Hasard: A benchmark for vision-based safe reinforcement learning in embodied agents. arXiv preprint arXiv:2503.08241, 2025

[65]

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. Rlbench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020

[66]

Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. Calvin: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters, 7(3):7327–7334, 2022

[67]

Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, et al. Vlabench: A large-scale benchmark for language-conditioned robotics manipulation with long-horizon reasoning tasks. arXiv preprint arXiv:2412.18194, 2024

[68]

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, and Anton Van Den Hengel. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3674–3683, 2018

[69]

Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, et al. Robothor: An open simulation-to-real embodied ai platform. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3164–3174, 2020

[70]

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation. Advances in Neural Information Processing Systems, 35:5982–5994, 2022

[71]

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023

[72]

Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, et al. Ai2-thor: An interactive 3d environment for visual ai. arXiv preprint arXiv:1712.05474, 2017

[73]

Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, New York, NY, USA, second edition, 2006

[74]

Yiming Zhang, Quan Vuong, and Keith Ross. First order constrained optimization in policy space. Advances in Neural Information Processing Systems, 33:15338–15349, 2020

[75]

Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, and Luca Weihs. Poliformer: Scaling on-policy rl with transformers results in masterful navigators. arXiv preprint arXiv:2406.20083, 2024

[76]

Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, and Aniruddha Kembhavi. Simple but effective: Clip embeddings for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14829–14838, 2022

[77]

Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, and Ranjay Krishna. Selective visual representations improve convergence and generalization for embodied ai. arXiv preprint arXiv:2311.04193, 2023

[78]

Tomás Lozano-Pérez and Leslie Pack Kaelbling. A constraint-based method for solving sequential manipulation planning problems. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3684–3691. IEEE, 2014

[79]

Manuel Castillo-Lopez, Philippe Ludivig, Seyed Amin Sajadi-Alamdari, Jose Luis Sanchez-Lopez, Miguel A Olivares-Mendez, and Holger Voos. A real-time approach for chance-constrained motion planning with dynamic obstacles. IEEE Robotics and Automation Letters, 5(2):3620–3625, 2020

[80]

Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17868–17879, 2024


Showing first 80 references.

[1]

Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guanbin Li, Wen Gao, and Liang Lin. Aligning cyber space with physical world: A comprehensive survey on embodied ai. arXiv preprint arXiv:2407.06886, 2024

[2]

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. Rt-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022

[3]

Abby O’Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, et al. Open x-embodiment: Robotic learning datasets and rt-x models. arXiv preprint arXiv:2310.08864, 2023

[4]

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213, 2024

[5]

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024

[6]

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et al. A generalist agent. arXiv preprint arXiv:2205.06175, 2022

[7]

Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, and Irwin King. A survey on vision-language-action models for embodied ai. arXiv preprint arXiv:2405.14093, 2024

[8]

Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169, 2023

[9]

Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, et al. Ai alignment: A comprehensive survey. arXiv preprint arXiv:2310.19852, 2023

[10]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

[11]

OpenAI. Openai o1 system card. https://cdn.openai.com/o1-system-card-20241205.pdf, 2024

[12]

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437, 2024

[13]

Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, and Yaodong Yang. Language models resist alignment: Evidence from data compression. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computa- ...

[14]

PKU-Alignment Group and Collaborators. Shadows of intelligence: A comprehensive survey of ai deception. https://deceptionsurvey.com/, 2025. Beta Version V2 (v1 updated on August 28, 2025; v2 updated on September 24, 2025). Preprint to appear

[15]

Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, et al. Reinforced self-training (rest) for language modeling. arXiv preprint arXiv:2308.08998, 2023

[16]

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. Llama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:2312.06674, 2023

[17]

Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, and Mahesh Pasupuleti. Llama guard 3 vision: Safeguarding human-ai image understanding conversations. arXiv preprint arXiv:2411.10414, 2024

[18]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730–27744, 2022

[19]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

[20]

Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. Safe rlhf: Safe reinforcement learning from human feedback. arXiv preprint arXiv:2310.12773, 2023

[21]

Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, et al. Align anything: Training all-modality models to follow instructions with language feedback. arXiv preprint arXiv:2412.15838, 2024

[22]

Jiayi Zhou, Jiaming Ji, Josef Dai, and Yaodong Yang. Sequence to sequence reward modeling: Improving rlhf by language feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27765–27773, 2025

[23]

Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Alex Qiu, Juntao Dai, and Yaodong Yang. Aligner: Efficient alignment by learning to correct. Advances in Neural Information Processing Systems, 37:90853–90890, 2024

[24]

Xiangbin Meng, Jia-ming Ji, Xiangyu Yan, Jun-tao Dai, Bo-yuan Chen, Guan Wang, Hua Xu, Jing-jia Wang, Xu-liang Wang, Da Liu, et al. Med-aligner empowers llm medical applications for complex medical scenarios. The Innovation, page 101002, 2025

[25]

Jiayi Zhou, Jiaming Ji, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, and Yaodong Yang. Generative rlhf-v: Learning principles from multi-modal human preference. arXiv preprint arXiv:2505.18531, 2025

[26] [26]

Intermt: Multi-turn interleaved preference alignment with human feedback.arXiv preprint arXiv:2505.23950, 2025

Boyuan Chen, Donghai Hong, Jiaming Ji, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, et al. Intermt: Multi-turn interleaved preference alignment with human feedback.arXiv preprint arXiv:2505.23950, 2025

work page arXiv 2025

[27] [27]

Safety-critical advanced robots: A survey.Robotics and Autonomous Systems, 94:43–52, 2017

Jérémie Guiochet, Mathilde Machin, and Hélène Waeselynck. Safety-critical advanced robots: A survey.Robotics and Autonomous Systems, 94:43–52, 2017

work page 2017

[28] [28]

Flare: Achieving masterful and adaptive robot policies with large-scale reinforcement learning fine-tuning.arXiv preprint arXiv:2409.16578, 2024

Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martín-Martín, Peter Stone, Kuo-Hao Zeng, and Kiana Ehsani. Flare: Achieving masterful and adaptive robot policies with large-scale reinforcement learning fine-tuning.arXiv preprint arXiv:2409.16578, 2024

work page arXiv 2024

[29] [29]

Grape: Generalizing robot policy via preference alignment.arXiv preprint arXiv:2411.19309, 2024

Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, and Huaxiu Yao. Grape: Generalizing robot policy via preference alignment.arXiv preprint arXiv:2411.19309, 2024. 12

work page arXiv 2024

[30] [30]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choro- manski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. Rt-2: Vision-language- action models transfer web knowledge to robotic control.arXiv preprint arXiv:2307.15818, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[31] [31]

Rt-trajectory: Robotic task generalization via hindsight trajectory sketches,

Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, et al. Rt-trajectory: Robotic task generalization via hindsight trajectory sketches.arXiv preprint arXiv:2311.01977, 2023

work page arXiv 2023

[32] [32]

Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world

Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Ku- nal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, et al. Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 162...

work page 2024

[33] [33]

RT-H: Action Hierarchies Using Language

Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, and Dorsa Sadigh. Rt-h: Action hierarchies using language.arXiv preprint arXiv:2403.01823, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [34]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. pi0 : A vision-language-action flow model for general robot control.arXiv preprint arXiv:2410.24164, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[35] [35]

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. Rdt-1b: a diffusion foundation model for bimanual manipulation.arXiv preprint arXiv:2410.07864, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[36] [36]

Safety bounds in human robot interaction: A survey.Safety science, 127:104667, 2020

Angeliki Zacharaki, Ioannis Kostavelis, Antonios Gasteratos, and Ioannis Dokas. Safety bounds in human robot interaction: A survey.Safety science, 127:104667, 2020

work page 2020

[37] [37]

Governing ai safety through independent audits.Nature Machine Intelligence, 3(7):566–571, 2021

Gregory Falco, Ben Shneiderman, Julia Badger, Ryan Carrier, Anton Dahbura, David Danks, Martin Eling, Alwyn Goodloe, Jerry Gupta, Christopher Hart, et al. Governing ai safety through independent audits.Nature Machine Intelligence, 3(7):566–571, 2021

work page 2021

[38] [38]

Routledge, 2021

Eitan Altman.Constrained Markov decision processes. Routledge, 2021

work page 2021

[39] [39]

Omnisafe: An infrastructure for accelerating safe reinforcement learning research.Journal of Machine Learning Research, 25(285):1–6, 2024

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang. Omnisafe: An infrastructure for accelerating safe reinforcement learning research.Journal of Machine Learning Research, 25(285):1–6, 2024

work page 2024

[40] [40]

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, et al. A survey on vision-language- action models: An action tokenization perspective.arXiv preprint arXiv:2507.01925, 2025

[41]

Robotic Control via Embodied Chain-of-Thought Reasoning

Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, and Sergey Levine. Robotic control via embodied chain-of-thought reasoning. arXiv preprint arXiv:2407.08693, 2024

[42]

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, and Sergey Levine. FAST: Efficient action tokenization for vision-language-action models. arXiv preprint arXiv:2501.09747, 2025

[43]

Towards testing and evaluating vision-language-action models for robotic manipulation: An empirical study

Zhijie Wang, Zhehua Zhou, Jiayang Song, Yuheng Huang, Zhan Shu, and Lei Ma. Towards testing and evaluating vision-language-action models for robotic manipulation: An empirical study. arXiv preprint arXiv:2409.12894, 2024

[44]

Learning to act anywhere with task-centric latent actions

Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, and Hongyang Li. Learning to act anywhere with task-centric latent actions. arXiv preprint arXiv:2502.14420, 2025

[45]

DexGraspVLA: A vision-language-action framework towards general dexterous grasping

Yifan Zhong, Xuchuan Huang, Ruochong Li, Ceyao Zhang, Zhang Chen, Tianrui Guan, Fanlian Zeng, Ka Num Lui, Yuyao Ye, Yitao Liang, et al. DexGraspVLA: A vision-language-action framework towards general dexterous grasping. arXiv preprint arXiv:2502.20900, 2025

[46]

Gemini Robotics: Bringing AI into the Physical World

Gemini Robotics Team, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Travis Armstrong, Ashwin Balakrishna, Robert Baruch, Maria Bauza, Michiel Blokzijl, et al. Gemini Robotics: Bringing AI into the physical world. arXiv preprint arXiv:2503.20020, 2025

[47]

Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models

Lucy Xiaoyang Shi, Brian Ichter, Michael Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, et al. Hi Robot: Open-ended instruction following with hierarchical vision-language-action models. arXiv preprint arXiv:2502.19417, 2025

[48]

CoT-VLA: Visual chain-of-thought reasoning for vision-language-action models

Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, et al. CoT-VLA: Visual chain-of-thought reasoning for vision-language-action models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1702–1713, 2025

[49]

Vision-language-action model with open-world embodied reasoning from pretrained knowledge

Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, and Yi Xu. Vision-language-action model with open-world embodied reasoning from pretrained knowledge. arXiv preprint arXiv:2505.21906, 2025

[50]

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, and Jianwei Yang. TraceVLA: Visual trace prompting enhances spatial-temporal awareness for generalist robotic policies. arXiv preprint arXiv:2412.10345, 2024

[51]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022

[52]

BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset

Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, and Yaodong Yang. BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset. Advances in Neural Information Processing Systems, 36, 2024

[53]

Safe RLHF-V: Safe reinforcement learning from multi-modal human feedback

Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, et al. Safe RLHF-V: Safe reinforcement learning from multi-modal human feedback. arXiv preprint arXiv:2503.17682, 2025

[54]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022

[55]

An overview of catastrophic AI risks

Dan Hendrycks, Mantas Mazeika, and Thomas Woodside. An overview of catastrophic AI risks. arXiv preprint arXiv:2306.12001, 2023

[56]

A review of safe reinforcement learning: Methods, theory and applications

Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theory and applications. arXiv preprint arXiv:2205.10330, 2022

[57]

Artificial intelligence act

Artificial Intelligence Act. Artificial intelligence act. Regulamento da União Europeia (UE), 1689, 2024

[58]

MolmoAct: Action Reasoning Models that can Reason in Space

Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, et al. MolmoAct: Action reasoning models that can reason in space. arXiv preprint arXiv:2508.07917, 2025

[59]

Responsive safety in reinforcement learning by PID Lagrangian methods

Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by PID Lagrangian methods. In International Conference on Machine Learning, pages 9133–9143. PMLR, 2020

[60]

Augmented proximal policy optimization for safe reinforcement learning

Juntao Dai, Jiaming Ji, Long Yang, Qian Zheng, and Gang Pan. Augmented proximal policy optimization for safe reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7288–7295, 2023

[61]

AI Safety Gridworlds

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI safety gridworlds. arXiv preprint arXiv:1711.09883, 2017

[62]

Safe-control-gym: A unified benchmark suite for safe learning-based control and reinforcement learning in robotics

Zhaocong Yuan, Adam W Hall, Siqi Zhou, Lukas Brunke, Melissa Greeff, Jacopo Panerati, and Angela P Schoellig. Safe-control-gym: A unified benchmark suite for safe learning-based control and reinforcement learning in robotics. IEEE Robotics and Automation Letters, 7(4):11142–11149, 2022

[63]

Safety Gymnasium: A unified safe reinforcement learning benchmark

Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, and Yaodong Yang. Safety Gymnasium: A unified safe reinforcement learning benchmark. Advances in Neural Information Processing Systems, 36:18964–18993, 2023

[64]

HASARD: A benchmark for vision-based safe reinforcement learning in embodied agents

Tristan Tomilin, Meng Fang, and Mykola Pechenizkiy. HASARD: A benchmark for vision-based safe reinforcement learning in embodied agents. arXiv preprint arXiv:2503.08241, 2025

[65]

RLBench: The robot learning benchmark & learning environment

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020

[66]

CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks

Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters, 7(3):7327–7334, 2022

[67]

VLABench: A large-scale benchmark for language-conditioned robotics manipulation with long-horizon reasoning tasks

Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, et al. VLABench: A large-scale benchmark for language-conditioned robotics manipulation with long-horizon reasoning tasks. arXiv preprint arXiv:2412.18194, 2024

[68]

Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, and Anton Van Den Hengel. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3674–3683, 2018

[69]

RoboTHOR: An open simulation-to-real embodied AI platform

Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, et al. RoboTHOR: An open simulation-to-real embodied AI platform. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3164–3174, 2020

[70]

ProcTHOR: Large-scale embodied AI using procedural generation

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. ProcTHOR: Large-scale embodied AI using procedural generation. Advances in Neural Information Processing Systems, 35:5982–5994, 2022

[71]

Objaverse: A universe of annotated 3D objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023

[72]

AI2-THOR: An Interactive 3D Environment for Visual AI

Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, et al. AI2-THOR: An interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474, 2017

[73]

Numerical Optimization

Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, New York, NY, USA, second edition, 2006

[74]

First order constrained optimization in policy space

Yiming Zhang, Quan Vuong, and Keith Ross. First order constrained optimization in policy space. Advances in Neural Information Processing Systems, 33:15338–15349, 2020

[75]

PoliFormer: Scaling on-policy RL with transformers results in masterful navigators

Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, and Luca Weihs. PoliFormer: Scaling on-policy RL with transformers results in masterful navigators. arXiv preprint arXiv:2406.20083, 2024

[76]

Simple but effective: CLIP embeddings for embodied AI

Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, and Aniruddha Kembhavi. Simple but effective: CLIP embeddings for embodied AI. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14829–14838, 2022

[77]

Selective visual representations improve convergence and generalization for embodied AI

Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, and Ranjay Krishna. Selective visual representations improve convergence and generalization for embodied AI. arXiv preprint arXiv:2311.04193, 2023

[78]

A constraint-based method for solving sequential manipulation planning problems

Tomás Lozano-Pérez and Leslie Pack Kaelbling. A constraint-based method for solving sequential manipulation planning problems. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3684–3691. IEEE, 2014

[79]

A real-time approach for chance-constrained motion planning with dynamic obstacles

Manuel Castillo-Lopez, Philippe Ludivig, Seyed Amin Sajadi-Alamdari, Jose Luis Sanchez-Lopez, Miguel A Olivares-Mendez, and Holger Voos. A real-time approach for chance-constrained motion planning with dynamic obstacles. IEEE Robotics and Automation Letters, 5(2):3620–3625, 2020

[80]

FoundationPose: Unified 6D pose estimation and tracking of novel objects

Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. FoundationPose: Unified 6D pose estimation and tracking of novel objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17868–17879, 2024
