EARBench: Towards evaluating physical risk awareness for task planning of foundation model-based embodied AI agents

2024 · arXiv 2408.04449

Five Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

cs.CR · 2026-05-19 · unverdicted · novelty 7.0

RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.

Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control

cs.AI · 2026-04-29 · unverdicted · novelty 5.0

LLMs for robotic health attendant control violate safety rules in 54.4% of harmful scenarios on average, with proprietary models at 23.7% median violation versus 72.8% for open-weight models, indicating they are not yet safe for clinical use.

Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution

cs.RO · 2026-04-09 · unverdicted · novelty 5.0 · 2 refs

A runtime governance framework for embodied agents intercepts 96.2% of unauthorized actions and achieves 91.4% recovery success in 1000 simulation trials while outperforming baselines.

Towards provable probabilistic safety for scalable embodied AI systems

eess.SY · 2025-06-05 · unverdicted · novelty 4.0

The paper proposes a paradigm of provable probabilistic safety to enable scalable, safe deployment of embodied AI in critical applications.
