Canonical reference

Badrobot: Jailbreaking embodied llms in the physical world

Zhang, H · 2024 · arXiv 2407.20242

Canonical reference. 100% of citing Pith papers cite this work as background.

9 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 9 citing papers

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems

cs.CR · 2026-04-29 · unverdicted · novelty 8.0

A unified threat model for LLM-enabled robots reveals three cross-boundary attack chains from user input to unsafe physical actuation due to missing validations and unmediated crossings.

FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models

cs.CV · 2026-03-30 · unverdicted · novelty 8.0

FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

cs.CR · 2026-05-19 · unverdicted · novelty 7.0

RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.

Using large language models for embodied planning introduces systematic safety risks

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.

How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study

cs.CR · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Vision-language models exhibit perceptual fragility and fail to consistently respect privacy constraints when operating in simulated physical environments, with performance declining in cluttered scenes and under conflicting commands.

EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems

cs.RO · 2026-04-13 · unverdicted · novelty 6.0

EmbodiedGovBench is a new benchmark framework that measures embodied agent systems on seven governance dimensions including policy adherence, recovery success, and upgrade safety.

Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next

cs.RO · 2025-12-08 · unverdicted · novelty 5.0

A literature review of pHHI that proposes a taxonomy of interaction types by modality and engagement level while outlining pathways to integrate control, intent, and modeling for more seamless humanoid-human collaboration.

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

cs.RO · 2026-04-26 · accept · novelty 4.0

A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.

citing papers explorer

Showing 9 of 9 citing papers.

From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems cs.CR · 2026-04-29 · unverdicted · none · ref 21
A unified threat model for LLM-enabled robots reveals three cross-boundary attack chains from user input to unsafe physical actuation due to missing validations and unmediated crossings.
FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models cs.CV · 2026-03-30 · unverdicted · none · ref 42
FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.
RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents cs.CR · 2026-05-19 · unverdicted · none · ref 29
RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.
Using large language models for embodied planning introduces systematic safety risks cs.AI · 2026-04-20 · unverdicted · none · ref 54
LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements cs.AI · 2026-04-02 · unverdicted · none · ref 51
PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.
How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study cs.CR · 2026-05-06 · unverdicted · none · ref 46 · 2 links
Vision-language models exhibit perceptual fragility and fail to consistently respect privacy constraints when operating in simulated physical environments, with performance declining in cluttered scenes and under conflicting commands.
EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems cs.RO · 2026-04-13 · unverdicted · none · ref 46
EmbodiedGovBench is a new benchmark framework that measures embodied agent systems on seven governance dimensions including policy adherence, recovery success, and upgrade safety.
Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next cs.RO · 2025-12-08 · unverdicted · none · ref 121
A literature review of pHHI that proposes a taxonomy of interaction types by modality and engagement level while outlining pathways to integrate control, intent, and modeling for more seamless humanoid-human collaboration.
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms cs.RO · 2026-04-26 · accept · none · ref 93
A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.

Badrobot: Jailbreaking embodied llms in the physical world

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer