Is- bench: Evaluating interactive safety of vlm-driven embodied agents in daily household tasks

· 2025 · arXiv 2506.16402

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

Using large language models for embodied planning introduces systematic safety risks

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.

HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

cs.RO · 2026-04-14 · unverdicted · novelty 7.0

HazardArena shows VLA models trained on safe data frequently produce unsafe actions in semantically risky but visually similar settings, and a training-free Safety Option Layer reduces those failures with little performance cost.

From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

cs.AI · 2026-04-12 · unverdicted · novelty 6.0

EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.

citing papers explorer

Showing 4 of 4 citing papers.

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents cs.CY · 2026-04-11 · accept · none · ref 31
This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.
Using large language models for embodied planning introduces systematic safety risks cs.AI · 2026-04-20 · unverdicted · none · ref 8
LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models cs.RO · 2026-04-14 · unverdicted · none · ref 14
HazardArena shows VLA models trained on safe data frequently produce unsafe actions in semantically risky but visually similar settings, and a training-free Safety Option Layer reduces those failures with little performance cost.
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning cs.AI · 2026-04-12 · unverdicted · none · ref 13
EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.

Is- bench: Evaluating interactive safety of vlm-driven embodied agents in daily household tasks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer