Proceedings of the 33rd
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
- RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes
RoboStressBench decomposes visual stress into four physically grounded dimensions to benchmark VLM robustness in embodied scenes and proposes a stress-aware solver.
- Can You Trust What You See? Human and AI Detection of Synthetic Legal Evidence
Humans reach 64.8% accuracy detecting synthetic legal evidence images overall but drop to chance levels on top generators, while MLLMs achieve 100% specificity yet only 5.9% detection on the hardest synthetics, with uncorrelated error patterns.