StressWeb is a new diagnostic benchmark that applies structured perturbations to web interactions to expose robustness failures in LLM-based agents that standard clean evaluations miss.
Text") a:has-text(
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SE 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
StressWeb: A Diagnostic Benchmark for Web Agent Robustness under Realistic Interaction Variability
StressWeb is a new diagnostic benchmark that applies structured perturbations to web interactions to expose robustness failures in LLM-based agents that standard clean evaluations miss.