Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
ToxPrune prunes toxic subwords from BPE tokenizers in LLMs to mitigate toxic dialogue responses and improve diversity on both toxic and non-toxic models.
citing papers explorer
-
Mobile GUI Agents under Real-world Threats: Are We There Yet?
Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
-
Toxic Subword Pruning for Dialogue Response Generation on Large Language Models
ToxPrune prunes toxic subwords from BPE tokenizers in LLMs to mitigate toxic dialogue responses and improve diversity on both toxic and non-toxic models.