SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Han Qiu; Hao Wang; Jianshuo Dong; Ke Xu; Minlie Huang; Sheng Guo; Tianwei Zhang; Xun Chen; Zhuotao Liu

arxiv: 2509.23694 · v6 · pith:WOJK4LFMnew · submitted 2025-09-28 · 💻 cs.AI · cs.CL · cs.CR

classification 💻 cs.AI · cs.CL · cs.CR

keywords search · agents · llm-based · safesearch · automated · enabling · llms · red-teaming

Search agents connect LLMs to the Internet, enabling them to access broader and more up-to-date information. However, this also introduces a new threat surface: unreliable search results can mislead agents into producing unsafe outputs. Real-world incidents and our two in-the-wild observations show that such failures can occur in practice. To study this threat systematically, we propose SafeSearch, an automated red-teaming framework that is scalable, cost-efficient, and lightweight, enabling sandboxed safety evaluation of search agents. Using SafeSearch, we generate 300 test cases spanning five risk categories (e.g., misinformation and prompt injection) and evaluate three search agent scaffolds across 17 representative LLMs. Our results reveal substantial vulnerabilities in LLM-based search agents, with the highest attack success rate (ASR) reaching 90.5% for GPT-4.1-mini in a search-workflow setting. Moreover, we find that common defenses, such as reminder prompting, offer limited protection. Overall, SafeSearch provides a practical way to measure and improve the safety of LLM-based search agents.
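
The abstract reports results as ASR over test cases grouped into risk categories. As a rough illustration of how such a metric is typically tallied, the sketch below assumes ASR means attack success rate, computed as the fraction of test cases in which the agent produced the unsafe output. The function names and the toy outcomes are hypothetical and are not taken from the paper; the abstract does not describe how success is judged.

```python
from collections import defaultdict


def attack_success_rate(outcomes):
    """Return the fraction of test cases whose attack succeeded."""
    if not outcomes:
        raise ValueError("need at least one test case")
    return sum(1 for succeeded in outcomes if succeeded) / len(outcomes)


def asr_by_category(results):
    """Group (category, succeeded) pairs and compute ASR per category."""
    grouped = defaultdict(list)
    for category, succeeded in results:
        grouped[category].append(succeeded)
    return {cat: attack_success_rate(flags) for cat, flags in grouped.items()}


# Toy, made-up outcomes purely for illustration (not results from the paper).
toy_results = [
    ("misinformation", True),
    ("misinformation", False),
    ("prompt injection", True),
    ("prompt injection", True),
]

overall = attack_success_rate([succeeded for _, succeeded in toy_results])
per_category = asr_by_category(toy_results)
print(overall)       # 0.75
print(per_category)  # {'misinformation': 0.5, 'prompt injection': 1.0}
```

Reporting the per-category breakdown alongside the overall figure makes it easier to see whether one risk category dominates the aggregate.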

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
cs.CL 2026-05 unverdicted novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...