arxiv: 2604.20994 · v1 · submitted 2026-04-22 · 💻 cs.CR · cs.AI· cs.CL

Recognition: unknown

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter , Giulio Zizzo , Sergio Maffeis , Seshu Tirupathi , John D. Kelleher

Authors on Pith no claims yet

Pith reviewed 2026-05-09 23:59 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.CL

keywords function hijackingagentic modelsfunction callingLLM securityadversarial attackstool selectionAI vulnerabilities

0 comments

The pith

Agentic AI models can be forced to invoke attacker-chosen functions via a hijacking attack that ignores query context and function set details.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the function calling mechanism in agentic AI systems can be manipulated so that an attacker dictates which external function gets executed. The attack does not depend on the meaning of the user's request and remains effective even when the available functions change. Tests across five models on a standard benchmark dataset produce success rates between 70 and 100 percent. If these results hold, agentic systems would be open to unauthorized actions such as data theft or disruptive loops without needing to craft semantically convincing prompts. The work therefore points to a gap in current safeguards for models that interact with external tools.

Core claim

The paper establishes that a function hijacking attack manipulates the tool selection process of agentic models to force invocation of a specific attacker-chosen function. This attack is largely agnostic to context semantics and remains robust across different function sets. It can be trained to produce universal adversarial functions that succeed across multiple queries and payload configurations. Experiments on five instructed and reasoning models using the BFCL dataset reach attack success rates from 70 percent to 100 percent. These results demonstrate concrete vulnerabilities in the function calling interface beyond traditional prompt injection methods.

What carries the argument

The function hijacking attack (FHA), a prompt-based manipulation that overrides the model's normal selection of which external function to invoke.

Load-bearing premise

The high success rates and robustness to function sets shown in the experiments will continue to appear when the same attack is applied to real-world deployments with varied and unpredictable function sets.

What would settle it

Testing the attack on a fresh set of models and function-calling tasks that were never seen during development and recording success rates below 50 percent would show the reported performance does not generalize.

Figures

Figures reproduced from arXiv: 2604.20994 by Giulio Zizzo, John D. Kelleher, Sergio Maffeis, Seshu Tirupathi, Yannis Belkhiter.

**Figure 2.** Figure 2: Function name ASR. Takeaway 1.1: FHA successfully hijacks both instructed and reasoning models with high success rate. These results demonstrate the robustness of the attack, given the diversity of the BFCL dataset. Baseline comparison [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: displays the Function Name ASR of our algorithm for Llama-3.2-3B-Instruct for optim_str sizes of 10, 35 and 60 tokens (corresponding to 0.5 to 3 times its original size of 20 tokens). To compare the impact on the algorithm’s efficiency, [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: ASR when varying function positions (A) and number of functions (B). [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Robustness of attack to noise functions The Hijacking index corresponds to the first epoch where the FHA succeed during training and so the model calls the attacked function. We added 1, 2, 3, 5, 10, or 25 out-of-distribution noise functions to the original payload and transferred the attack every 20 epochs. We averaged n = 50 different variations (different noise functions) [PITH_FULL_IMAGE:figures/ful… view at source ↗

**Figure 6.** Figure 6: A: Direct and transferred attacks. B: Synthetic generation strategies. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: A. Batch of position B. Batch of number Specifically, the Batch of number achieved more than 0.85 Function Name ASR for every perturbation, even with 25 additional functions in the payload. It confirmed our claim: training an attack on sets containing various number of functions increased the robustness to noise functions. Surprisingly, training an attack on same set of functions varying their position als… view at source ↗

**Figure 8.** Figure 8: Function calling syntax of different models. The [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Illustrations of the GCG attack auto-regressive assumption. For both NLP and FH settings, the [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗

**Figure 10.** Figure 10: Frequency of the number of available function per sample - BFCL (Patil et al., 2023). [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗

**Figure 11.** Figure 11: Cosine Similarity between prompt and function descriptions - BFCL (Patil et al., 2023). [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗

**Figure 12.** Figure 12: Function Injection attack: generation of preferred function given a query or/and the available [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗

**Figure 13.** Figure 13: Reproduction of the attacks (MPMA) [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

**Figure 14.** Figure 14: Overview of the Function-Calling attacks [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: shows the average proportion of the optim_str in the payloads of the BFCL. This analysis supports our analysis of the experiments from Section 6.1 [PITH_FULL_IMAGE:figures/full_fig_p025_15.png] view at source ↗

**Figure 16.** Figure 16: Example of attack on GitHub MCP Server using [PITH_FULL_IMAGE:figures/full_fig_p026_16.png] view at source ↗

**Figure 17.** Figure 17: Example of attack on Slack MCP Server using [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗

**Figure 18.** Figure 18: Batch of queries using synthetic data generation. For both strategies, we prompted the gpt-4o [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗

**Figure 19.** Figure 19: PCA of BERT embeddings over 10 samples - original, diversity (1), arguments (2), and multi [PITH_FULL_IMAGE:figures/full_fig_p028_19.png] view at source ↗

**Figure 20.** Figure 20: Analysis of the Data Augmentation strategies - [PITH_FULL_IMAGE:figures/full_fig_p028_20.png] view at source ↗

**Figure 21.** Figure 21: Robustness of attack when adding noise functions. [PITH_FULL_IMAGE:figures/full_fig_p029_21.png] view at source ↗

**Figure 22.** Figure 22: Batch attack to increase robustness of attack when adding noise functions. [PITH_FULL_IMAGE:figures/full_fig_p030_22.png] view at source ↗

**Figure 23.** Figure 23: Correlation between number of epochs of FHA and semantic distance of function metadata to [PITH_FULL_IMAGE:figures/full_fig_p031_23.png] view at source ↗

**Figure 24.** Figure 24: Average ASRbatch given the percentage of prompts hijacked per batch [PITH_FULL_IMAGE:figures/full_fig_p032_24.png] view at source ↗

read the original abstract

The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the capabilities of AI-powered system by invoking external functions. Injection and jailbreaking attacks have been extensively explored to showcase the vulnerabilities of LLMs to user prompt manipulation. The expanded capabilities of agentic models introduce further vulnerabilities via their function calling interface. Recent work in LLM security showed that function calling can be abused, leading to data tampering and theft, causing disruptive behavior such as endless loops, or causing LLMs to produce harmful content in the style of jailbreaking attacks. This paper introduces a novel function hijacking attack (FHA) that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function. While existing attacks focus on semantic preference of the model for function-calling tasks, we show that FHA is largely agnostic to the context semantics and robust to the function sets, making it applicable across diverse domains. We further demonstrate that FHA can be trained to produce universal adversarial functions, enabling a single attacked function to hijack tool selection across multiple queries and payload configurations. We conducted experiments on 5 different models, including instructed and reasoning variants, reaching 70% to 100% ASR over the established BFCL dataset. Our findings further demonstrate the need for strong guardrails and security modules for agentic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a novel Function Hijacking Attack (FHA) that manipulates the tool selection process of function-calling agentic LLMs to force invocation of an attacker-chosen function. It claims FHA is largely agnostic to context semantics and robust to varying function sets (unlike prior semantic-preference attacks), supports training of universal adversarial functions effective across queries, and achieves 70-100% attack success rate (ASR) on the BFCL benchmark across five models (instructed and reasoning variants). The work concludes by calling for stronger guardrails in agentic systems.

Significance. If the generality claims hold, the result would be significant for LLM security: it identifies a non-semantic attack vector on tool-use interfaces that could affect deployed agentic systems across domains. The multi-model evaluation on a standard benchmark (BFCL) provides concrete empirical grounding, and the universal-adversarial-function direction is a potentially useful extension. However, the current evidence base is too narrow to establish the claimed robustness properties.

major comments (3)

[Abstract and Experiments] Abstract and Experiments section: The central claim that FHA is 'largely agnostic to the context semantics and robust to the function sets' is load-bearing yet unsupported by the reported results. The 70-100% ASR is measured only on the fixed BFCL dataset; no ablation varies function-set size, description style, or domain while holding the attack construction fixed, leaving open the possibility that success still depends on BFCL-specific semantic cues or distributions.
[Methods] Methods/Attack Construction section: The manuscript provides no concrete details on how the FHA is implemented, how the attack prompt or function is constructed, what baselines or controls are used for comparison, or how confounding factors (model-specific prompting, tool-count effects) are isolated. Without these, the high ASR figures cannot be assessed for reproducibility or validity.
[Experiments] Universal adversarial functions claim: The assertion that a single attacked function can hijack tool selection 'across multiple queries and payload configurations' requires explicit cross-query and cross-payload evaluation; the current BFCL results do not demonstrate this property.

minor comments (2)

[Title and Abstract] The title references 'MCP' without expansion or definition in the abstract; clarify the acronym and its relation to the function-calling setting.
[Abstract] The abstract states results on 'instructed and reasoning variants' but does not name the five models or variants; add this information for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and describe the revisions we will make.

read point-by-point responses

Referee: [Abstract and Experiments] Abstract and Experiments section: The central claim that FHA is 'largely agnostic to the context semantics and robust to the function sets' is load-bearing yet unsupported by the reported results. The 70-100% ASR is measured only on the fixed BFCL dataset; no ablation varies function-set size, description style, or domain while holding the attack construction fixed, leaving open the possibility that success still depends on BFCL-specific semantic cues or distributions.

Authors: We acknowledge that dedicated ablations on function-set size, description style, and domain would strengthen the generality claims. BFCL already spans diverse functions and domains with consistent high ASR, but we will add explicit ablations and clarify the scope of our robustness statements in the revision. revision: partial
Referee: [Methods] Methods/Attack Construction section: The manuscript provides no concrete details on how the FHA is implemented, how the attack prompt or function is constructed, what baselines or controls are used for comparison, or how confounding factors (model-specific prompting, tool-count effects) are isolated. Without these, the high ASR figures cannot be assessed for reproducibility or validity.

Authors: We agree that additional implementation details are required. The revised manuscript will expand the Methods section with precise descriptions of FHA construction, attack prompt and function design, baselines, and controls for confounding factors including tool count and prompting variations. revision: yes
Referee: [Experiments] Universal adversarial functions claim: The assertion that a single attacked function can hijack tool selection 'across multiple queries and payload configurations' requires explicit cross-query and cross-payload evaluation; the current BFCL results do not demonstrate this property.

Authors: The BFCL results cover multiple queries, but we recognize the need for targeted cross-query and cross-payload tests. We will include dedicated experiments evaluating the universal adversarial functions on held-out queries and varied payloads in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical attack demonstration

full rationale

The paper introduces FHA via description and then reports empirical attack success rates (70-100% ASR) measured on the external BFCL benchmark across five models. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-referential definitions appear. Claims of semantic agnosticism and robustness are presented as experimental outcomes on the fixed benchmark rather than derived quantities. No self-citation chains or ansatzes are invoked to justify core results; the work is self-contained as an empirical security evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

This is an empirical adversarial attack paper. No free parameters, mathematical axioms, or invented physical entities are invoked; the FHA is a new attack technique whose validity rests on the reported experimental outcomes.

invented entities (1)

Function Hijacking Attack (FHA) no independent evidence
purpose: A technique to manipulate tool selection in function-calling models
Introduced as the central contribution; no independent falsifiable evidence outside the paper's experiments is provided.

pith-pipeline@v0.9.0 · 5571 in / 1096 out tokens · 31489 ms · 2026-05-09T23:59:16.127792+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 28 canonical work pages · 12 internal anchors

[1]

URLhttps://arxiv.org/abs/2407.00121. Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, Eric Winsor, Jerome Wynne, Yarin Gal, and Xander Davies. Agentharm: A benchmark for measuring harmfulness of llm agents,

work page arXiv
[2]

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

URLhttps: //arxiv.org/abs/2410.09024. Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, and Pavlo Molchanov. Small language models are the future of agentic ai,

work page internal anchor Pith review arXiv
[3]

InFindings of the Associa- tion for Computational Linguistics: ACL 2024, pages 13921–13937, Bangkok, Thailand

URLhttps: //arxiv.org/abs/2506.02153. Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

work page arXiv
[4]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

URLhttps://arxiv.org/abs/2406.13352. Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. In Kate Larson (ed.),Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pp....

work page internal anchor Pith review arXiv
[5]

Large language model based multi-agents: A survey of progress and challenges,

doi: 10.24963/ijcai.2024/890. URLhttps://doi.org/10.24963/ijcai.2024/890. Survey Track. 14 Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, and Milad Nasr. Query-based adver- sarial prompt generation,

work page doi:10.24963/ijcai.2024/890 2024
[6]

Query-based adversarial prompt generation

URLhttps://arxiv.org/abs/2402.12329. Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. The emerged security and privacy of llm agent: A survey with case studies,

work page arXiv
[7]

The emerged security and privacy of LLM agent: A survey with case studies,

URLhttps://arxiv.org/abs/2407.19354. Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (mcp): Landscape, security threats, and future research directions,

work page arXiv
[8]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

URLhttps://arxiv.org/abs/2503.23278. IBM-Research. Granite foundation models (whitepaper),

work page internal anchor Pith review arXiv
[9]

Mistral 7B

URLhttps://arxiv.org/abs/2310.06825. Ishan Kavathekar, Raghav Donakanti, Ponnurangam Kumaraguru, and Karthik Vaidhyanathan. Small models, big tasks: An exploratory empirical study on small language models for function calling,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Wilhelm Kirch (ed.).Pearson’s Correlation Coefficient, pp

URLhttps://arxiv.org/abs/2504.19277. Wilhelm Kirch (ed.).Pearson’s Correlation Coefficient, pp. 1090–1091. Springer Netherlands, Dordrecht,

work page arXiv
[11]

doi: 10.1007/978-1-4020-5614-7_2569

ISBN 978-1-4020-5614-7. doi: 10.1007/978-1-4020-5614-7_2569. URLhttps://doi.org/10.1007/ 978-1-4020-5614-7_2569. Deepak Kumar, Riccardo Paccagnella, Paul Murley, Eric Hennenfent, Joshua Mason, Adam Bates, and Michael Bailey. Skill squatting attacks on amazon alexa. In27th USENIX Security Symposium (USENIX Security 18), pp. 33–47, Baltimore, MD, August

work page doi:10.1007/978-1-4020-5614-7_2569
[12]

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

URLhttps://arxiv.org/abs/2310.04451. Graziano A. Manduzio, Federico A. Galatolo, Mario G. C. A. Cimino, Enzo Pasquale Scilingo, and Lorenzo Cominelli. Improving small-scale large language models function calling for reasoning tasks,

work page internal anchor Pith review arXiv
[13]

OpenAI and al

URL https://arxiv.org/abs/2410.18890. OpenAI and al. Gpt-4 technical report,

work page arXiv
[14]

GPT-4 Technical Report

URLhttps://arxiv.org/abs/2303.08774. Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive apis,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Gorilla: Large Language Model Connected with Massive APIs

URLhttps://arxiv.org/abs/2305.15334. Qwen-Team. Qwen3 technical report,

work page internal anchor Pith review arXiv
[16]

Qwen3 Technical Report

URLhttps://arxiv.org/abs/2505.09388. Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, and Kush R. Varshney. Attack atlas: A practitioner’s perspective on challenges and pitfalls in red teaming genai,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Rawat, S

URLhttps://arxiv.org/abs/2409.15398. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools,

work page arXiv
[18]

URLhttps://arxiv.org/abs/2302.04761. 15 Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models,

work page internal anchor Pith review arXiv
[19]

LLaMA: Open and Efficient Foundation Language Models

URL https://arxiv.org/abs/2302.13971. Anahit Vassilev, Alina Oprea, Alyssa Fordyce, and Hilary Andersen. Adversarial machine learning: A taxonomy and terminology of attacks and mitigations. Nist trustworthy and responsible ai, National Institute of Standards and Technology, Gaithersburg, MD,

work page internal anchor Pith review Pith/arXiv arXiv
[20]

A survey on large language model based autonomous agents,

URLhttps://nvlpubs.nist.gov/ nistpubs/ai/NIST.AI.100-2e2025.pdf. Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Jirong Wen. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6), March 2024a. ISSN 2095-2236. doi: 1...

work page doi:10.1007/s11704-024-40231-1 2095
[21]

MPMA: Preference Manipulation Attack against Model Context Protocol,

URLhttps://arxiv. org/abs/2505.11154. Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, and Cihang Xie. Attngcg: Enhancing jailbreaking attacks on llms with attention manipulation, 2024b. URLhttps://arxiv.org/abs/2410. 09040. Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, and Wenliang Chen. Chain-of-tools: Utilizing massive unseen t...

work page arXiv
[22]

Chain-of-Tools: Utilizing massive unseen tools in chain-of-thought reasoning

URLhttps:// arxiv.org/abs/2503.16779. Zihui Wu, Haichang Gao, Jianping He, and Ping Wang. The dark side of function calling: Pathways to jailbreaking large language models,

work page arXiv
[23]

The dark side of function calling: Pathways to jailbreaking large language models

URLhttps://arxiv.org/abs/2407.17915. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models,

work page arXiv
[24]

ReAct: Synergizing Reasoning and Acting in Language Models

URLhttps://arxiv.org/abs/2210.03629. Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, and Qi Li. Jailbreak attacks and defenses against large language models: A survey,

work page internal anchor Pith review Pith/arXiv arXiv
[25]

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

URLhttps://arxiv.org/abs/2407.04295. Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, and Xinyu Xing. Mind the in- conspicuous: Revealing the hidden weakness in aligned llms’ refusal boundaries,

work page internal anchor Pith review arXiv
[26]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang

URLhttps: //arxiv.org/abs/2405.20653. Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents,

work page arXiv
[27]

Breaking agents: Compromising autonomous LLM agents through malfunction amplification

URL https://arxiv.org/abs/2407.20859. Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models,

work page arXiv
[28]

name": “target_function_name

URLhttps://arxiv.org/abs/2406.04313. 16 Appendix Table of Contents A Function Calling Syntax of Different Models 18 B GCG Algorithm Adapted to the Function-Calling Task 18 B.1 Simple FHA algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 B.2 Universal FHA algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . ....

work page arXiv 2023
[29]

Figures (1), (2) and (3) detail the prompts used for the reformulation (1) and argument-variation (2), and multi-intent variation (3) strategies, respectively

26 G Synthetic Data Generation G.1 Prompts Figure 18 shows the prompts used to perform synthetic data generation usingGPT-4o-mini(OpenAI & al., 2024). Figures (1), (2) and (3) detail the prompts used for the reformulation (1) and argument-variation (2), and multi-intent variation (3) strategies, respectively. Synthetic data generation (1). Forumlation Div...

2024
[30]

Specifically, we observe a peak on lower values for strategy (3), representing the queries containing different intents from the original prompt

and more spread toward lower values. Specifically, we observe a peak on lower values for strategy (3), representing the queries containing different intents from the original prompt. (a) Distance to prompt (b) Intra-batches distance Figure 20: Analysis of the Data Augmentation strategies -FunSecBenchdataset. 28 H Robustness to Payload Perturbation In this...

2024