"Someone Hid It": Query-Agnostic Black-Box Attacks on LLM-Based Retrieval

Jiate Li, Defu Cao, Li Li, Wei Yang, Yuehan Qin, Chenxiao Yu, Tiannuo Yang, Ryan A. Rossi, Yan Liu, Xiyang Hu, Yue Zhao

arxiv: 2602.00364 · v4 · pith:SGUMAJX2new · submitted 2026-01-30 · 💻 cs.CR

Pith reviewed 2026-05-21 14:13 UTC · model grok-4.3

classification 💻 cs.CR

keywords: black-box attacks, LLM retrieval, adversarial injections, query-agnostic, transferable attacks, RAG security, information retrieval


The pith

Surrogate models let attackers craft query-free injections that shift LLM retrieval rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that LLM-based retrieval can be manipulated by injecting special tokens into documents even when the attacker knows nothing about the user's query and has no access to the target model's parameters or outputs. It does this by first laying out a theoretical account of how these retrieval systems rank documents, then turning the problem of finding effective injections into a min-max optimization that is solved on surrogate models using simulated queries. A reader should care because real-world systems such as RAG pipelines are exposed if the same tokens transfer across models and queries; the work also notes that ordinary document changes might produce comparable effects. The method is tested on standard retrieval benchmarks and several popular LLM retrievers.

Core claim

We establish a theoretical framework for LLM-based retrieval and use it to formulate transferable adversarial injection as a min-max problem. We solve the problem with an adversarial learning procedure that optimizes injection tokens on zero-shot surrogate models while treating queries as learnable variables. The resulting tokens alter document rankings on benchmark datasets across multiple LLM retrievers without any knowledge of the victim query or model.

What carries the argument

A min-max simulation of the transferable attack, solved by adversarial learning over surrogate models and learnable query samples.
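As a hedged sketch in our own notation (not necessarily the paper's Eq. 1), the min-max simulation can be written as follows. The notation is assumed: f_s is a frozen surrogate retriever, d is the target document, x is an injected token sequence of at most δ tokens appended to d (written d ⊕ x), and q_1, ..., q_m are learnable query samples. The inner maximization finds the queries most similar to the injected document. The outer minimization then pushes the document away from even those worst-case queries, which is what hiding the document requires.

```latex
\min_{x \,:\, |x| \le \delta} \; \max_{q_1, \dots, q_m} \;
  \frac{1}{m} \sum_{i=1}^{m} \operatorname{sim}\big( f_s(q_i),\, f_s(d \oplus x) \big)
```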

If this is right

The attack succeeds without any query being supplied to the attacker.
The same tokens affect retrieval performance across different LLM retrievers.
Ordinary document edits could produce similar unintended ranking shifts.
Defenses for retrieval systems must address query-independent threats.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Document pipelines may need automated checks for token patterns that mimic these injections.
Robustness benchmarks for retrievers should include query-agnostic test cases.
Natural wording variations in real documents could be tested to see if they trigger comparable retrieval biases.

Load-bearing premise

Injection tokens found on surrogate models will transfer to unknown victim retrievers even when the attacker has no query information at all.

What would settle it

Apply the generated tokens to documents and measure whether retrieval rank changes occur on held-out LLM retrievers when the attack procedure is given no query whatsoever.
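The measurement above can be sketched as a comparison of Recall@k before and after injection. This is a minimal illustration that assumes each retriever returns a ranked list of document IDs per query. The function names and toy rankings are hypothetical, and producing the rankings from real LLM retrievers is out of scope.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of a query's relevant documents that appear in its top-k results."""
    if not relevant_ids:
        raise ValueError("recall is undefined for a query with no relevant documents")
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall(rankings, qrels, k):
    """Average Recall@k over queries; rankings and qrels are dicts keyed by query ID."""
    return sum(recall_at_k(rankings[q], qrels[q], k) for q in qrels) / len(qrels)


def recall_drop(clean_rankings, attacked_rankings, qrels, k):
    """Return (absolute drop, drop relative to clean performance) in mean Recall@k."""
    clean = mean_recall(clean_rankings, qrels, k)
    attacked = mean_recall(attacked_rankings, qrels, k)
    absolute = clean - attacked
    relative = absolute / clean if clean > 0 else 0.0
    return absolute, relative


# Hypothetical toy data: document "d1" carries the injected tokens.
qrels = {"q1": {"d1", "d4"}, "q2": {"d2"}}
clean = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d2", "d1", "d3", "d4"]}
attacked = {"q1": ["d2", "d3", "d1", "d4"], "q2": ["d2", "d3", "d4", "d1"]}
abs_drop, rel_drop = recall_drop(clean, attacked, qrels, k=2)
print(abs_drop, rel_drop)
```

Reporting both numbers matters. A small absolute drop on a retriever with modest baseline recall can still be a large fraction of its original performance, and Figure 5 frames its results in exactly that way.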

Figures

Figures reproduced from arXiv: 2602.00364 by Chenxiao Yu, Defu Cao, Jiate Li, Li Li, Ryan A. Rossi, Tiannuo Yang, Wei Yang, Xiyang Hu, Yan Liu, Yuehan Qin, Yue Zhao.

**Figure 1.** In many practical scenarios, attackers may want to hide web documents from retrieval systems. Such websites usually let ordinary public users edit them through content contributions or discussion replies. 1. Introduction. A retrieval system, which aims to efficiently find the most relevant documents for given user queries, is not only of great importance in applications such as search engines and recommendati…

**Figure 2.** Illustration of LLM-based Retrieval. Documents in the corpus are first encoded into last-hidden-state embeddings and stored. When a user query arrives and is embedded, it is efficiently matched to relevant documents by embedding similarity. 1. We first investigate the vulnerability of LLMR in query-agnostic black-box settings, where the attacker has no access to the victim model, the document c…
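The retrieval step in Figure 2 can be sketched as a small in-memory dense index. This is a minimal sketch that assumes the document and query embeddings (for example, last-hidden-state vectors from an LLM) have already been computed. The class name DenseIndex and the 3-dimensional toy vectors are hypothetical. Each vector is normalized once, so the inner product used for ranking equals cosine similarity.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so inner products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class DenseIndex:
    """Hypothetical in-memory index over precomputed document embeddings."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, embedding):
        # Store each document embedding once, already normalized.
        self.ids.append(doc_id)
        self.vectors.append(normalize(embedding))

    def search(self, query_embedding, k):
        # Score every stored document by inner product with the normalized query.
        q = normalize(query_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id)
            for doc_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


index = DenseIndex()
index.add("science", [0.9, 0.1, 0.0])
index.add("politics", [0.1, 0.9, 0.1])
index.add("movies", [0.0, 0.2, 0.95])
query = [0.85, 0.2, 0.05]
for doc_id, score in index.search(query, k=2):
    print(doc_id, round(score, 3))
```

A production system would replace the linear scan with an approximate nearest-neighbor index. The ranking rule stays the same (highest inner product first), and it is this rule that injected tokens try to exploit.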

**Figure 3.** We sample 40 knowledge contexts on each of four roughly defined topics and visualize their LLMR embeddings after Principal Component Analysis (PCA) reduction. Embeddings of contexts within the same topic (R: science, G: politics, Y: movies, B: architecture) tend to cluster together in the embedding space. 2. The LLM retriever model f is a complete black box, which means the attacker has no knowledge of f including…
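One hypothetical way to quantify the clustering in Figure 3 without plotting is a margin check. Over every comparison between a same-topic pair and a cross-topic pair, count how often the same-topic cosine similarity exceeds the cross-topic one by at least a margin eps. This sketch assumes precomputed embeddings and topic labels. The function name and the toy 2-D vectors are illustrative, and the statistic is our own choice, not a quantity reported in the paper.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def margin_fraction(embeddings, topics, eps):
    """Share of (same-topic pair, cross-topic pair) comparisons in which the
    same-topic cosine exceeds the cross-topic cosine by at least eps."""
    same, cross = [], []
    for i, j in itertools.combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (same if topics[i] == topics[j] else cross).append(sim)
    if not same or not cross:
        raise ValueError("need at least one same-topic and one cross-topic pair")
    hits = sum(1 for s in same for c in cross if s >= c + eps)
    return hits / (len(same) * len(cross))


# Hypothetical 2-D embeddings: two tight clusters standing in for two topics.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
topics = ["science", "science", "politics", "politics"]
print(margin_fraction(embeddings, topics, eps=0.5))
```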

**Figure 4.** The DQ-A learning pipeline of our attack method. Query samples are first generated by a third-party causal LLM. Then, in every learning step, the injected document tokens are first optimized away from the queries, and all query tokens are optimized towards the document. Neither the surrogate nor the causal LLM requires any training. p_ϵ. To verify the rationality of this statement, we conduct an experiment: we utilize a Casua…
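The alternating updates in Figure 4 can be illustrated with a toy, pure-Python sketch. It rests on several explicit assumptions. First, the surrogate retriever is replaced by a hypothetical embedding that L2-normalizes the sum of fixed random token vectors. Second, the query samples are supplied directly as token IDs rather than generated by a third-party LLM. Third, each update is an exhaustive single-token coordinate search over a tiny vocabulary, which stands in for the paper's actual token-search procedure (not described on this page). The document step accepts only swaps that lower mean similarity to the current queries. The query step accepts only swaps that raise each query's similarity to the document.

```python
import math
import random


def embed(token_ids, table):
    """Hypothetical surrogate embedding: L2-normalized sum of per-token vectors."""
    acc = [0.0] * len(table[0])
    for t in token_ids:
        for i, value in enumerate(table[t]):
            acc[i] += value
    norm = math.sqrt(sum(a * a for a in acc)) or 1.0
    return [a / norm for a in acc]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_sim(doc, queries, table):
    """Objective: average inner product between the document and each query sample."""
    d = embed(doc, table)
    return sum(dot(embed(q, table), d) for q in queries) / len(queries)


def doc_step(doc, n_inject, queries, table):
    """Min step: rewrite only the last n_inject (injected) tokens, each to the
    vocabulary token that most lowers similarity to the current queries."""
    doc = list(doc)
    for pos in range(len(doc) - n_inject, len(doc)):
        best_tok, best_val = doc[pos], mean_sim(doc, queries, table)
        for tok in range(len(table)):
            doc[pos] = tok
            val = mean_sim(doc, queries, table)
            if val < best_val:
                best_tok, best_val = tok, val
        doc[pos] = best_tok
    return doc


def query_step(doc, queries, table):
    """Max step: move every token of every query toward the injected document."""
    d = embed(doc, table)
    updated = []
    for q in queries:
        q = list(q)
        for pos in range(len(q)):
            best_tok, best_val = q[pos], dot(embed(q, table), d)
            for tok in range(len(table)):
                q[pos] = tok
                val = dot(embed(q, table), d)
                if val > best_val:
                    best_tok, best_val = tok, val
            q[pos] = best_tok
        updated.append(q)
    return updated


def min_max_attack(doc, n_inject, queries, table, steps=3):
    """Alternate min (document) and max (query) steps and record the objective."""
    doc = list(doc) + [0] * n_inject  # hypothetical init: injected slots start as token 0
    history = []
    for _ in range(steps):
        start = mean_sim(doc, queries, table)
        doc = doc_step(doc, n_inject, queries, table)
        after_doc = mean_sim(doc, queries, table)
        queries = query_step(doc, queries, table)
        after_queries = mean_sim(doc, queries, table)
        history.append((start, after_doc, after_queries))
    return doc, queries, history


rng = random.Random(0)
VOCAB, DIM = 30, 8
table = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
clean_doc = [rng.randrange(VOCAB) for _ in range(10)]
seed_queries = [[rng.randrange(VOCAB) for _ in range(5)] for _ in range(4)]
adv_doc, adv_queries, history = min_max_attack(clean_doc, 3, seed_queries, table)
for start, after_doc, after_queries in history:
    print(f"{start:.3f} -> {after_doc:.3f} (doc step) -> {after_queries:.3f} (query step)")
```

Because the queries keep moving toward the document, each document step works against the most similar queries found so far. This is the intuition behind treating queries as learnable variables rather than fixed samples. The ablation requested in the referee report would compare this against a variant that skips query_step.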

**Figure 5.** Impact of Different |S|. …metrics, especially around a 2%-10% drop in Recall@25 and a 1%-7% drop in Recall@50. On the Robotics dataset, our attack even achieves an 8% performance drop for Qwen1.5 and 6% for Jinaai in Recall@50, which amounts to roughly a 30% and 20% reduction relative to these best-performing retrievers' original performance. Only on Qwen3-Emb-0.6B do both our attack and other baselines find it …

**Figure 6.** Impact of Injected Token Length. …increases from 10 to 15, some of the attacks become less effective. From this we infer that the population effect of |S| is also limited: when |S| exceeds what the inherent optimal η_X/η_d requires, attacks become less effective. Impact of Injected Token Amount. We also study the impact of the injected token amount (constrained by δ in Eq. 1) on our attack and …

Original abstract

Large language models (LLMs) have been serving as effective backbones for retrieval systems, including Retrieval-Augmentation-Generation (RAG), Dense Information Retriever (IR), and Agent Memory Retrieval. Recent studies have demonstrated that such LLM-based Retrieval (LLMR) is vulnerable to adversarial attacks, which manipulates documents by token-level injections and enables adversaries to either boost or diminish these documents in retrieval tasks. However, existing attack studies mainly (1) presume a known query is given to the attacker, and (2) highly rely on access to the victim model's parameters or interactions, which are hardly accessible in real-world scenarios, leading to limited validity. To further explore the secure risks of LLMR, we propose a practical black-box attack method that generates transferable injection tokens based on zero-shot surrogate LLMs without need of victim queries or victim models knowledge. The effectiveness of our attack raises such a robustness issue that similar effects may arise from benign or unintended document edits in the real world. To achieve our attack, we first establish a theoretical framework of LLMR and empirically verify it. Under the framework, we simulate the transferable attack as a min-max problem, and propose an adversarial learning mechanism that finds optimal adversarial tokens with learnable query samples. Our attack is validated to be effective on benchmark datasets across popular LLM retrievers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a query-agnostic black-box attack on LLM-based retrieval (LLMR) systems used in RAG, dense IR, and agent memory. It first establishes and empirically verifies a theoretical framework that models LLMR via inner-product similarity of embeddings. The attack is then formulated as a min-max optimization over injection tokens and learnable query samples, solved on frozen zero-shot surrogate LLMs, to discover transferable token injections without requiring victim queries or model access. Effectiveness is validated on benchmark datasets across popular LLM retrievers, with the claim that similar effects could arise from benign document edits.

Significance. If the transferability results hold, the work is significant for highlighting practical robustness risks in deployed LLM retrieval pipelines. The theoretical framework plus the surrogate-based min-max simulation provides a principled way to study query-agnostic attacks, and the empirical validation on multiple retrievers strengthens the case that black-box threats are realistic. Credit is due for the zero-shot surrogate approach and the explicit framing of the attack as a simulation tool rather than a fitted model.

major comments (2)

[§3] The LLMR framework reduces retrieval to inner-product similarity, which is then used to justify the min-max attack formulation. However, the transferability premise (that surrogate-optimized tokens and learned queries will align with an unseen victim retriever's embedding geometry without any query adaptation) is stated but neither bounded nor tested under distribution shift; this assumption is load-bearing for the central query-agnostic claim.
[§5] Empirical validation: The reported effectiveness on benchmarks lacks explicit controls for embedding-space misalignment between surrogate and victim models, effect-size reporting, and ablation studies isolating the contribution of the learnable query samples versus fixed queries. Without these, it is difficult to confirm that the attack generalizes query-agnostically rather than succeeding only under favorable alignment conditions.

minor comments (2)

[Abstract] The clause 'which manipulates documents by token-level injections and enables adversaries to either boost or diminish these documents' contains a subject-verb agreement error: its antecedent is 'attacks', so 'manipulates' and 'enables' should be 'manipulate' and 'enable', or the clause should be rephrased.
Notation: The distinction between surrogate parameters and victim parameters in the min-max objective could be clarified with an explicit variable table or consistent subscripting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing our perspective on the current manuscript and indicating planned revisions where appropriate.


Referee: [§3] The LLMR framework reduces retrieval to inner-product similarity, which is then used to justify the min-max attack formulation. However, the transferability premise (that surrogate-optimized tokens and learned queries will align with an unseen victim retriever's embedding geometry without any query adaptation) is stated but neither bounded nor tested under distribution shift; this assumption is load-bearing for the central query-agnostic claim.

Authors: Our theoretical framework explicitly models LLMR retrieval as inner-product similarity in embedding space and we empirically verify this modeling choice on the surrogate models used. The min-max formulation with learnable queries is designed to optimize for tokens that remain effective across query variations, which underpins the query-agnostic transfer claim. We demonstrate this empirically by transferring the resulting injection tokens to multiple victim retrievers with distinct embedding models, without any query-specific adaptation. We agree, however, that the manuscript would benefit from a more explicit treatment of distribution shift. In revision we will add a dedicated discussion subsection that derives the expected transfer conditions from the inner-product assumption and include new experiments that quantify embedding misalignment (via average cosine similarity on shared documents) between surrogate and victim models. revision: yes
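The planned misalignment measurement can be sketched as follows. This is an illustrative sketch that assumes both models have embedded the same ordered list of shared documents; the function names are hypothetical. A direct paired cosine is meaningful only when the two models share an embedding dimension and coordinate system, which independently trained retrievers generally do not. The sketch therefore also includes a dimension-agnostic alternative of our own choosing: the Pearson correlation between the two models' pairwise document-similarity values.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_paired_cosine(surrogate, victim):
    """Average cosine between each document's surrogate and victim embeddings.
    Only meaningful if both models share one embedding space."""
    if any(len(s) != len(v) for s, v in zip(surrogate, victim)):
        raise ValueError("paired cosine needs embeddings of equal dimension")
    return sum(cosine(s, v) for s, v in zip(surrogate, victim)) / len(surrogate)


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_structure_agreement(surrogate, victim):
    """Correlation of pairwise document cosines under each model.
    Works even when the two models have different embedding dimensions."""
    pairs = list(itertools.combinations(range(len(surrogate)), 2))
    xs = [cosine(surrogate[i], surrogate[j]) for i, j in pairs]
    ys = [cosine(victim[i], victim[j]) for i, j in pairs]
    return pearson(xs, ys)


# Hypothetical embeddings of four shared documents under two models.
surrogate = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
victim = [[2.0 * x for x in v] + [0.0] for v in surrogate]  # rescaled, extra dimension
print(similarity_structure_agreement(surrogate, victim))
```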
Referee: [§5] Empirical validation: The reported effectiveness on benchmarks lacks explicit controls for embedding-space misalignment between surrogate and victim models, effect-size reporting, and ablation studies isolating the contribution of the learnable query samples versus fixed queries. Without these, it is difficult to confirm that the attack generalizes query-agnostically rather than succeeding only under favorable alignment conditions.

Authors: We acknowledge that the current empirical section would be strengthened by additional controls and ablations. In the revised manuscript we will: (i) report quantitative measures of embedding-space misalignment between each surrogate and victim model on the evaluation datasets, (ii) include effect-size statistics (mean rank change with standard deviation and confidence intervals) alongside the existing success-rate tables, and (iii) add ablation experiments that compare the full adversarial-learning procedure against variants that use fixed or randomly sampled queries. These changes will more clearly isolate the contribution of the learnable-query component and support the query-agnostic generalization argument. revision: yes
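The effect-size reporting in item (ii) could look like the following sketch, which computes the mean rank change of attacked documents, its sample standard deviation, and a 95% confidence interval. It assumes per-document rank changes are already available, defined as rank after injection minus rank before, so positive values mean the document was pushed down. The interval uses a normal approximation; with few documents, a t-based or bootstrap interval would be more appropriate. The function name is hypothetical.

```python
import math
import statistics


def rank_change_summary(rank_changes, confidence=0.95):
    """Mean, sample standard deviation, and normal-approximation CI of rank changes."""
    n = len(rank_changes)
    if n < 2:
        raise ValueError("need at least two observations for a standard deviation")
    mean = statistics.mean(rank_changes)
    sd = statistics.stdev(rank_changes)  # sample (n - 1) standard deviation
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical rank changes for five attacked documents.
mean, sd, (lo, hi) = rank_change_summary([1, 2, 3, 4, 5])
print(f"mean={mean:.2f} sd={sd:.2f} 95% CI=({lo:.2f}, {hi:.2f})")
```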

Circularity Check

0 steps flagged

No significant circularity; theoretical framework and min-max simulation are independent of fitted outputs.

full rationale

The paper first establishes a theoretical framework for LLM-based retrieval based on embedding inner-product similarity and empirically verifies it on data. It then formulates the attack as a min-max optimization over injection tokens and learnable query samples, solved on frozen surrogate models, to generate transferable tokens. This structure does not reduce any claimed prediction or result to its own inputs by construction, nor does it rely on load-bearing self-citations or imported uniqueness theorems. The transferability is presented as an empirical outcome validated on benchmarks rather than as a definitional equivalence or a renamed fit. The derivation chain remains self-contained against external benchmarks, with no quoted reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the theoretical framework and min-max simulation are referenced but not expanded.

pith-pipeline@v0.9.0 · 5801 in / 992 out tokens · 43580 ms · 2026-05-21T14:13:19.376072+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We first establish a theoretical framework of LLMR and empirically verify it. Under the framework, we simulate the transferable attack as a min-max problem...
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

Definition 3.1 (ϵ-p_ϵ-Precise Retriever)... sim(f(X_i), f(X_j)) ≥ sim(f(X_k), f(X′)) + ϵ

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
