Adversarial Hubness in Multi-Modal Retrieval

Collin Zhang; Fnu Suya; Rishi Jha; Tingwei Zhang; Vitaly Shmatikov

arxiv: 2412.14113 · v4 · pith:HJOJJ35Knew · submitted 2024-12-18 · 💻 cs.CR · cs.IR

Pith reviewed 2026-05-23 06:36 UTC · model grok-4.3

keywords: adversarial hubness, multi-modal retrieval, hubness phenomenon, adversarial attacks, vector databases, text-to-image retrieval, image retrieval

The pith

Attackers can turn any image into an adversarial hub retrieved as top-1 for over 21,000 out of 25,000 queries in multi-modal systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the known hubness problem in high-dimensional spaces, where certain points end up close to many others, can be deliberately amplified by an attacker. An optimization process creates a single input that becomes unusually close to a wide range of query embeddings, causing it to be returned as the most relevant result for thousands of unrelated queries. This enables both broad injection of unwanted content and targeted attacks on chosen concepts. Experiments on standard caption-to-image benchmarks and an image-to-image retrieval system built on a commercial vector database confirm the effect, with one crafted hub dominating far more queries than any natural hub. Techniques that mitigate natural hubness are not effective against crafted hubs that target queries related to specific concepts.

Core claim

A method exists for generating adversarial hubs from arbitrary images or audio that are retrieved as the top match for a large fraction of queries in multi-modal retrieval, with one such hub serving as top-1 for more than 21,000 of 25,000 test queries compared to only 102 for the strongest natural hub.

What carries the argument

An optimization procedure that crafts an input to increase its proximity to many random or targeted query embeddings in the shared multi-modal space.
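
A minimal sketch of what such a procedure could look like, under strong simplifying assumptions: a toy linear encoder `W` stands in for the real multi-modal model, the queries are synthetic, and the perturbation is bounded in L-infinity norm by a hypothetical budget `eps`. None of the names or hyperparameters below come from the paper. The sketch runs projected sign-gradient ascent on the mean cosine similarity between the perturbed input's embedding and a small sample of query embeddings, then checks the effect on held-out queries.

```python
import numpy as np


def unit(v):
    """Scale vectors (last axis) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def mean_cosine(z, queries):
    """Mean cosine similarity between embedding z and unit-norm query rows."""
    return float(unit(z) @ queries.mean(axis=0))


def craft_hub(x, W, queries, eps=0.05, step=0.005, iters=200):
    """Projected sign-gradient ascent on mean cosine similarity (toy setting).

    x       : input with values in [0, 1] (stand-in for image pixels)
    W       : toy linear encoder, embedding = W @ input (assumption)
    queries : unit-norm query embeddings sampled by the attacker
    eps     : hypothetical L-infinity perturbation budget
    """
    m = queries.mean(axis=0)  # for unit-norm queries, mean cosine = u . m
    delta = np.zeros_like(x)
    for _ in range(iters):
        z = W @ (x + delta)
        norm = np.linalg.norm(z)
        u = z / norm
        # Gradient of (z / ||z||) . m with respect to z, then chain rule through W.
        grad_z = (m - (u @ m) * u) / norm
        grad_delta = W.T @ grad_z
        # Ascent step, then project back onto the eps-ball and the pixel range.
        delta = np.clip(delta + step * np.sign(grad_delta), -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return np.clip(x + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
d_in, d_emb = 64, 16
W = rng.normal(size=(d_emb, d_in)) / np.sqrt(d_in)
center = rng.normal(size=d_emb)


def sample_queries(n):
    """Synthetic, tightly clustered query embeddings (assumption)."""
    return unit(center + 0.3 * rng.normal(size=(n, d_emb)))


train_q = sample_queries(100)   # queries the attacker optimizes against
test_q = sample_queries(1000)   # held-out queries, never seen during optimization
x = rng.uniform(size=d_in)      # the input the attacker wants to promote
x_adv = craft_hub(x, W, train_q)

before = mean_cosine(W @ x, test_q)
after = mean_cosine(W @ x_adv, test_q)
print(f"mean cosine to held-out queries: {before:.3f} -> {after:.3f}")
```

In this toy setup the mean similarity to held-out queries rises after optimization. That mirrors, but does not reproduce, the generalization from 100 sampled queries described in the abstract; a real attack would need gradients through the actual encoder.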

If this is right

Universal spam or malicious content can be injected so that it appears in response to thousands of unrelated user queries.
Targeted attacks become feasible by steering hubs toward queries related to attacker-chosen concepts.
Existing methods for reducing natural hubness do not stop adversarial hubs that target specific concepts from dominating retrieval results.
The attack applies to both benchmark datasets and production vector databases such as Pinecone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Retrieval systems may need new defenses that detect or penalize artificially created hubs rather than relying on natural-distribution assumptions.
Similar hub-creation attacks could extend to other high-dimensional retrieval tasks beyond multi-modal embeddings.
Service providers using vector search should test whether their current models already contain or can be forced to contain such hubs.

Load-bearing premise

The optimization used to create adversarial hubs works across different embedding models and retrieval systems without needing white-box access to the target.

What would settle it

Running the same hub-generation procedure on a multi-modal embedding model different from those tested and measuring whether the resulting hub still ranks in the top position for thousands of queries.
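
One way such a measurement could be scripted, assuming query and gallery embeddings are already available as NumPy arrays. The function name and the toy data are illustrative and not taken from the paper. The sketch counts, for each gallery item, how many queries return it as the top-1 match under cosine similarity.

```python
import numpy as np


def top1_counts(query_emb, gallery_emb):
    """For each gallery item, count the queries whose top-1 match it is."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                  # cosine similarity, shape (n_queries, n_gallery)
    winners = sims.argmax(axis=1)   # index of the top-1 gallery item per query
    return np.bincount(winners, minlength=len(g))


# Toy data: four 2-D queries and three gallery items.
queries = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.6]])
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

counts = top1_counts(queries, gallery)
print(counts.tolist())  # [3, 1, 0]
```

Applied to a real gallery containing a crafted item, the entry of `counts` at that item's index is the quantity the page reports as "top-1 for N of 25,000 queries".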

Figures

Figures reproduced from arXiv: 2412.14113 by Collin Zhang, Fnu Suya, Rishi Jha, Tingwei Zhang, Vitaly Shmatikov.

**Figure 1.** A cross-modal adversarial hub.

**Figure 2.** An adversarial hub in text-to-image retrieval (columns: Query, Result). [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]

**Figure 3.** An adversarial hub in image-to-image retrieval.

**Figure 5.** Threat model. The attacker adds a perturbation to an input and uploads it to the gallery, where it functions as an adversarial hub; queries are encoded, searched against the embeddings in the vector database, and the top-k documents are retrieved. [PITH_FULL_IMAGE:figures/full_fig_p003_5.png]

**Figure 6.** Natural hubs and adversarial hubs.

**Figure 7.** Retrieval frequency of adversarial and natural hubs. Adversarial hubs (red dots) are retrieved significantly more frequently than natural hubs. Plot width indicates retrieval frequency (log scale), with results averaged over 100 trials.

**Figure 8.** Performance vs. sample size. Attack success rate (ASR) and cosine similarity between adversarial hub and query embeddings improve with larger target sample sizes, converging at around 100 samples.

Original abstract

Hubness is a phenomenon in high-dimensional vector spaces where a point from the natural distribution is unusually close to many other points. This is a well-known problem in information retrieval that causes some items to accidentally (and incorrectly) appear relevant to many queries. In this paper, we investigate how attackers can exploit hubness to turn any image or audio input in a multi-modal retrieval system into an adversarial hub. Adversarial hubs can be used to inject universal adversarial content (e.g., spam) that will be retrieved in response to thousands of different queries, and also for targeted attacks on queries related to specific, attacker-chosen concepts. We present a method for creating adversarial hubs and evaluate the resulting hubs on benchmark multi-modal retrieval datasets and an image-to-image retrieval system implemented by Pinecone, a popular vector database. For example, in text-caption-to-image retrieval, a single adversarial hub, generated using 100 random queries, is retrieved as the top-1 most relevant image for more than 21,000 out of 25,000 test queries (by contrast, the most common natural hub is the top-1 response to only 102 queries), demonstrating the strong generalization capabilities of adversarial hubs. We also investigate whether techniques for mitigating natural hubness can also mitigate adversarial hubs, and show that they are not effective against hubs that target queries related to specific concepts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that attackers can exploit hubness in multi-modal embedding spaces to create 'adversarial hubs': optimized inputs that become universal top-1 retrieval results for thousands of queries. Using an optimization procedure on 100 random queries, a single adversarial hub is reported as top-1 for >21,000 of 25,000 held-out test queries in text-to-image retrieval (vs. 102 for the strongest natural hub). The work evaluates this on benchmark datasets and an image-to-image retrieval system built on the Pinecone vector database, shows targeted attacks on concept-related queries, and finds that standard natural-hubness mitigations are not effective against adversarial hubs that target specific concepts.

Significance. If the central empirical result holds under the stated threat model, the work identifies a new, high-impact attack surface in multi-modal retrieval: a single crafted item can dominate retrieval for the vast majority of queries, enabling scalable spam injection or concept-targeted poisoning. The reported generalization (100 optimization queries → 21k+ test queries) is quantitatively striking and would constitute a qualitatively stronger universal attack than typical per-query adversarial examples. Credit is due for the concrete Pinecone evaluation and the comparison against natural hub baselines.

major comments (3)

[Method] Method (optimization procedure): the described procedure optimizes directly against the multi-modal embedding function, implying white-box or gradient access. This directly contradicts the claim of applicability to black-box deployed systems such as Pinecone; no surrogate-model transfer, query-only optimization, or API-only attack is described or evaluated.
[Experiments] Experiments (Pinecone evaluation): the reported success on Pinecone is presented as evidence of real-world applicability, yet the manuscript provides no details on how the adversarial hub was generated or inserted without direct embedding access. If the numbers were obtained with white-box access to the underlying model, they do not support the black-box attack claim.
[Experiments] Evaluation (held-out test queries): the 21,000/25,000 top-1 figure is load-bearing for the generalization claim, yet it is unclear whether the 100 optimization queries were drawn from the same distribution as the test set or whether any data leakage occurred; this must be clarified with explicit train/test splits and query provenance.

minor comments (2)

[Abstract] Abstract: the claim that adversarial hubs work for both image and audio inputs is stated but the quantitative results focus exclusively on text-to-image; a brief statement of audio results (or their absence) would improve clarity.
[Introduction] Notation: the term 'adversarial hub' is introduced without a formal definition distinguishing it from a standard adversarial example; a short definitional paragraph early in the paper would help.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report, and for recognizing the quantitative strength of the generalization result and the Pinecone evaluation. We address each major comment below and will revise the manuscript accordingly to clarify the threat model and experimental details.

Referee: [Method] Method (optimization procedure): the described procedure optimizes directly against the multi-modal embedding function, implying white-box or gradient access. This directly contradicts the claim of applicability to black-box deployed systems such as Pinecone; no surrogate-model transfer, query-only optimization, or API-only attack is described or evaluated.

Authors: We agree that the optimization procedure requires direct (white-box) access to the embedding function and its gradients. The manuscript does not describe or evaluate any black-box, query-only, or surrogate-based method. We will revise the text to explicitly state the white-box threat model for hub generation and to remove any implication of black-box applicability to systems such as Pinecone.
revision: yes

Referee: [Experiments] Experiments (Pinecone evaluation): the reported success on Pinecone is presented as evidence of real-world applicability, yet the manuscript provides no details on how the adversarial hub was generated or inserted without direct embedding access. If the numbers were obtained with white-box access to the underlying model, they do not support the black-box attack claim.

Authors: The Pinecone results were obtained by first generating the adversarial hub with white-box access to the underlying embedding model and then inserting the resulting vector into the Pinecone index. We will add a dedicated paragraph in the revised manuscript that (i) states the white-box assumption for generation and (ii) clarifies that the Pinecone experiment demonstrates the retrieval impact once an adversarial item has been inserted into a production vector database, rather than claiming a black-box generation procedure.
revision: yes

Referee: [Experiments] Evaluation (held-out test queries): the 21,000/25,000 top-1 figure is load-bearing for the generalization claim, yet it is unclear whether the 100 optimization queries were drawn from the same distribution as the test set or whether any data leakage occurred; this must be clarified with explicit train/test splits and query provenance.

Authors: The 100 optimization queries were sampled from the training split of the underlying dataset and are disjoint from the 25,000 queries in the standard held-out test split. We will add an explicit statement of the train/test split provenance and confirm the absence of overlap in the revised experimental section.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out queries

The paper's central results consist of direct empirical measurements: an optimization procedure produces a hub that is then counted as top-1 for >21k/25k held-out test queries, contrasted with natural-hub baselines. No equations or claims reduce a derived quantity to its own fitted inputs by construction, no load-bearing self-citations are invoked to establish uniqueness or correctness, and the method is presented as a standard optimization evaluated on external benchmarks and Pinecone. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based on abstract only; central claim rests on existence of an unspecified optimization procedure that produces hubs with the stated generalization property and on the assumption that benchmark datasets and Pinecone are representative of real systems.

axioms (1)

domain assumption · Hubness is a phenomenon in high-dimensional vector spaces where a point from the natural distribution is unusually close to many other points.
Explicitly stated as a well-known problem in the abstract.

invented entities (1)

adversarial hub · no independent evidence
purpose: Crafted input that is retrieved for many queries
Core new concept introduced to describe the attack output; no independent evidence outside the paper is mentioned.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness
cs.CL 2026-04 unverdicted novelty 5.0

A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · cited by 1 Pith paper · 10 internal anchors

[1] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square Attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision (ECCV), 2020.
[2] Anthropic. Introducing the next generation of Claude. https://www.anthropic.com/news/claude-3-family, March 2024. Accessed: August 24, 2025.
[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), pages 274–283. PMLR, 2018.
[4] Kobus Barnard, Pinar Duygulu, David Forsyth, Nando De Freitas, David M Blei, and Michael I Jordan. Matching words and pictures. The Journal of Machine Learning Research, 3:1107–1135, 2003.
(Page footnote 2 in the source: https://github.com/Tingwei-Zhang/adv_hub)
[5] Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, and Samuel Albanie. Cross modal retrieval with Querybank normalisation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[6] Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv:1712.09665, 2017.
[7] Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practical. In IEEE Symposium on Security and Privacy (S&P), pages 407–425, 2024.
[8] Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, and Ludwig Schmidt. Are aligned neural networks adversarially aligned? arXiv:2306.15447, 2023.
[9] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (S&P), pages 39–57, 2017.
[10] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In IEEE Symposium on Security and Privacy Workshops, 2018.
[11] Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, and Alina Oprea. Phantom: General trigger attacks on retrieval augmented language generation. arXiv:2405.20485, 2024.
[12] Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
[13] Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[14] Neil Chowdhury, Franklin Wang, Sumedh Shenoy, Douwe Kiela, Sarah Schwettmann, and Tristan Thrush. Nearest neighbor normalization improves multimodal retrieval. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
[15] Google Cloud. Embeddings APIs overview. https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings, 2025. Accessed: 2025-04-09.
[16] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), pages 1310–1320. PMLR, 2019.
[17] Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. In International Conference on Learning Representations (ICLR), 2018.
[18] Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. Improving zero-shot learning by mitigating the hubness problem. arXiv:1412.6568, 2014.
[19] Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, and Jun Zhu. How robust is Google’s Bard to adversarial image attacks? arXiv:2309.11751, 2023.
[20] Zhihao Dou, Xin Hu, Haibo Yang, Zhuqing Liu, and Minghong Fang. Adversarial attacks to multi-modal models. arXiv:2409.06793, 2024.
[21] Arthur Flexer, Monika Dörfler, Jan Schlüter, and Thomas Grill. Hubness as a case of technical algorithmic bias in music recommendation. In IEEE International Conference on Data Mining Workshops (ICDMW), 2018.
[22] Kuofeng Gao, Yang Bai, Jiawang Bai, Yong Yang, and Shu-Tao Xia. Adversarial robustness for visual grounding of multimodal large language models. In ICLR Workshop on Reliable and Responsible Foundation Models, 2024.
[23] Gemini Team, Google. Gemini: A family of highly capable multimodal models. arXiv:2312.11805, 2023.
[24] Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. ImageBind: One embedding space to bind them all. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[25] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
[26] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy A Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv:1810.12715, 2018.
[27] C Guo, M Rana, M Cisse, and L Van Der Maaten. Countering adversarial images using input transformations. arXiv:1711.00117, 2021.
[28] Andrey Guzhov, Federico Raue, Jörn Hees, and Andreas Dengel. AudioCLIP: Extending CLIP to image, text and audio. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[29] Kazuo Hara, Ikumi Suzuki, Masashi Shimbo, Kei Kobayashi, Kenji Fukumizu, and Miloš Radovanović. Localized centering: Reducing hubness in large-sample data. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2015.
[30] Warren He, James Wei, Xinyun Chen, Nicholas Carlini, and Dawn Song. Adversarial example defense: Ensembles of weak defenses are not strong. In 11th USENIX Workshop on Offensive Technologies (WOOT), 2017.
[31] Katharina Hoedt, Arthur Flexer, and Gerhard Widmer. Defending a music recommender against hubness-based adversarial attacks. In Proceedings of the 19th Sound and Music Computing Conference (SMC), 2022.
[32] Hossein Hosseini, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. Deceiving Google’s Perspective API built for detecting toxic comments. arXiv:1702.08138, 2017.
[33] Matthew Jagielski, Giorgio Severi, Niklas Pousette Harger, and Alina Oprea. Subpopulation data poisoning attacks. In ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 3104–3122, 2021.
[34] Herve Jegou, Hedi Harzallah, and Cordelia Schmid. A contextual dissimilarity measure for accurate and efficient image search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[35] Jiwoon Jeon, Victor Lavrenko, and Raghavan Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2003.
[36] Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
[37] Auguste Kerckhoffs. La cryptographie militaire. BoD – Books on Demand, 2023.
[38] Chris Dongjoo Kim, Byeongchang Kim, Hyunmin Lee, and Gunhee Kim. AudioCaps: Generating captions for audios in the wild. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2019.
[39] Minseon Kim, Jihoon Tack, and Sung Ju Hwang. Adversarial self-supervised contrastive learning. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
[40] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[41] Hyun Kwon, Yongchul Kim, Ki-Woong Park, Hyunsoo Yoon, and Daeseon Choi. Multi-targeted adversarial example in evasion attack on deep neural network. IEEE Access, 6:46084–46096, 2018.
[42] Angeliki Lazaridou, Georgiana Dinu, and Marco Baroni. Hubness and pollution: Delving into cross-space mapping for zero-shot learning. In 53rd Annual Meeting of the Association for Computational Linguistics (ACL), 2015.
[43] Fangyu Liu, Rongtian Ye, Xun Wang, and Shuaipeng Li. HAL: Improved text-image matching by mitigating visual semantic hubs. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
[44] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations (ICLR), 2017.
[45] Zihao Liu, Qi Liu, Tao Liu, Nuo Xu, Xue Lin, Yanzhi Wang, and Wujie Wen. Feature distillation: DNN-oriented JPEG compression against adversarial examples. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[46] Thomas Low, Christian Borgelt, Sebastian Stober, and Andreas Nürnberger. The hubness phenomenon: Fact or artifact? Towards Advanced Data Analysis by Combining Soft Computing and Statistics, pages 267–278, 2013.
[47] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
[48] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
[49] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[50] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv:1807.03748, 2018.
[51] OpenAI. GPT-4 technical report. https://openai.com/research/gpt-4, 2023. Accessed: 2025-04-09.
[52] OpenAI. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/, May 2024. Accessed: August 24, 2025.
[53] Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, et al. SpeechGuard: Exploring the adversarial robustness of multimodal large language models. arXiv:2405.08317, 2024.
[54] Pinecone. Pinecone - vector database for machine learning. https://www.pinecone.io, 2024. Accessed: 2024-11-12.
[55] Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Mengdi Wang, and Prateek Mittal. Visual adversarial examples jailbreak aligned large language models. In International Conference on Machine Learning (ICML) Workshop on New Frontiers in Adversarial Machine Learning, 2023.
[56]

Learning transferable visual models from natural lan- guage supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sas- try, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural lan- guage supervision. In International Conference on Ma- chine Learning (ICML), 2021

[57]

Hubs in space: Popular nearest neighbors in high-dimensional data

Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research (JMLR), 11(9):2487–2531, 2010

[58]

Term-weighting approaches in automatic text retrieval

Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988

[59]

Learning cross-modal embeddings for cooking recipes and food images

Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, and Antonio Torralba. Learning cross-modal embeddings for cooking recipes and food images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

[60]

Local and global scaling reduce hubs in space

Dominik Schnitzer, Arthur Flexer, Markus Schedl, and Gerhard Widmer. Local and global scaling reduce hubs in space. Journal of Machine Learning Research (JMLR), 13:2871–2902, 2012

[61]

FaceNet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

[62]

Poison frogs! Targeted clean-label poisoning attacks on neural networks

Ali Shafahi, W Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. Poison frogs! Targeted clean-label poisoning attacks on neural networks. In Advances in Neural Information Processing Systems (NIPS), 2018

[63]

Machine against the RAG: Jamming retrieval-augmented generation with blocker documents

Avital Shafran, Roei Schuster, and Vitaly Shmatikov. Machine against the RAG: Jamming retrieval-augmented generation with blocker documents. In USENIX Security Symposium, 2024

[64]

Plug and Pray: Exploiting off-the-shelf components of multi-modal models

Erfan Shayegani, Yue Dong, and Nael Abu-Ghazaleh. Plug and Pray: Exploiting off-the-shelf components of multi-modal models. arXiv:2307.14539, 2023

[65]

JPEG-resistant adversarial images

Richard Shin and Dawn Song. JPEG-resistant adversarial images. In NIPS 2017 Workshop on Machine Learning and Computer Security, 2017

[66]

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Samuel L Smith, David HP Turban, Steven Hamblin, and Nils Y Hammerla. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In International Conference on Learning Representations (ICLR), 2017

[67]

Artificial streaming

Spotify for Artists. Artificial streaming, 2023. Accessed: 2024-11-13

[68]

Hybrid batch attacks: Finding black-box adversarial examples with limited queries

Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian. Hybrid batch attacks: Finding black-box adversarial examples with limited queries. In USENIX Security Symposium, pages 1327–1344, 2020

[69]

Model-targeted poisoning attacks with provable convergence

Fnu Suya, Saeed Mahloujifar, Anshuman Suri, David Evans, and Yuan Tian. Model-targeted poisoning attacks with provable convergence. In International Conference on Machine Learning (ICML), pages 10000–10010. PMLR, 2021

[70]

Investigating the effectiveness of Laplacian-based kernels in hub reduction

Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto, and Marco Saerens. Investigating the effectiveness of Laplacian-based kernels in hub reduction. In AAAI Conference on Artificial Intelligence (AAAI), 2012

[71]

Centering similarity measures to reduce hubs

Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Marco Saerens, and Kenji Fukumizu. Centering similarity measures to reduce hubs. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013

[72]

Intriguing properties of neural networks

C Szegedy. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014

[73]

Adversarial training and robustness for multiple perturbations

Florian Tramer and Dan Boneh. Adversarial training and robustness for multiple perturbations. Advances in Neural Information Processing Systems (NIPS), 2019

[74]

Caltech-UCSD Birds-200-2011 dataset

Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. Caltech-UCSD Birds-200-2011 dataset. 2011

[75]

Adversarial cross-modal retrieval

Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, and Heng Tao Shen. Adversarial cross-modal retrieval. In 25th ACM International Conference on Multimedia (ACM MM), 2017

[76]

A Comprehensive Survey on Cross-modal Retrieval

Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, and Liang Wang. A comprehensive survey on cross-modal retrieval. arXiv:1607.06215, 2016

[77]

Cross-modal retrieval: a systematic review of methods and future directions

Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, and Heng Tao Shen. Cross-modal retrieval: a systematic review of methods and future directions. Proceedings of the IEEE, 112(11):1716–1754, 2024

[78]

Balance Act: Mitigating hubness in cross-modal retrieval with query and gallery banks

Yimu Wang, Xiangru Jian, and Bo Xue. Balance Act: Mitigating hubness in cross-modal retrieval with query and gallery banks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

[79]

Provable defenses against adversarial examples via the convex outer adversarial polytope

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5286–5295. PMLR, 2018

[80]

Adversarial attacks on multimodal agents

Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Adversarial attacks on multimodal agents. arXiv:2406.12814, 2024



