Adversarial Hubness in Multi-Modal Retrieval
Pith reviewed 2026-05-23 06:36 UTC · model grok-4.3
The pith
Attackers can turn any image into an adversarial hub retrieved as top-1 for over 21,000 out of 25,000 queries in multi-modal systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A method exists for generating adversarial hubs from arbitrary images or audio that are retrieved as the top match for a large fraction of queries in multi-modal retrieval, with one such hub serving as top-1 for more than 21,000 of 25,000 test queries compared to only 102 for the strongest natural hub.
What carries the argument
An optimization procedure that crafts an input to increase its proximity to many random or targeted query embeddings in the shared multi-modal space.
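A minimal sketch of what such a procedure could look like, under assumptions that do not come from the paper: the encoder is a toy linear map `W` followed by L2 normalization, the objective is the mean cosine similarity to a set of query embeddings `Q`, and the perturbation is bounded per coordinate by a hypothetical `eps`. The paper's actual loss, step rule, and constraints are not stated on this page.

```python
import numpy as np

def unit(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def mean_similarity(x, W, Q):
    """Mean cosine similarity between the embedding of x and the rows of Q."""
    return float((Q @ unit(W @ x)).mean())

def craft_hub(x, W, Q, eps=0.5, step=0.05, iters=200):
    """Signed-gradient ascent on mean similarity, keeping the best iterate."""
    q_bar = Q.mean(axis=0)
    delta = np.zeros_like(x)
    best, best_score = x.copy(), mean_similarity(x, W, Q)
    for _ in range(iters):
        z = W @ (x + delta)
        norm = np.linalg.norm(z)
        e = z / norm
        # Gradient of q_bar . (z / |z|) with respect to z, pulled back through W.
        grad = W.T @ ((q_bar - (q_bar @ e) * e) / norm)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        score = mean_similarity(x + delta, W, Q)
        if score > best_score:
            best, best_score = x + delta, score
    return best

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))                         # toy encoder weights (assumption)
center = unit(rng.normal(size=32))                    # shared direction of the toy queries
Q = unit(center + 0.1 * rng.normal(size=(100, 32)))   # 100 optimization queries
x = rng.normal(size=64)                               # the "arbitrary" starting input

hub = craft_hub(x, W, Q)
before, after = mean_similarity(x, W, Q), mean_similarity(hub, W, Q)
print(f"mean similarity: {before:.3f} -> {after:.3f}")
```

The gradient step needs the encoder weights, which is why a procedure of this shape presupposes white-box access to the embedding model.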
If this is right
- Universal spam or malicious content can be injected so that it appears in response to thousands of unrelated user queries.
- Targeted attacks become feasible by steering hubs toward queries related to attacker-chosen concepts.
- Existing methods for reducing natural hubness are not effective against adversarial hubs that target queries related to specific concepts.
- The attack is demonstrated on benchmark multi-modal retrieval datasets and on an image-to-image retrieval system built on Pinecone, a production vector database.
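For readers unfamiliar with natural-hubness mitigation, one generic family rescales each gallery item's similarity by how similar that item is to a bank of queries (an inverted-softmax-style rescoring). The sketch below is an illustration of that idea only; the temperature `beta`, the query bank, and the toy data are assumptions, and this is not necessarily one of the techniques the paper evaluates.

```python
import numpy as np

def unit(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def inverted_softmax(S_test, S_bank, beta=20.0):
    """Rescore S_test (queries x gallery) using a query bank.

    Each gallery column is divided by its total exp-similarity to the bank,
    so items that are close to every bank query are penalized.
    """
    col_norm = np.exp(beta * S_bank).sum(axis=0, keepdims=True)  # 1 x gallery
    return np.exp(beta * S_test) / col_norm

rng = np.random.default_rng(0)
gallery = unit(rng.normal(size=(50, 16)))
bank = unit(rng.normal(size=(200, 16)))   # query bank (assumption)
S_bank = bank @ gallery.T                 # 200 x 50 cosine similarities

# Rescoring the bank against itself makes every gallery column sum to 1.
P = inverted_softmax(S_bank, S_bank)
print(P.sum(axis=0)[:3])
```

Because the normalizer is computed from a fixed bank, a hub aimed at queries the bank does not cover would not be penalized by it, which is consistent with the concern about concept-targeted hubs.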
Where Pith is reading between the lines
- Retrieval systems may need new defenses that detect or penalize artificially created hubs rather than relying on natural-distribution assumptions.
- Similar hub-creation attacks could extend to other high-dimensional retrieval tasks beyond multi-modal embeddings.
- Service providers using vector search should test whether their current models already contain or can be forced to contain such hubs.
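The last point can be made concrete with a simple audit: count how many logged queries rank each indexed item first and flag items whose count is far above the average. This is a sketch under assumptions; the threshold `factor`, the toy query log, and the planted hub are all hypothetical.

```python
import numpy as np

def unit(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def top1_counts(queries, gallery):
    """Count, for each gallery item, how many queries rank it first."""
    winners = (queries @ gallery.T).argmax(axis=1)
    return np.bincount(winners, minlength=len(gallery))

def flag_hubs(counts, factor=10.0):
    """Indices whose top-1 count exceeds `factor` times the mean count."""
    return np.flatnonzero(counts > factor * counts.mean())

rng = np.random.default_rng(0)
center = unit(rng.normal(size=32))
queries = unit(center + 0.1 * rng.normal(size=(1000, 32)))  # toy query log
gallery = unit(rng.normal(size=(200, 32)))                  # toy natural items
gallery_with_hub = np.vstack([gallery, center])             # planted hub at index 200

counts = top1_counts(queries, gallery_with_hub)
print("flagged indices:", flag_hubs(counts))
```

The same count is the quantity behind the headline comparison: the number of test queries for which a single item is the top-1 result.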
Load-bearing premise
The optimization used to create adversarial hubs works across different embedding models and retrieval systems without needing white-box access to the target.
What would settle it
Running the same hub-generation procedure on a multi-modal embedding model different from those tested and measuring whether the resulting hub still ranks in the top position for thousands of queries.
Original abstract
Hubness is a phenomenon in high-dimensional vector spaces where a point from the natural distribution is unusually close to many other points. This is a well-known problem in information retrieval that causes some items to accidentally (and incorrectly) appear relevant to many queries. In this paper, we investigate how attackers can exploit hubness to turn any image or audio input in a multi-modal retrieval system into an adversarial hub. Adversarial hubs can be used to inject universal adversarial content (e.g., spam) that will be retrieved in response to thousands of different queries, and also for targeted attacks on queries related to specific, attacker-chosen concepts. We present a method for creating adversarial hubs and evaluate the resulting hubs on benchmark multi-modal retrieval datasets and an image-to-image retrieval system implemented by Pinecone, a popular vector database. For example, in text-caption-to-image retrieval, a single adversarial hub, generated using 100 random queries, is retrieved as the top-1 most relevant image for more than 21,000 out of 25,000 test queries (by contrast, the most common natural hub is the top-1 response to only 102 queries), demonstrating the strong generalization capabilities of adversarial hubs. We also investigate whether techniques for mitigating natural hubness can also mitigate adversarial hubs, and show that they are not effective against hubs that target queries related to specific concepts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that attackers can exploit hubness in multi-modal embedding spaces to create 'adversarial hubs': optimized inputs that become universal top-1 retrieval results for thousands of queries. Using an optimization procedure on 100 random queries, a single adversarial hub is reported as top-1 for >21,000 of 25,000 held-out test queries in text-to-image retrieval (vs. 102 for the strongest natural hub). The work evaluates this on benchmark datasets and a Pinecone vector database, shows targeted attacks on concept-related queries, and finds that standard natural-hubness mitigations are not effective against adversarial hubs that target concept-related queries.
Significance. If the central empirical result holds under the stated threat model, the work identifies a new, high-impact attack surface in multi-modal retrieval: a single crafted item can dominate retrieval for the vast majority of queries, enabling scalable spam injection or concept-targeted poisoning. The reported generalization (100 optimization queries → 21k+ test queries) is quantitatively striking and would constitute a qualitatively stronger universal attack than typical per-query adversarial examples. Credit is due for the concrete Pinecone evaluation and the comparison against natural hub baselines.
major comments (3)
- [Method] Method (optimization procedure): the described procedure optimizes directly against the multi-modal embedding function, implying white-box or gradient access. This directly contradicts the claim of applicability to black-box deployed systems such as Pinecone; no surrogate-model transfer, query-only optimization, or API-only attack is described or evaluated.
- [Experiments] Experiments (Pinecone evaluation): the reported success on Pinecone is presented as evidence of real-world applicability, yet the manuscript provides no details on how the adversarial hub was generated or inserted without direct embedding access. If the numbers were obtained with white-box access to the underlying model, they do not support the black-box attack claim.
- [Experiments] Evaluation (held-out test queries): the 21,000/25,000 top-1 figure is load-bearing for the generalization claim, yet it is unclear whether the 100 optimization queries were drawn from the same distribution as the test set or whether any data leakage occurred; this must be clarified with explicit train/test splits and query provenance.
minor comments (2)
- [Abstract] Abstract: the claim that adversarial hubs work for both image and audio inputs is stated but the quantitative results focus exclusively on text-to-image; a brief statement of audio results (or their absence) would improve clarity.
- [Introduction] Notation: the term 'adversarial hub' is introduced without a formal definition distinguishing it from a standard adversarial example; a short definitional paragraph early in the paper would help.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report, and for recognizing the quantitative strength of the generalization result and the Pinecone evaluation. We address each major comment below and will revise the manuscript accordingly to clarify the threat model and experimental details.
- Referee: [Method] Method (optimization procedure): the described procedure optimizes directly against the multi-modal embedding function, implying white-box or gradient access. This directly contradicts the claim of applicability to black-box deployed systems such as Pinecone; no surrogate-model transfer, query-only optimization, or API-only attack is described or evaluated.
  Authors: We agree that the optimization procedure requires direct (white-box) access to the embedding function and its gradients. The manuscript does not describe or evaluate any black-box, query-only, or surrogate-based method. We will revise the text to explicitly state the white-box threat model for hub generation and to remove any implication of black-box applicability to systems such as Pinecone.
  Revision: yes
- Referee: [Experiments] Experiments (Pinecone evaluation): the reported success on Pinecone is presented as evidence of real-world applicability, yet the manuscript provides no details on how the adversarial hub was generated or inserted without direct embedding access. If the numbers were obtained with white-box access to the underlying model, they do not support the black-box attack claim.
  Authors: The Pinecone results were obtained by first generating the adversarial hub with white-box access to the underlying embedding model and then inserting the resulting vector into the Pinecone index. We will add a dedicated paragraph in the revised manuscript that (i) states the white-box assumption for generation and (ii) clarifies that the Pinecone experiment demonstrates the retrieval impact once an adversarial item has been inserted into a production vector database, rather than claiming a black-box generation procedure.
  Revision: yes
- Referee: [Experiments] Evaluation (held-out test queries): the 21,000/25,000 top-1 figure is load-bearing for the generalization claim, yet it is unclear whether the 100 optimization queries were drawn from the same distribution as the test set or whether any data leakage occurred; this must be clarified with explicit train/test splits and query provenance.
  Authors: The 100 optimization queries were sampled from the training split of the underlying dataset and are disjoint from the 25,000 queries in the standard held-out test split. We will add an explicit statement of the train/test split provenance and confirm the absence of overlap in the revised experimental section.
  Revision: yes
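The protocol described in these responses (optimize on a small query set, insert the resulting vector into an index, evaluate on disjoint held-out queries) can be sketched on toy data. Everything here is an assumption for illustration: `ToyIndex` is an in-memory stand-in rather than any vendor's API, the data are synthetic, and the hub is simply the normalized mean of the optimization queries computed directly in embedding space instead of through an encoder.

```python
import numpy as np

def unit(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

class ToyIndex:
    """In-memory cosine-similarity index (stand-in, not a vendor API)."""
    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(unit(np.asarray(vector, dtype=float)))

    def top1(self, query):
        sims = np.stack(self.vectors) @ unit(query)
        return self.ids[int(sims.argmax())]

rng = np.random.default_rng(0)
center = unit(rng.normal(size=32))
pool = unit(center + 0.1 * rng.normal(size=(1100, 32)))  # toy query pool

# Disjoint split: 100 optimization queries, 1000 held-out test queries.
perm = rng.permutation(len(pool))
opt_idx, test_idx = perm[:100], perm[100:]
assert not set(opt_idx) & set(test_idx)

index = ToyIndex()
for i, v in enumerate(unit(rng.normal(size=(200, 32)))):
    index.add(f"item-{i}", v)

# Stand-in for white-box generation: the normalized mean of the optimization
# queries, computed directly in embedding space.
index.add("hub", unit(pool[opt_idx].mean(axis=0)))

hits = sum(index.top1(pool[i]) == "hub" for i in test_idx)
print(f"hub is top-1 for {hits} of {len(test_idx)} held-out queries")
```

The explicit disjointness check is the part that answers the referee's leakage question; the toy hit rate itself says nothing about the paper's reported numbers.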
Circularity Check
No circularity: empirical evaluation on held-out queries
The paper's central results consist of direct empirical measurements: an optimization procedure produces a hub that is then counted as top-1 for >21k/25k held-out test queries, contrasted with natural-hub baselines. No equations or claims reduce a derived quantity to its own fitted inputs by construction, no load-bearing self-citations are invoked to establish uniqueness or correctness, and the method is presented as a standard optimization evaluated on external benchmarks and Pinecone. The derivation chain is therefore self-contained against external data.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: hubness is a phenomenon in high-dimensional vector spaces where a point from the natural distribution is unusually close to many other points.
invented entities (1)
- adversarial hub: no independent evidence
Forward citations
Cited by 1 Pith paper
- One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness. A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.