Text embeddings reveal (almost) as much as text

John Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander Rush · 2023 · DOI 10.18653/v1/2023.emnlp-main.765

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

representative citing papers

cs.CR · 2026-07-01 · unverdicted · novelty 7.0

Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.

PRISM: Recovering Instruction Sets from Language Model Activations

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

PRISM is a new activation-conditioned model that recovers full sets of simultaneous instructions from LLM hidden states via judge-guided GRPO training and outperforms prior activation-to-language methods on security-relevant tasks.

One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness

cs.CL · 2026-04-30 · unverdicted · novelty 5.0

A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.

Do Activation Verbalization Methods Convey Privileged Information?

cs.CL · 2025-09-16 · unverdicted · novelty 5.0

Activation verbalization methods for LLMs largely reflect the verbalizer model's parametric knowledge rather than privileged information from the target model's activations.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Embedding Inference Attack cs.CR · 2026-07-01 · unverdicted · none · ref 27
Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.
PRISM: Recovering Instruction Sets from Language Model Activations cs.AI · 2026-06-08 · unverdicted · none · ref 40
PRISM is a new activation-conditioned model that recovers full sets of simultaneous instructions from LLM hidden states via judge-guided GRPO training and outperforms prior activation-to-language methods on security-relevant tasks.
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness cs.CL · 2026-04-30 · unverdicted · none · ref 33
A single hub text can unreasonably match many images in CLIP-based similarity, exposing vulnerabilities in cross-modal encoders for caption evaluation and retrieval.
Do Activation Verbalization Methods Convey Privileged Information? cs.CL · 2025-09-16 · unverdicted · none · ref 36
Activation verbalization methods for LLMs largely reflect the verbalizer model's parametric knowledge rather than privileged information from the target model's activations.

Text embeddings reveal (almost) as much as text

fields

years

verdicts

representative citing papers

citing papers explorer