Why representation engineering works: A theoretical and empirical study in vision-language models

Out-of-distribution detection with deep nearest neighbors · 2022 · arXiv 2503.22720

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dynamic Latent Routing

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Dynamic Latent Routing jointly learns discrete latent codes, routing policies, and model parameters via dynamic search to match or exceed supervised fine-tuning by 6.6 points on average in low-data settings across four datasets and six models.

Why MLLMs Struggle to Determine Object Orientations

cs.CV · 2026-04-14 · accept · novelty 7.0

Orientation information is recoverable from MLLM visual encoder embeddings via linear regression, contradicting the hypothesis that failures originate in the encoders.

Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring

cs.CR · 2025-12-12 · unverdicted · novelty 6.0

RCS learns projections on LVLM internal representations to produce contrastive scores that separate malicious jailbreaks from benign inputs, with MCD and KCD variants claiming SOTA generalization to unseen attacks.

citing papers explorer

Showing 3 of 3 citing papers.

Dynamic Latent Routing · cs.LG · 2026-05-14 · unverdicted · none · ref 48
Why MLLMs Struggle to Determine Object Orientations · cs.CV · 2026-04-14 · accept · none · ref 31
Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring · cs.CR · 2025-12-12 · unverdicted · none · ref 17