Omnimodal LLMs encode premise-perception mismatches in hidden states yet almost never reject false textual claims, exposing a representation-action gap that is modality-asymmetric and prompt-resistant.
Truthfulqa: Measuring how models mimic human falsehoods
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
SKOP uses key-orthogonal projections to steer LLM activations while preserving attention patterns on focus tokens, cutting utility degradation by 5-7x and retaining over 95% of standard steering efficacy.
citing papers explorer
-
Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs
Omnimodal LLMs encode premise-perception mismatches in hidden states yet almost never reject false textual claims, exposing a representation-action gap that is modality-asymmetric and prompt-resistant.
-
Don't Lose Focus: Activation Steering via Key-Orthogonal Projections
SKOP uses key-orthogonal projections to steer LLM activations while preserving attention patterns on focus tokens, cutting utility degradation by 5-7x and retaining over 95% of standard steering efficacy.