Function vectors steer LLMs successfully where the logit lens fails to decode the target answer, showing the two properties come apart.
hub Canonical reference
In-context vectors: Making in context learning more effective and controllable through latent space steering
Canonical reference. 80% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
GCAD reduces coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1 in persona-steering tasks by using gated attention-delta interventions from system prompts.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
Steering LLM residual streams with random sparse vectors creates detectable self-recognition fingerprints that enable over 98% accurate attribution of generated text to specific models without degrading output quality.
DIVE improves in-context vector distillation for medical report generation via decisive-token supervision on pathology terms and EOS plus state-conditioned dynamic steering, achieving top BLEU-4, ROUGE-L and RadGraph F1 on MIMIC-CXR and CheXpert Plus.
DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.
The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.
A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.
Disentangled Safety Adapters decouple safety computations from task-optimized LLMs via lightweight adapters, yielding up to 53% better AUC on safety tasks and dynamic inference-time alignment with reduced performance trade-offs.
Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.
citing papers explorer
-
Steering Language Models With Activation Engineering
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
-
Steering Llama 2 via Contrastive Activation Addition
Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.