hub Canonical reference

In-context vectors: Making in context learning more effective and controllable through latent space steering

· 2023 · arXiv 2311.06668

Canonical reference. 80% of citing Pith papers cite this work as background.

11 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 method 1

citation-polarity summary

background 4 use method 1

representative citing papers

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens

cs.LG · 2026-04-03 · accept · novelty 8.0

Function vectors steer LLMs successfully where the logit lens fails to decode the target answer, showing the two properties come apart.

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

cs.CL · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

GCAD reduces coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1 in persona-steering tasks by using gated attention-delta interventions from system prompts.

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

cs.CL · 2025-02-28 · unverdicted · novelty 7.0

CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.

Steering Language Models With Activation Engineering

cs.CL · 2023-08-20 · unverdicted · novelty 7.0

Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.

Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

cs.CL · 2025-09-29 · unverdicted · novelty 6.0

A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.

Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment

cs.LG · 2025-05-30 · unverdicted · novelty 6.0

Disentangled Safety Adapters decouple safety computations from task-optimized LLMs via lightweight adapters, yielding up to 53% better AUC on safety tasks and dynamic inference-time alignment with reduced performance trade-offs.

Steering Llama 2 via Contrastive Activation Addition

cs.CL · 2023-12-09 · unverdicted · novelty 6.0

Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.

MemOS: A Memory OS for AI System

cs.CL · 2025-07-04 · unverdicted · novelty 5.0

MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

citing papers explorer

Showing 11 of 11 citing papers.

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens cs.LG · 2026-04-03 · accept · none · ref 14
Function vectors steer LLMs successfully where the logit lens fails to decode the target answer, showing the two properties come apart.
Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions cs.CL · 2026-05-11 · unverdicted · none · ref 11 · 2 links
GCAD reduces coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1 in persona-steering tasks by using gated attention-delta interventions from system prompts.
Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation cs.CL · 2026-04-21 · unverdicted · none · ref 24
Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation cs.CL · 2025-02-28 · unverdicted · none · ref 111
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Steering Language Models With Activation Engineering cs.CL · 2023-08-20 · unverdicted · none · ref 85
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs cs.LG · 2026-04-10 · unverdicted · none · ref 49
DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment cs.LG · 2026-04-07 · unverdicted · none · ref 41
The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis cs.CL · 2025-09-29 · unverdicted · none · ref 23
A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.
Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment cs.LG · 2025-05-30 · unverdicted · none · ref 26
Disentangled Safety Adapters decouple safety computations from task-optimized LLMs via lightweight adapters, yielding up to 53% better AUC on safety tasks and dynamic inference-time alignment with reduced performance trade-offs.
Steering Llama 2 via Contrastive Activation Addition cs.CL · 2023-12-09 · unverdicted · none · ref 10
Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.
MemOS: A Memory OS for AI System cs.CL · 2025-07-04 · unverdicted · none · ref 36
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

In-context vectors: Making in context learning more effective and controllable through latent space steering

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer