arXiv preprint arXiv:2506.18167 , year=

Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda · 2025 · arXiv 2506.18167

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

HyperTransport: Amortized Conditioning of T2I Generative Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

HyperTransport amortizes activation steering for T2I models via a hypernetwork that predicts intervention parameters from CLIP embeddings, delivering 3600-7000x speedup and matching per-concept baselines on 167 unseen concepts.

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Answer tokens show forward drift and key-anchor focus when reading correct reasoning traces; a geometric-plus-semantic SRQ steering method boosts quantitative reasoning accuracy without training.

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

Steering vectors for refusal primarily modify the OV circuit in attention, ignore most of the QK circuit, and can be sparsified to 1-10% of dimensions while retaining performance.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.

Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers

cs.CV · 2026-05-05 · unverdicted · novelty 4.0

Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

cs.LG · 2025-10-28

citing papers explorer

Showing 6 of 6 citing papers.

HyperTransport: Amortized Conditioning of T2I Generative Models cs.LG · 2026-05-07 · unverdicted · none · ref 9
HyperTransport amortizes activation steering for T2I models via a hypernetwork that predicts intervention parameters from CLIP embeddings, delivering 3600-7000x speedup and matching per-concept baselines on 167 unseen concepts.
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning cs.CL · 2026-04-21 · unverdicted · none · ref 6
Answer tokens show forward drift and key-anchor focus when reading correct reasoning traces; a geometric-plus-semantic SRQ steering method boosts quantitative reasoning accuracy without training.
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal cs.LG · 2026-04-09 · unverdicted · none · ref 39
Steering vectors for refusal primarily modify the OV circuit in attention, ignore most of the QK circuit, and can be sparsified to 1-10% of dimensions while retaining performance.
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment cs.LG · 2026-04-07 · unverdicted · none · ref 69
The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.
Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers cs.CV · 2026-05-05 · unverdicted · none · ref 48
Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.
Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought cs.LG · 2025-10-28 · unreviewed · ref 29

arXiv preprint arXiv:2506.18167 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer