Incorporating hierarchical semantics in sparse autoencoder architectures. arXiv preprint arXiv:2506.01197

Mark Muchane, Sean Richardson, Kiho Park, Victor Veitch · 2025 · arXiv 2506.01197

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

HH-SAE: Discovering and Steering Hierarchical Knowledge of Complex Manifolds

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

HH-SAE factorizes manifolds into nested contextual (L0), atomic (f1), and compository (f2) tiers, achieving 0.9156 cross-domain zero-shot AUC in fraud detection and +9.9% AUPRC lift in steered synthesis.

Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Linear probes for Othello board states factor into tensor-product structure with square and color embeddings composed by a binding matrix, from which the linear probes can be directly recovered.

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and JailBreakV while preserving general capabilities.

citing papers explorer

Showing 3 of 3 citing papers.

HH-SAE: Discovering and Steering Hierarchical Knowledge of Complex Manifolds · cs.LG · 2026-05-11 · unverdicted · none · ref 14
Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions · cs.LG · 2026-05-11 · unverdicted · none · ref 16
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs · cs.LG · 2026-04-10 · unverdicted · none · ref 71
