
arxiv: 2508.17320 · v3 · pith:ZEHD5TIYnew · submitted 2025-08-24 · 💻 cs.LG

AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations

classification 💻 cs.LG
keywords adaptivek, autoencoders, complexity, language, representations, sparse, approaches, complexity-driven

Understanding the internal representations of large language models (LLMs) remains a central challenge for interpretability research. Sparse autoencoders (SAEs) offer a promising solution by decomposing activations into interpretable features, but existing approaches rely on fixed sparsity constraints that fail to account for input complexity. We propose AdaptiveK SAE (Adaptive Top K Sparse Autoencoders), a novel framework that dynamically adjusts sparsity levels based on the semantic complexity of each input. Leveraging linear probes, we demonstrate that context complexity is linearly encoded in LLM representations, and we use this signal to guide feature allocation during training. Experiments across ten language models demonstrate that this complexity-driven adaptation outperforms fixed-sparsity approaches on reconstruction fidelity, explained variance, cosine similarity and interpretability metrics while eliminating the burden of extensive hyperparameter tuning. Our code is available at: https://github.com/hiyukie/adaptiveK.
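To make the idea in the abstract concrete, the following is a minimal sketch of a Top-k sparse autoencoder whose k is chosen per input from a linear probe's complexity score. It is not the authors' implementation. The sigmoid squashing of the probe output, the linear mapping from score to k, the ReLU before Top-k selection, the bounds `k_min` and `k_max`, and every function and parameter name here are assumptions made for illustration. The weights are random rather than trained.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def complexity_score(x, probe_w, probe_b):
    """Linear probe on the activation x, squashed to (0, 1) by a sigmoid (assumed)."""
    return 1.0 / (1.0 + math.exp(-(dot(probe_w, x) + probe_b)))


def choose_k(score, k_min, k_max):
    """Map a score in [0, 1] to an integer k in [k_min, k_max] (assumed linear rule)."""
    return k_min + round(score * (k_max - k_min))


def encode_top_k(x, W_enc, b_enc, k):
    """ReLU pre-activations, then keep only the k largest and zero the rest."""
    pre = [max(0.0, dot(row, x) + b) for row, b in zip(W_enc, b_enc)]
    order = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre)]


def decode(z, W_dec, b_dec):
    """Reconstruct the activation; W_dec holds one row per latent feature."""
    out = list(b_dec)
    for zi, row in zip(z, W_dec):
        if zi != 0.0:
            for j, w in enumerate(row):
                out[j] += zi * w
    return out


def adaptive_k_forward(x, params, k_min=2, k_max=8):
    """Score the input, pick k from the score, then encode and decode with that k."""
    score = complexity_score(x, params["probe_w"], params["probe_b"])
    k = choose_k(score, k_min, k_max)
    z = encode_top_k(x, params["W_enc"], params["b_enc"], k)
    return decode(z, params["W_dec"], params["b_dec"]), z, k


def init_params(d_model, d_latent, seed=0):
    """Random, untrained parameters (hypothetical shapes, for the demo only)."""
    rng = random.Random(seed)

    def gauss_vec(n):
        return [rng.gauss(0.0, 0.1) for _ in range(n)]

    return {
        "W_enc": [gauss_vec(d_model) for _ in range(d_latent)],
        "b_enc": [0.0] * d_latent,
        "W_dec": [gauss_vec(d_model) for _ in range(d_latent)],
        "b_dec": [0.0] * d_model,
        "probe_w": gauss_vec(d_model),
        "probe_b": 0.0,
    }


# Usage: one forward pass on a random 16-dimensional "activation".
params = init_params(d_model=16, d_latent=64)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
x_hat, z, k = adaptive_k_forward(x, params)
print("chosen k:", k, "active features:", sum(1 for v in z if v != 0.0))
```

In a fixed-sparsity Top-k SAE, `k` would be a single hyperparameter shared by all inputs. In this sketch the only change is that `choose_k` derives it from the probe score, so inputs scored as more complex are allowed more active features.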



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

    cs.LG 2026-06 unverdicted novelty 7.0

    Sparsity regularizers applied before Top-k selection in SAEs improve monosemanticity and make reconstruction robust to inference-time k across vision models and datasets.