DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces

Mykel J. Kochenderfer; Romeo Valentin; Sydney M. Katz; Vincent Vanhoucke

arxiv: 2505.18441 · v2 · submitted 2025-05-24 · 💻 cs.LG · cs.MS· stat.AP

DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces

Romeo Valentin , Sydney M. Katz , Vincent Vanhoucke , Mykel J. Kochenderfer This is my paper

Pith reviewed 2026-05-19 14:12 UTC · model grok-4.3

classification 💻 cs.LG cs.MSstat.AP

keywords dictionary learningKSVDsparse autoencodersmechanistic interpretabilityembedding disentanglementalternating optimizationtransformer embeddings

0 comments

The pith

A scaled adaptation of the classic KSVD algorithm matches sparse autoencoder performance when disentangling high-dimensional embeddings from language and vision models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Double-Batch KSVD as a way to perform dictionary learning on the large datasets required for interpreting modern transformer embeddings. It modifies the established KSVD procedure with batching to handle millions of samples and thousands of feature dimensions. When applied to embeddings from Gemma-2-2B, Pythia-160M, DINOv2-S, and DINOv2-B, the method produces results competitive with sparse autoencoders across six SAEBench metrics. A sympathetic reader would conclude from this equivalence that sparse autoencoders already locate strong solutions to the dictionary learning task and that older alternating-optimization techniques can be made practical at the needed scale.

Core claim

DB-KSVD adapts the classic KSVD algorithm by performing dictionary updates and sparse encoding steps on double batches, enabling it to process the high-dimensional, high-sample-size regimes of current embedding spaces. On text embeddings from Gemma-2-2B and Pythia-160M as well as image embeddings from DINOv2-S and DINOv2-B, DB-KSVD achieves competitive scores on the six SAEBench metrics relative to established sparse-autoencoder methods. Because this outcome is obtained with an optimization approach that differs from the linear encoder used inside sparse autoencoders, the results indicate that sparse autoencoders locate strong solutions to the dictionary learning problem and that traditional

What carries the argument

Double-Batch KSVD (DB-KSVD), which alternates between batch-wise dictionary refinement and sparse coding subproblems to scale the original KSVD procedure to millions of samples.

If this is right

Dictionary learning can now be performed at the scale of current embedding spaces with classical alternating optimization rather than neural encoders alone.
Sparse autoencoders appear to locate good approximate solutions despite the NP-hard character of exact sparse coding.
Further work can combine the scalability of DB-KSVD with more advanced sparse solvers while retaining the same evaluation regime.
The same approach applies across text and vision modalities, indicating the scaling technique is not limited to one data type.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The public implementation allows direct substitution of DB-KSVD into existing interpretability pipelines to test whether feature sets differ in downstream use.
Because KSVD carries established approximation guarantees, the equivalence result supplies a route to importing theoretical bounds into large-scale feature discovery.
Practitioners could apply DB-KSVD first on smaller models where exact sparse solutions are computable, then extrapolate scaling behavior to larger models.

Load-bearing premise

That matching scores on the six SAEBench metrics is sufficient to conclude DB-KSVD reaches solutions of comparable quality to the underlying sparse encoding problem.

What would settle it

A side-by-side run on the same embeddings that measures actual sparse reconstruction error or the distance to the exact sparse solution (on a smaller subset where exact solving is feasible) would show whether DB-KSVD or the SAE encoder stays closer to the optimum.

Figures

Figures reproduced from arXiv: 2505.18441 by Mykel J. Kochenderfer, Romeo Valentin, Sydney M. Katz, Vincent Vanhoucke.

**Figure 1.** Figure 1: Results (higher is better) of our DB-KSVD algorithm and Matryoshka adaptation on the SAEBench benchmark applied to Gemma-2-2B embeddings with 4096 dictionary elements. k = 20 k = 40 k = 80 DB-KSVD MatryoshkaDB-KSVD 0 0.2 0.4 0.6 0.8 1 Coherence Metric 0 0.2 0.4 0.6 0.8 1 Coherence Metric 0 0.2 0.4 0.6 0.8 1 Coherence Metric BatchTopK MatryoshkaBatchTopK [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Histograms of element-wise coherence metrics for different sparsities. We hypothesize that lower is better. The dictionaries are constructed using the Gemma-2-2B embeddings and comprise m = 4096 dictionary elements. to encourage more incoherent dictionaries. When using the Matryoshka structuring approach, we indeed observe a significant decrease in the coherence of the learned dictionaries. We observe a s… view at source ↗

**Figure 3.** Figure 3: Sparse decomposition of individual images. Each row shows random input image (left) with its top-5 most active dictionary features and their VLM-generated explanations (right). The sparse code (k=32) represents each image as a weighted combination of interpretable concepts. For instance, the cocker spaniel activates features related to dogs, while the honeycomb activates features for hexagonal patterns and… view at source ↗

**Figure 4.** Figure 4: Feature interpretability via automated interpretability. Each row shows a learned dictionary feature with its VLM-generated explanation (left) and the top-5 images with highest activation magnitude for that feature (right). The features capture coherent visual concepts ranging from specific animals to objects and patterns, demonstrating that the sparse decomposition learns semantically meaningful direction… view at source ↗

**Figure 5.** Figure 5: Results (higher is better) of our DB-KSVD algorithm and Matryoshka adaptation on the SAEBench benchmark applied to Gemma-2-2B embeddings with 16 384 dictionary elements. k = 20 k = 40 k = 80 DB-KSVD MatryoshkaDB-KSVD 0 0.2 0.4 0.6 0.8 1 Coherence Metric 0 0.2 0.4 0.6 0.8 1 Coherence Metric 0 0.2 0.4 0.6 0.8 1 Coherence Metric BatchTopK MatryoshkaBatchTopK [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Histograms of element-wise coherence metrics for different sparsities. We hypothesize that lower is better. The dictionaries are constructed using the Gemma-2-2B embeddings and comprise m = 16 384 dictionary elements. Var. Explained Sparse Probing Autointerpretability Model k SAE DB-KSVD SAE DB-KSVD SAE DB-KSVD DINOv2-S 16 0.749 0.801 0.907 0.833 0.781 0.796 32 0.812 0.875 0.915 0.851 0.723 0.817 64 0.866 … view at source ↗

**Figure 7.** Figure 7: Results (higher is better) of our DB-KSVD algorithm and Matryoshka adaptation on the SAEBench benchmark with 4096 dictionary elements on Pythia-160M embeddings. ksp = 1 ksp = 2 ksp = 5 Model k SAE DB-KSVD SAE DB-KSVD SAE DB-KSVD DINOv2-S 16 0.907 0.833 0.940 0.901 0.961 0.948 32 0.915 0.851 0.943 0.908 0.963 0.952 64 0.924 0.861 0.947 0.911 0.963 0.952 DINOv2-B 16 0.924 0.846 0.954 0.927 0.972 0.965 32 0.9… view at source ↗

**Figure 8.** Figure 8: Mean relative error (Eq. (6)) for SAEBench training (Section 5) on the Gemma-2-2B model plotted against iterations. 0.75 0.80 0.85 0.90 0.95 0 10 20 30 40 0.75 0.80 0.85 0.90 0.95 0 10 20 30 40 m = 4096 m = 16384 Iteration DB-KSVD MatryoshkaDB-KSVD k 20 40 80 data subset train validation [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Variance explained (Eq. (7)) for SAEBench training (Section 5) on the Gemma-2-2B model plotted against iterations. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

read the original abstract

Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this approach is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and thousands of dimensions. We demonstrate the efficacy of DB-KSVD by disentangling text embeddings of the Gemma-2-2B and Pythia-160M models and evaluating on six metrics from the SAEBench benchmark, where we achieve competitive results when compared to established approaches based on SAEs. We further show similar results when disentangling image embeddings obtained from the DINOv2-S and DINOv2-B models, solidifying our findings. By matching SAE performance with an entirely different optimization approach, our results suggest that (i) SAEs do find strong solutions to the dictionary learning problem and (ii) traditional optimization approaches can be scaled to the required problem sizes, offering a promising avenue for further research. We make an implementation of DB-KSVD available at https://github.com/romeov/ksvd.jl.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DB-KSVD scales classic KSVD with a double-batch trick and matches SAEs on SAEBench metrics for embedding disentanglement, but the inference that this proves comparable solutions to the NP-hard sparse encoding step is indirect.

read the letter

The core contribution here is a practical scaling of KSVD to million-sample, thousand-dimension embedding data via the double-batch adaptation. That lets them run the algorithm on text embeddings from Gemma-2-2B and Pythia-160M plus image embeddings from DINOv2 models, then report competitive numbers against SAEs on the six SAEBench metrics. They also ship the Julia implementation, which is a concrete plus for anyone who wants to try it directly. The work sits in the mechanistic interpretability line that treats dictionary learning as a way to factor high-dimensional activations, and it shows that a traditional alternating optimization method can be made to run at the required scale without switching to an autoencoder formulation. That part is new and useful on its own terms. The experiments cover both language and vision cases, which broadens the claim a bit beyond the usual SAE text-only setups. The soft spot is in the interpretation of those competitive scores. The paper argues that matching SAE performance implies SAEs already find strong solutions to the underlying dictionary learning problem and that classical methods can now reach the same regime. The six SAEBench metrics are downstream interpretability measures, though, not direct reports of reconstruction error or the exact sparsity pattern achieved on the original sparse coding objective. Because the encoding subproblem is NP-hard, two methods can land at similar metric values while differing in how well they actually solve the fitting task. The abstract does not include those direct quantities, so the link between benchmark parity and solution quality remains one step removed. If the full paper adds explicit ||X - DZ|| comparisons or nonzero counts in Z with error bars, that would tighten the argument; otherwise it stays as a suggestive but not conclusive result. This paper is aimed at researchers already working on sparse dictionary methods for model internals who are open to non-autoencoder approaches. A reader who wants to see classical optimization compete at modern scales will find the algorithmic detail and the released code worth looking at. It is solid enough on the engineering side to merit a serious referee who can check the experimental controls and ask for the direct objective values.

Referee Report

2 major / 2 minor

Summary. The paper introduces Double-Batch KSVD (DB-KSVD), a scalable adaptation of the classic KSVD dictionary learning algorithm designed for high-dimensional embeddings with large sample sizes. It applies DB-KSVD to disentangle text embeddings from Gemma-2-2B and Pythia-160M as well as image embeddings from DINOv2-S and DINOv2-B, reporting competitive performance on the six SAEBench metrics relative to sparse autoencoders (SAEs). The central claim is that matching SAE results with this distinct optimization approach indicates both that SAEs recover strong solutions to the dictionary learning problem and that traditional methods can be scaled to the relevant regime.

Significance. If the empirical claims are substantiated, the work would be significant for mechanistic interpretability research. It supplies an alternative optimization pathway grounded in the theoretical literature on KSVD, demonstrates practical scalability to millions of samples, and provides a cross-algorithm check on whether SAEs are already near-optimal for the underlying sparse coding task. The open-source Julia implementation further supports reproducibility and follow-on work.

major comments (2)

[Abstract, final paragraph] Abstract, final paragraph: The inference that competitive SAEBench scores establish that DB-KSVD and SAEs achieve comparably good solutions to the NP-hard sparse encoding subproblem is not directly supported by evidence on the core objective. No values are reported for reconstruction error ||X−DZ||, nonzero counts in Z, or other measures of encoding fidelity; downstream interpretability metrics alone do not rule out the possibility that the two methods reach dictionaries with similar SAEBench scores but materially different fidelity to the original sparse coding problem.
[Experimental evaluation] Experimental evaluation: The abstract states that DB-KSVD achieves competitive results on the six SAEBench metrics, yet supplies no numerical values, error bars, statistical tests, hyperparameter selection protocol, or data-split details. Full reporting of these quantities (including per-metric tables with confidence intervals and ablation on batch-size choices) is required to evaluate whether the performance match is robust or sensitive to implementation details.

minor comments (2)

[Method] The description of the double-batch procedure would benefit from an explicit pseudocode listing the two nested batch loops and the update rules for D and Z.
[Notation] Notation for the dictionary D, coefficients Z, and the two batch sizes should be introduced once and used consistently throughout the text and equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of evidence presentation and experimental reporting that we will address in revision. Below we respond point by point to the major comments.

read point-by-point responses

Referee: [Abstract, final paragraph] Abstract, final paragraph: The inference that competitive SAEBench scores establish that DB-KSVD and SAEs achieve comparably good solutions to the NP-hard sparse encoding subproblem is not directly supported by evidence on the core objective. No values are reported for reconstruction error ||X−DZ||, nonzero counts in Z, or other measures of encoding fidelity; downstream interpretability metrics alone do not rule out the possibility that the two methods reach dictionaries with similar SAEBench scores but materially different fidelity to the original sparse coding problem.

Authors: We agree that the SAEBench metrics are downstream interpretability measures and do not directly quantify fidelity to the sparse coding objective. Our claim is that comparable performance on these metrics, obtained via an entirely different optimization procedure, supports the conclusion that both approaches recover strong solutions for the interpretability task. Nevertheless, we recognize that reporting core quantities such as reconstruction error and sparsity would strengthen the argument. In the revised manuscript we will add tables comparing ||X − DZ||_F and average nonzero counts per sample for DB-KSVD and the SAE baselines on each model. revision: yes
Referee: [Experimental evaluation] Experimental evaluation: The abstract states that DB-KSVD achieves competitive results on the six SAEBench metrics, yet supplies no numerical values, error bars, statistical tests, hyperparameter selection protocol, or data-split details. Full reporting of these quantities (including per-metric tables with confidence intervals and ablation on batch-size choices) is required to evaluate whether the performance match is robust or sensitive to implementation details.

Authors: We accept that the current version does not provide sufficient quantitative detail or robustness checks. The full manuscript contains some experimental description, but we will substantially expand the experimental evaluation section. Revisions will include per-metric tables with numerical values and confidence intervals, explicit statements of the hyperparameter selection protocol and data splits, and an ablation study on batch-size choices. These additions will allow readers to assess the stability of the reported performance match. revision: yes

Circularity Check

0 steps flagged

No significant circularity; central claim rests on external empirical benchmark

full rationale

The paper advances DB-KSVD as a scalable KSVD variant and evaluates it directly against SAEs on the six SAEBench metrics for Gemma-2-2B, Pythia-160M, DINOv2-S and DINOv2-B embeddings. The suggestion that matching performance implies SAEs find strong dictionary-learning solutions follows from this head-to-head comparison rather than from any equation that defines a quantity in terms of itself, any fitted parameter renamed as a prediction, or a load-bearing self-citation chain. No derivation step reduces to its own inputs by construction; the argument remains self-contained against the stated benchmark and external metrics.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the work rests on the standard alternating-minimization framework of KSVD plus the empirical claim that SAEBench metrics proxy solution quality; no new physical entities or heavily fitted constants are introduced.

free parameters (1)

batch-size hyperparameters
Double-batch sizes are chosen to achieve scalability; exact values and selection procedure are not stated in the abstract.

axioms (1)

domain assumption Alternating optimization in KSVD yields useful dictionary solutions for embedding data
The paper explicitly builds on the theoretical foundations of KSVD as stated in the abstract.

pith-pipeline@v0.9.0 · 5853 in / 1304 out tokens · 84955 ms · 2026-05-19T14:12:24.040034+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm... min D,X ||Y−DX||_F^2 s.t. ||di||_2=1, ||xi||_0≤k
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The coherence of D is defined as μ(D)=max i≠j |⟨di,dj⟩|... k < 1/2(1+μ^{-1}(D))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
cs.LG 2026-05 unverdicted novelty 6.0

A vision transformer for runway keypoint regression is decomposed via K-SVD into content and style atoms; the model relies primarily on content atoms, enabling out-of-model-scope detection for runtime assurance in aviation.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper

[1]

Chocolate Labrador Retrievers

“Chocolate Labrador Retrievers.” (#1087)

work page
[2]

A group of dogs together

“A group of dogs together.” (#653)

work page
[3]

Images of relaxed dogs in comfortable, affectionate settings

“Images of relaxed dogs in comfortable, affectionate settings.” (#3189)

work page
[4]

Black puppies

“Black puppies.” (#3140)

work page
[5]

Scottish Terriers with distinctive fur

“Scottish Terriers with distinctive fur.” (#133) Honeycomb

work page
[6]

Whimsical or unconventional objects in everyday settings

“Whimsical or unconventional objects in everyday settings.” (#1509)

work page
[7]

Hexagonal patterns in textured surfaces

“Hexagonal patterns in textured surfaces.” (#3124)

work page
[8]

Honeycomb structures

“Honeycomb structures.” (#650)

work page
[9]

Webpages with text-heavy content and diverse layout designs

“Webpages with text-heavy content and diverse layout designs.” (#2263)

work page
[10]

Stage curtains in shades of red

“Stage curtains in shades of red.” (#1174) Velvet

work page
[11]

Horizontal slatted structures

“Horizontal slatted structures.” (#2614)

work page
[12]

Vibrant abstract patterns

“Vibrant abstract patterns.” (#996)

work page
[13]

Mop heads and cleaning accessories

“Mop heads and cleaning accessories.” (#1763)

work page
[14]

Textiles with striped or patterned designs

“Textiles with striped or patterned designs.” (#2945)

work page
[15]

Long, cylindrical food items

“Long, cylindrical food items.” (#2863) Figure 3.Sparse decomposition of individual images.Each row shows random input image (left) with its top-5 most active dictionary features and their VLM-generated explanations (right). The sparse code (k=32) represents each image as a weighted combination of interpretable concepts. For instance, the cocker spaniel a...

work page arXiv 2023

[1] [1]

Chocolate Labrador Retrievers

“Chocolate Labrador Retrievers.” (#1087)

work page

[2] [2]

A group of dogs together

“A group of dogs together.” (#653)

work page

[3] [3]

Images of relaxed dogs in comfortable, affectionate settings

“Images of relaxed dogs in comfortable, affectionate settings.” (#3189)

work page

[4] [4]

Black puppies

“Black puppies.” (#3140)

work page

[5] [5]

Scottish Terriers with distinctive fur

“Scottish Terriers with distinctive fur.” (#133) Honeycomb

work page

[6] [6]

Whimsical or unconventional objects in everyday settings

“Whimsical or unconventional objects in everyday settings.” (#1509)

work page

[7] [7]

Hexagonal patterns in textured surfaces

“Hexagonal patterns in textured surfaces.” (#3124)

work page

[8] [8]

Honeycomb structures

“Honeycomb structures.” (#650)

work page

[9] [9]

Webpages with text-heavy content and diverse layout designs

“Webpages with text-heavy content and diverse layout designs.” (#2263)

work page

[10] [10]

Stage curtains in shades of red

“Stage curtains in shades of red.” (#1174) Velvet

work page

[11] [11]

Horizontal slatted structures

“Horizontal slatted structures.” (#2614)

work page

[12] [12]

Vibrant abstract patterns

“Vibrant abstract patterns.” (#996)

work page

[13] [13]

Mop heads and cleaning accessories

“Mop heads and cleaning accessories.” (#1763)

work page

[14] [14]

Textiles with striped or patterned designs

“Textiles with striped or patterned designs.” (#2945)

work page

[15] [15]

Long, cylindrical food items

“Long, cylindrical food items.” (#2863) Figure 3.Sparse decomposition of individual images.Each row shows random input image (left) with its top-5 most active dictionary features and their VLM-generated explanations (right). The sparse code (k=32) represents each image as a weighted combination of interpretable concepts. For instance, the cocker spaniel a...

work page arXiv 2023