pith. sign in

arxiv: 2511.16309 · v2 · pith:VQNMFQ6Anew · submitted 2025-11-20 · 💻 cs.CV · cs.LG

Sparse Autoencoders are Topic Models

Pith reviewed 2026-05-21 19:39 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords sparse autoencoderstopic modelscontinuous topic modelembedding spacesthematic analysisimage datasetstext datasetstopic coherence
0
0 comments X

The pith

Sparse autoencoders function as topic models by deriving their objective from a continuous topic model on embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that sparse autoencoders used on embedding spaces can be understood directly as topic models. The authors define a continuous topic model for embeddings that draws from the structure of Latent Dirichlet Allocation. They prove that the standard sparse autoencoder loss corresponds exactly to maximum a posteriori estimation under this generative model. This reframes the features an SAE learns as thematic topic components rather than steerable directions. The result lets researchers apply SAEs to discover and track themes across large text and image collections by training reusable topic atoms once and combining them flexibly afterward.

Core claim

Sparse autoencoders are topic models because their training objective is the maximum a posteriori estimator for a continuous topic model on embedding spaces, in which each embedding arises as a sparse mixture of thematic basis vectors under a suitable prior and likelihood, so that the SAE decoder directions recover the topic distributions of the model.

What carries the argument

Continuous topic model (CTM) for embedding spaces, under which the SAE reconstruction-plus-sparsity objective is derived as maximum a posteriori estimation.

If this is right

  • SAE features act as reusable thematic components that admit direct interpretation as word or patch distributions.
  • The SAE-TM procedure learns topic atoms in one training run and merges them into any desired number of topics for new data without retraining.
  • Topics extracted this way show higher coherence scores and comparable diversity to strong baselines on both text and image collections.
  • Thematic structure and its changes over time become measurable in image datasets such as historical print series.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Topic-model evaluation metrics such as coherence could be applied directly to SAE features to quantify their interpretability.
  • The derivation suggests hybrid training schemes that add explicit topic-model regularizers to SAE objectives for domain-specific data.
  • The same framing could be tested on embeddings from audio or video models to extract thematic patterns in those modalities.

Load-bearing premise

Embedding spaces are generated according to the continuous topic model with its chosen prior on mixtures and likelihood on observed vectors.

What would settle it

If SAE features recovered from real embeddings cannot be interpreted as coherent distributions over words or visual elements on held-out data while dedicated topic models can, the claimed equivalence does not hold in practice.

Figures

Figures reproduced from arXiv: 2511.16309 by Leander Girrbach, Zeynep Akata.

Figure 1
Figure 1. Figure 1: Sparse autoencoders are topic models: the connection [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of our SAE topic model (SAE-TM): (a) pretrain foundational SAEs on large text or vision datasets to learn transferable [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Statistics of top 10 topics with the highest variance across four popular image datasets. Values indicate the proportion of images [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Statistics of top 10 topics with the highest variance in Japanese woodblock prints from different artistic periods. Changes in topic [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. To confirm our theoretical findings, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints. Our work positions SAEs as effective tools for large-scale thematic analysis across modalities. Code is available at https://github.com/ExplainableML/SAE-TM .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes viewing sparse autoencoders (SAEs) as topic models through a continuous topic model (CTM) inspired by LDA for embedding spaces. It derives the SAE objective (reconstruction loss plus L1 sparsity) as the maximum a posteriori estimator under this CTM. This leads to the SAE-TM framework that trains SAEs for reusable topic atoms, interprets them as word distributions, and merges them into topics. SAE-TM is shown to yield more coherent topics than baselines on text and image datasets, with applications to thematic analysis in images and temporal topic tracing in Japanese woodblock prints.

Significance. If the proposed CTM provides a good description of real embedding spaces, the work establishes a theoretical link between SAEs and topic models, suggesting SAE features capture thematic components. The SAE-TM approach offers a flexible, reusable way to perform topic modeling without retraining for different topic counts. The availability of code at the provided GitHub repository enhances reproducibility. This perspective could be significant for interpretability research in computer vision and multimodal learning.

major comments (2)
  1. [Derivation of SAE objective as MAP estimator under CTM] The manuscript constructs the CTM with a specific prior and likelihood chosen to make the SAE loss the MAP objective. While the algebraic equivalence holds within the model, the claim that this implies SAEs are topic models for real embeddings requires evidence that the CTM's induced distribution matches real data statistics. A concrete test, such as matching the distribution of pairwise cosine similarities or the sparsity patterns in activations, should be included to support the transfer of the interpretation.
  2. [SAE-TM empirical evaluation] The coherence metrics used to compare SAE-TM to baselines have known sensitivity to post-processing choices such as the merging procedure or threshold for interpreting atoms as word distributions. The manuscript should quantify this sensitivity, perhaps via ablation on the merging step or alternative coherence measures, to strengthen the claim of superior performance.
minor comments (2)
  1. [Notation and model definition] The definition of the continuous topic model could benefit from an explicit equation contrasting it with standard LDA to highlight the adaptations for continuous embeddings.
  2. [Related work] Additional citations to prior work on using autoencoders or sparse representations for topic modeling would help contextualize the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Derivation of SAE objective as MAP estimator under CTM] The manuscript constructs the CTM with a specific prior and likelihood chosen to make the SAE loss the MAP objective. While the algebraic equivalence holds within the model, the claim that this implies SAEs are topic models for real embeddings requires evidence that the CTM's induced distribution matches real data statistics. A concrete test, such as matching the distribution of pairwise cosine similarities or the sparsity patterns in activations, should be included to support the transfer of the interpretation.

    Authors: We agree that the algebraic derivation alone does not automatically establish that the CTM describes real embedding spaces. In the revised manuscript we will add experiments that directly compare the distribution of pairwise cosine similarities and the sparsity patterns of activations between samples drawn from the fitted CTM and the actual embeddings used in our text and image experiments. These tests will provide the requested empirical support for transferring the topic-model interpretation to real data. revision: yes

  2. Referee: [SAE-TM empirical evaluation] The coherence metrics used to compare SAE-TM to baselines have known sensitivity to post-processing choices such as the merging procedure or threshold for interpreting atoms as word distributions. The manuscript should quantify this sensitivity, perhaps via ablation on the merging step or alternative coherence measures, to strengthen the claim of superior performance.

    Authors: We acknowledge that coherence scores can be sensitive to post-processing decisions. In the revision we will include an ablation study that varies the merging threshold and procedure, and we will additionally report results under alternative coherence measures (e.g., NPMI with different window sizes). These additions will quantify the sensitivity and demonstrate that the performance advantage of SAE-TM remains consistent across reasonable post-processing choices. revision: yes

Circularity Check

1 steps flagged

CTM prior/likelihood chosen so MAP recovers SAE loss exactly, making equivalence hold by model construction

specific steps
  1. self definitional [Abstract; derivation of SAE as MAP under CTM]
    "We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model."

    The CTM prior and likelihood are defined such that the MAP estimator under the model is exactly the SAE training objective (reconstruction error plus L1 sparsity). The claimed interpretation that SAE features are thematic components therefore follows tautologically from the choice of generative model rather than from any external property of embedding spaces.

full rationale

The paper's central derivation proposes a continuous topic model whose generative assumptions (prior and likelihood) are selected to make the standard SAE reconstruction-plus-L1 objective its MAP estimator. This algebraic equivalence is true inside the assumed model but transfers to real embeddings only if those embeddings are generated by the CTM; no independent validation (posterior-predictive checks, moment matching, or marginal likelihood comparison) is reported. The step therefore reduces to a self-definitional construction rather than an independent justification that SAEs are topic models on observed data.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the CTM generative story being a reasonable model for embeddings and on the SAE loss being exactly the MAP objective under that story; no new particles or forces are introduced.

free parameters (1)
  • sparsity penalty coefficient
    Standard SAE hyperparameter that controls feature activation rate; its value is chosen to match the topic sparsity implicit in the CTM prior.
axioms (1)
  • domain assumption Embeddings are generated as mixtures of latent topic distributions with a specific prior that induces sparsity.
    This generative assumption is introduced to derive the SAE objective; it is not a standard result from embedding literature.

pith-pipeline@v0.9.0 · 5722 in / 1280 out tokens · 60300 ms · 2026-05-21T19:39:13.875556+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

111 extracted references · 111 canonical work pages

  1. [1]

    Phi-4 technical report

    Marah Abdin, Jyoti Aneja, Harkirat Behl, S ´ebastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harri- son, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report. InarXiv, 2024. 5, 7

  2. [2]

    Unsupervised domain clusters in pretrained language models

    Roee Aharoni and Yoav Goldberg. Unsupervised domain clusters in pretrained language models. InACL, 2020. 2

  3. [3]

    Top2vec: Distributed representations of topics

    Dimo Angelov. Top2vec: Distributed representations of topics. InarXiv, 2020. 2

  4. [4]

    Granite embedding r2 models

    Parul Awasthy, Aashka Trivedi, Yulong Li, Meet Doshi, Riyaz Bhat, Vishwajeet Kumar, Yushu Yang, Bhavani Iyer, Abraham Daniels, Rudra Murthy, et al. Granite embedding r2 models. InarXiv, 2025. 5

  5. [5]

    Cross-lingual contextual- ized topic models with zero-shot learning

    Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini, et al. Cross-lingual contextual- ized topic models with zero-shot learning. InEACL, 2021. 5, 6

  6. [6]

    Pre- training is a hot topic: contextualized document embed- dings improve topic coherence

    Federico Bianchi, Silvia Terragni, Dirk Hovy, et al. Pre- training is a hot topic: contextualized document embed- dings improve topic coherence. InACL, 2021. 5

  7. [7]

    Nltk: the natural language toolkit

    Steven Bird. Nltk: the natural language toolkit. InCOL- ING/ACL, 2006. 5

  8. [8]

    Correlated topic models

    David Blei and John Lafferty. Correlated topic models. In NeurIPS, 2006. 2

  9. [9]

    Dynamic topic models

    David M Blei and John D Lafferty. Dynamic topic models. InICML, 2006. 2

  10. [10]

    Latent dirichlet allocation

    David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. InJMLR, 2003. 1, 2, 4

  11. [11]

    Food-101 – mining discriminative components with ran- dom forests

    Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with ran- dom forests. InECCV, 2014. 6

  12. [12]

    Generating sentences from a continuous space

    Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. InSIGNLL, 2016. 2

  13. [13]

    Towards monose- manticity: Decomposing language models with dictionary learning

    Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, et al. Towards monose- manticity: Decomposing language models with dictionary learning. InTransformer Circuits Thread, 2023. 1, 2, 3, 4

  14. [14]

    Decoupling spar- sity and smoothness in the dirichlet variational autoencoder topic model

    Sophie Burkhardt and Stefan Kramer. Decoupling spar- sity and smoothness in the dirichlet variational autoencoder topic model. InJMLR, 2019. 2, 5, 6

  15. [15]

    Batchtopk sparse autoencoders

    Bart Bussmann, Patrick Leask, and Neel Nanda. Batchtopk sparse autoencoders. InNeurIPS Workshop on Scientific Methods for Understanding Deep Learning, 2024. 2, 3, 5, 6, 13

  16. [16]

    Neural mod- els for documents with metadata

    Dallas Card, Chenhao Tan, and Noah A Smith. Neural mod- els for documents with metadata. InACL, 2018. 2

  17. [17]

    Reading tea leaves: How humans interpret topic models

    Jonathan Chang, Sean Gerrish, Chong Wang, Jordan Boyd- Graber, and David Blei. Reading tea leaves: How humans interpret topic models. InNeurIPS, 2009. 5

  18. [18]

    Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

    Soravit Changpinyo, Piyush Sharma, Nan Ding, and Radu Soricut. Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts. In CVPR, 2021. 6

  19. [19]

    You are where you tweet: a content-based approach to geo-locating twitter users

    Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You are where you tweet: a content-based approach to geo-locating twitter users. InCIKM, 2010. 5

  20. [20]

    From flat to hierarchical: Ex- tracting sparse representations with matching pursuit

    Val ´erie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, and Demba Ba. From flat to hierarchical: Ex- tracting sparse representations with matching pursuit. In arXiv, 2025. 2

  21. [21]

    Sparse autoencoders find highly interpretable features in language models

    Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InarXiv, 2023. 2

  22. [22]

    Imagenet: A large-scale hierarchical im- age database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical im- age database. InCVPR, 2009. 6

  23. [23]

    Topic modeling in embedding spaces

    Adji B Dieng, Francisco JR Ruiz, and David M Blei. Topic modeling in embedding spaces. InTACL, 2020. 2, 5, 6

  24. [24]

    Toy models of superposition

    Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield- Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. InarXiv, 2022. 2

  25. [25]

    Not all language model features are one-dimensionally linear

    Joshua Engels, Eric J Michaud, Isaac Liao, Wes Gurnee, and Max Tegmark. Not all language model features are one-dimensionally linear. InICLR, 2025. 2

  26. [26]

    Decomposing the dark matter of sparse autoencoders

    Joshua Engels, Logan Riggs Smith, and Max Tegmark. Decomposing the dark matter of sparse autoencoders. In TMLR, 2025. 2

  27. [27]

    Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba E

    Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba E. Ba, and Talia Konkle. Archetypal SAE: Adaptive and stable dictionary learning for concept extraction in large vision models. In FICML, 2025. 2

  28. [28]

    Scaling and evaluating sparse autoencoders

    Leo Gao, Tom Dupre la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. Scaling and evaluating sparse autoencoders. In ICLR, 2025. 2, 3, 13

  29. [29]

    Uncurated image-text datasets: Shedding light on demographic bias

    Noa Garcia, Yusuke Hirota, Yankun Wu, and Yuta Nakashima. Uncurated image-text datasets: Shedding light on demographic bias. InCVPR, 2023. 6

  30. [30]

    Sparse-coding variational autoencoders

    Victor Geadah, Gabriel Barello, Daniel Greenidge, Adam S Charles, and Jonathan W Pillow. Sparse-coding variational autoencoders. InNeural computation, 2024. 2

  31. [31]

    Bertopic: Neural topic modeling with a class-based tf-idf procedure

    Maarten Grootendorst. Bertopic: Neural topic modeling with a class-based tf-idf procedure. InarXiv, 2022. 2 9

  32. [32]

    Representing mixtures of word embeddings with mixtures of topic embeddings

    Dan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou, et al. Representing mixtures of word embeddings with mixtures of topic embeddings. In ICLR, 2022. 2

  33. [33]

    Apples to apples: A systematic evaluation of topic models

    Ismail Harrando, Pasquale Lisena, and Raphael Troncy. Apples to apples: A systematic evaluation of topic models. InRANLP, 2021. 5

  34. [34]

    Teaching machines to read and comprehend

    Karl Moritz Hermann, Tomas Kocisky, Edward Grefen- stette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. InNeurIPS, 2015. 5

  35. [35]

    Projecting assumptions: The dual- ity between sparse autoencoders and concept geometry

    Sai Sumedh R Hindupur, Ekdeep Singh Lubana, Thomas Fel, and Demba Ba. Projecting assumptions: The dual- ity between sparse autoencoders and concept geometry. In arXiv, 2025. 2

  36. [36]

    Online learning for latent dirichlet allocation

    Matthew Hoffman, Francis Bach, and David Blei. Online learning for latent dirichlet allocation. InNeurIPS, 2010. 2

  37. [37]

    Probabilistic latent semantic indexing

    Thomas Hofmann. Probabilistic latent semantic indexing. InSIGIR, 1999. 2

  38. [38]

    The curious case of neural text degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. In ICLR, 2020. 5

  39. [39]

    Is au- tomated topic model evaluation broken? the incoherence of coherence

    Alexander Hoyle, Pranav Goel, Andrew Hian-Cheong, De- nis Peskov, Jordan Boyd-Graber, and Philip Resnik. Is au- tomated topic model evaluation broken? the incoherence of coherence. InNeurIPS, 2021. 2, 5

  40. [40]

    Open-set image tagging with multi-grained text su- pervision

    Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, and Lei Zhang. Open-set image tagging with multi-grained text su- pervision. InarXiv, 2023. 7

  41. [41]

    Sparse autoencoders find highly interpretable features in language models

    Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InICLR,

  42. [42]

    Openclip

    Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Han- naneh Hajishirzi, and Ludwig Farhadi, Ali an Schmidt. Openclip. InGitHub, 2021. 6

  43. [43]

    Brave: Broadening the visual encoding of vision-language models

    O ˘guzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, and Federico Tombari. Brave: Broadening the visual encoding of vision-language models. InECCV, 2024. 6

  44. [44]

    SAEBench: A comprehensive benchmark for sparse autoencoders in language model in- terpretability

    Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, and Neel Nanda. SAEBench: A comprehensive benchmark for sparse autoencoders in language model in- terpretability. InICML, 2025. 2

  45. [45]

    Stylistic multi-task analysis of ukiyo-e woodblock prints

    Selina Khan and Nanne van Noord. Stylistic multi-task analysis of ukiyo-e woodblock prints. InBMVC, 2021. 7

  46. [46]

    Interpret- ing vision transformers via residual replacement model

    Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, and Seong Jae Hwang. Interpret- ing vision transformers via residual replacement model. In arXiv, 2025. 1

  47. [47]

    Auto-encoding vari- ational bayes

    Diederik P Kingma and Max Welling. Auto-encoding vari- ational bayes. InICLR, 2014. 2

  48. [48]

    From superposition to sparse codes: interpretable representations in neural networks

    David Klindt, Charles O’Neill, Patrik Reizinger, Harald Maurer, and Nina Miolane. From superposition to sparse codes: interpretable representations in neural networks. In arXiv, 2025. 2

  49. [49]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. InTech Report, 2009. 6

  50. [50]

    From word embeddings to document distances

    Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Wein- berger. From word embeddings to document distances. In ICML, 2015. 5

  51. [51]

    Ma- chine reading tea leaves: Automatically evaluating topic co- herence and topic model quality

    Jey Han Lau, David Newman, and Timothy Baldwin. Ma- chine reading tea leaves: Automatically evaluating topic co- herence and topic model quality. InEACL, 2014. 5

  52. [52]

    Unbiased region- language alignment for open-vocabulary dense prediction

    Yunheng Li, Yuxuan Li, Quan-Sheng Zeng, Wenhai Wang, Qibin Hou, and Ming-Ming Cheng. Unbiased region- language alignment for open-vocabulary dense prediction. InCVPR, 2025. 6

  53. [53]

    Sparsemax and re- laxed wasserstein for topic sparsity

    Tianyi Lin, Zhiyue Hu, and Xin Guo. Sparsemax and re- laxed wasserstein for topic sparsity. InWDSM, 2019. 2

  54. [54]

    Sparse autoencoders, again? InICML, 2025

    Yin Lu, Xuening Zhu, Tong He, and David Wipf. Sparse autoencoders, again? InICML, 2025. 2

  55. [55]

    Learning word vectors for sentiment analysis

    Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. InACL-HLT, 2011. 5

  56. [56]

    K-sparse autoen- coders

    Alireza Makhzani and Brendan Frey. K-sparse autoen- coders. InICLR, 2014. 2

  57. [57]

    From softmax to sparsemax: A sparse model of attention and multi-label classification

    Andre Martins and Ramon Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. InICML, 2016. 2

  58. [58]

    Neural variational inference for text processing

    Yishu Miao, Lei Yu, and Phil Blunsom. Neural variational inference for text processing. InICML, 2016. 2

  59. [59]

    Dis- covering discrete latent topics with neural variational infer- ence

    Yishu Miao, Edward Grefenstette, and Phil Blunsom. Dis- covering discrete latent topics with neural variational infer- ence. InICML, 2017. 2

  60. [60]

    Efficient estimation of word representations in vector space

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. InarXiv, 2013. 5

  61. [61]

    Mitchell.Machine Learning

    Tom M. Mitchell.Machine Learning. McGraw-Hill, 1997. 5

  62. [62]

    Incorporating hierarchical semantics in sparse au- toencoder architectures

    Mark Muchane, Sean Richardson, Kiho Park, and Victor Veitch. Incorporating hierarchical semantics in sparse au- toencoder architectures. InarXiv, 2025. 2

  63. [63]

    Matryoshka sparse autoencoders

    Noa Nabeshima. Matryoshka sparse autoencoders. InLess- Wrong AI Alignment Forum, 2024. 2

  64. [64]

    Topic modeling with wasserstein autoencoders

    Feng Nan, Ran Ding, Ramesh Nallapati, and Bing Xiang. Topic modeling with wasserstein autoencoders. InACL,

  65. [65]

    Automatic evaluation of topic coherence

    David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. Automatic evaluation of topic coherence. In NAACL-HLT, 2010. 5

  66. [66]

    Contrastive learning for neural topic model

    Thong Nguyen and Anh Tuan Luu. Contrastive learning for neural topic model. InNeurIPS, 2021. 2

  67. [67]

    Emergence of simple-cell receptive field properties by learning a sparse code for natural images

    Bruno A Olshausen and David J Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. InNature, 1996. 2

  68. [68]

    The lin- ear representation hypothesis and the geometry of large lan- guage models

    Kiho Park, Yo Joong Choe, and Victor Veitch. The lin- ear representation hypothesis and the geometry of large lan- guage models. InICML, 2024. 3 10

  69. [69]

    Use sparse autoencoders to discover un- known concepts, not to act on known concepts

    Kenny Peng, Rajiv Movva, Jon Kleinberg, Emma Pierson, and Nikhil Garg. Use sparse autoencoders to discover un- known concepts, not to act on known concepts. InarXiv,

  70. [70]

    Glove: Global vectors for word representation

    Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. InEMNLP, 2014. 5

  71. [71]

    Topicgpt: A prompt-based topic modeling framework

    Chau Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, and Mohit Iyyer. Topicgpt: A prompt-based topic modeling framework. InNAACL, 2024. 2

  72. [72]

    Exploring the limits of transfer learning with a unified text-to-text transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. InJMLR, 2020. 5

  73. [73]

    Contex- tualized topic coherence metrics

    Hamed Rahimi, David Mimno, Jacob Hoover Vigly, Hubert Naacke, Camelia Constantin, and Bernd Amann. Contex- tualized topic coherence metrics. InEACL Findings, 2024. 5

  74. [74]

    Jumping ahead: Improving reconstruction fi- delity with jumprelu sparse autoencoders

    Senthooran Rajamanoharan, Tom Lieberum, Nicolas Son- nerat, Arthur Conmy, Vikrant Varma, J ´anos Kram ´ar, and Neel Nanda. Jumping ahead: Improving reconstruction fi- delity with jumprelu sparse autoencoders. InarXiv, 2024. 2

  75. [75]

    Efficient learning of sparse repre- sentations with an energy-based model

    Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann Cun. Efficient learning of sparse repre- sentations with an energy-based model. InNeurIPS, 2006. 2

  76. [76]

    Sparse feature learning for deep belief networks

    Marc’Aurelio Ranzato, Y-Lan Boureau, Yann Cun, et al. Sparse feature learning for deep belief networks. In NeurIPS, 2007. 2

  77. [77]

    Discover-then-name: Task-agnostic concept bot- tlenecks via automated concept discovery

    Sukrut Rao, Sweta Mahajan, Moritz B ¨ohle, and Bernt Schiele. Discover-then-name: Task-agnostic concept bot- tlenecks via automated concept discovery. InECCV, 2024. 5

  78. [78]

    Laion- 400m: Open dataset of clip-filtered 400 million image-text pairs

    Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. Laion- 400m: Open dataset of clip-filtered 400 million image-text pairs. InarXiv, 2021. 6

  79. [79]

    Large scale vari- ational inference and experimental design for sparse gener- alized linear models

    Matthias W Seeger and Hannes Nickisch. Large scale vari- ational inference and experimental design for sparse gener- alized linear models. InarXiv, 2008. 2

  80. [80]

    Conceptual captions: A cleaned, hypernymed, im- age alt-text dataset for automatic image captioning

    Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual captions: A cleaned, hypernymed, im- age alt-text dataset for automatic image captioning. InACL,

Showing first 80 references.