Sparse Autoencoders are Topic Models
Pith reviewed 2026-05-21 19:39 UTC · model grok-4.3
The pith
Sparse autoencoders function as topic models by deriving their objective from a continuous topic model on embeddings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sparse autoencoders are topic models because their training objective is the maximum a posteriori estimator for a continuous topic model on embedding spaces, in which each embedding arises as a sparse mixture of thematic basis vectors under a suitable prior and likelihood, so that the SAE decoder directions recover the topic distributions of the model.
What carries the argument
Continuous topic model (CTM) for embedding spaces, under which the SAE reconstruction-plus-sparsity objective is derived as maximum a posteriori estimation.
If this is right
- SAE features act as reusable thematic components that admit direct interpretation as word or patch distributions.
- The SAE-TM procedure learns topic atoms in one training run and merges them into any desired number of topics for new data without retraining.
- Topics extracted this way show higher coherence scores and comparable diversity to strong baselines on both text and image collections.
- Thematic structure and its changes over time become measurable in image datasets such as historical print series.
Where Pith is reading between the lines
- Topic-model evaluation metrics such as coherence could be applied directly to SAE features to quantify their interpretability.
- The derivation suggests hybrid training schemes that add explicit topic-model regularizers to SAE objectives for domain-specific data.
- The same framing could be tested on embeddings from audio or video models to extract thematic patterns in those modalities.
Load-bearing premise
Embedding spaces are generated according to the continuous topic model with its chosen prior on mixtures and likelihood on observed vectors.
What would settle it
If SAE features recovered from real embeddings cannot be interpreted as coherent distributions over words or visual elements on held-out data while dedicated topic models can, the claimed equivalence does not hold in practice.
Figures
read the original abstract
Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. To confirm our theoretical findings, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints. Our work positions SAEs as effective tools for large-scale thematic analysis across modalities. Code is available at https://github.com/ExplainableML/SAE-TM .
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes viewing sparse autoencoders (SAEs) as topic models through a continuous topic model (CTM) inspired by LDA for embedding spaces. It derives the SAE objective (reconstruction loss plus L1 sparsity) as the maximum a posteriori estimator under this CTM. This leads to the SAE-TM framework that trains SAEs for reusable topic atoms, interprets them as word distributions, and merges them into topics. SAE-TM is shown to yield more coherent topics than baselines on text and image datasets, with applications to thematic analysis in images and temporal topic tracing in Japanese woodblock prints.
Significance. If the proposed CTM provides a good description of real embedding spaces, the work establishes a theoretical link between SAEs and topic models, suggesting SAE features capture thematic components. The SAE-TM approach offers a flexible, reusable way to perform topic modeling without retraining for different topic counts. The availability of code at the provided GitHub repository enhances reproducibility. This perspective could be significant for interpretability research in computer vision and multimodal learning.
major comments (2)
- [Derivation of SAE objective as MAP estimator under CTM] The manuscript constructs the CTM with a specific prior and likelihood chosen to make the SAE loss the MAP objective. While the algebraic equivalence holds within the model, the claim that this implies SAEs are topic models for real embeddings requires evidence that the CTM's induced distribution matches real data statistics. A concrete test, such as matching the distribution of pairwise cosine similarities or the sparsity patterns in activations, should be included to support the transfer of the interpretation.
- [SAE-TM empirical evaluation] The coherence metrics used to compare SAE-TM to baselines have known sensitivity to post-processing choices such as the merging procedure or threshold for interpreting atoms as word distributions. The manuscript should quantify this sensitivity, perhaps via ablation on the merging step or alternative coherence measures, to strengthen the claim of superior performance.
minor comments (2)
- [Notation and model definition] The definition of the continuous topic model could benefit from an explicit equation contrasting it with standard LDA to highlight the adaptations for continuous embeddings.
- [Related work] Additional citations to prior work on using autoencoders or sparse representations for topic modeling would help contextualize the contribution.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [Derivation of SAE objective as MAP estimator under CTM] The manuscript constructs the CTM with a specific prior and likelihood chosen to make the SAE loss the MAP objective. While the algebraic equivalence holds within the model, the claim that this implies SAEs are topic models for real embeddings requires evidence that the CTM's induced distribution matches real data statistics. A concrete test, such as matching the distribution of pairwise cosine similarities or the sparsity patterns in activations, should be included to support the transfer of the interpretation.
Authors: We agree that the algebraic derivation alone does not automatically establish that the CTM describes real embedding spaces. In the revised manuscript we will add experiments that directly compare the distribution of pairwise cosine similarities and the sparsity patterns of activations between samples drawn from the fitted CTM and the actual embeddings used in our text and image experiments. These tests will provide the requested empirical support for transferring the topic-model interpretation to real data. revision: yes
-
Referee: [SAE-TM empirical evaluation] The coherence metrics used to compare SAE-TM to baselines have known sensitivity to post-processing choices such as the merging procedure or threshold for interpreting atoms as word distributions. The manuscript should quantify this sensitivity, perhaps via ablation on the merging step or alternative coherence measures, to strengthen the claim of superior performance.
Authors: We acknowledge that coherence scores can be sensitive to post-processing decisions. In the revision we will include an ablation study that varies the merging threshold and procedure, and we will additionally report results under alternative coherence measures (e.g., NPMI with different window sizes). These additions will quantify the sensitivity and demonstrate that the performance advantage of SAE-TM remains consistent across reasonable post-processing choices. revision: yes
Circularity Check
CTM prior/likelihood chosen so MAP recovers SAE loss exactly, making equivalence hold by model construction
specific steps
-
self definitional
[Abstract; derivation of SAE as MAP under CTM]
"We propose a continuous topic model (CTM) inspired by Latent Dirichlet Allocation (LDA) for embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model."
The CTM prior and likelihood are defined such that the MAP estimator under the model is exactly the SAE training objective (reconstruction error plus L1 sparsity). The claimed interpretation that SAE features are thematic components therefore follows tautologically from the choice of generative model rather than from any external property of embedding spaces.
full rationale
The paper's central derivation proposes a continuous topic model whose generative assumptions (prior and likelihood) are selected to make the standard SAE reconstruction-plus-L1 objective its MAP estimator. This algebraic equivalence is true inside the assumed model but transfers to real embeddings only if those embeddings are generated by the CTM; no independent validation (posterior-predictive checks, moment matching, or marginal likelihood comparison) is reported. The step therefore reduces to a self-definitional construction rather than an independent justification that SAEs are topic models on observed data.
Axiom & Free-Parameter Ledger
free parameters (1)
- sparsity penalty coefficient
axioms (1)
- domain assumption Embeddings are generated as mixtures of latent topic distributions with a specific prior that induces sparsity.
Reference graph
Works this paper leans on
-
[1]
Marah Abdin, Jyoti Aneja, Harkirat Behl, S ´ebastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harri- son, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report. InarXiv, 2024. 5, 7
work page 2024
-
[2]
Unsupervised domain clusters in pretrained language models
Roee Aharoni and Yoav Goldberg. Unsupervised domain clusters in pretrained language models. InACL, 2020. 2
work page 2020
-
[3]
Top2vec: Distributed representations of topics
Dimo Angelov. Top2vec: Distributed representations of topics. InarXiv, 2020. 2
work page 2020
-
[4]
Parul Awasthy, Aashka Trivedi, Yulong Li, Meet Doshi, Riyaz Bhat, Vishwajeet Kumar, Yushu Yang, Bhavani Iyer, Abraham Daniels, Rudra Murthy, et al. Granite embedding r2 models. InarXiv, 2025. 5
work page 2025
-
[5]
Cross-lingual contextual- ized topic models with zero-shot learning
Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini, et al. Cross-lingual contextual- ized topic models with zero-shot learning. InEACL, 2021. 5, 6
work page 2021
-
[6]
Pre- training is a hot topic: contextualized document embed- dings improve topic coherence
Federico Bianchi, Silvia Terragni, Dirk Hovy, et al. Pre- training is a hot topic: contextualized document embed- dings improve topic coherence. InACL, 2021. 5
work page 2021
-
[7]
Nltk: the natural language toolkit
Steven Bird. Nltk: the natural language toolkit. InCOL- ING/ACL, 2006. 5
work page 2006
-
[8]
David Blei and John Lafferty. Correlated topic models. In NeurIPS, 2006. 2
work page 2006
-
[9]
David M Blei and John D Lafferty. Dynamic topic models. InICML, 2006. 2
work page 2006
-
[10]
David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. InJMLR, 2003. 1, 2, 4
work page 2003
-
[11]
Food-101 – mining discriminative components with ran- dom forests
Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with ran- dom forests. InECCV, 2014. 6
work page 2014
-
[12]
Generating sentences from a continuous space
Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. InSIGNLL, 2016. 2
work page 2016
-
[13]
Towards monose- manticity: Decomposing language models with dictionary learning
Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, et al. Towards monose- manticity: Decomposing language models with dictionary learning. InTransformer Circuits Thread, 2023. 1, 2, 3, 4
work page 2023
-
[14]
Decoupling spar- sity and smoothness in the dirichlet variational autoencoder topic model
Sophie Burkhardt and Stefan Kramer. Decoupling spar- sity and smoothness in the dirichlet variational autoencoder topic model. InJMLR, 2019. 2, 5, 6
work page 2019
-
[15]
Bart Bussmann, Patrick Leask, and Neel Nanda. Batchtopk sparse autoencoders. InNeurIPS Workshop on Scientific Methods for Understanding Deep Learning, 2024. 2, 3, 5, 6, 13
work page 2024
-
[16]
Neural mod- els for documents with metadata
Dallas Card, Chenhao Tan, and Noah A Smith. Neural mod- els for documents with metadata. InACL, 2018. 2
work page 2018
-
[17]
Reading tea leaves: How humans interpret topic models
Jonathan Chang, Sean Gerrish, Chong Wang, Jordan Boyd- Graber, and David Blei. Reading tea leaves: How humans interpret topic models. InNeurIPS, 2009. 5
work page 2009
-
[18]
Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
Soravit Changpinyo, Piyush Sharma, Nan Ding, and Radu Soricut. Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts. In CVPR, 2021. 6
work page 2021
-
[19]
You are where you tweet: a content-based approach to geo-locating twitter users
Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You are where you tweet: a content-based approach to geo-locating twitter users. InCIKM, 2010. 5
work page 2010
-
[20]
From flat to hierarchical: Ex- tracting sparse representations with matching pursuit
Val ´erie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, and Demba Ba. From flat to hierarchical: Ex- tracting sparse representations with matching pursuit. In arXiv, 2025. 2
work page 2025
-
[21]
Sparse autoencoders find highly interpretable features in language models
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InarXiv, 2023. 2
work page 2023
-
[22]
Imagenet: A large-scale hierarchical im- age database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical im- age database. InCVPR, 2009. 6
work page 2009
-
[23]
Topic modeling in embedding spaces
Adji B Dieng, Francisco JR Ruiz, and David M Blei. Topic modeling in embedding spaces. InTACL, 2020. 2, 5, 6
work page 2020
-
[24]
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield- Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. InarXiv, 2022. 2
work page 2022
-
[25]
Not all language model features are one-dimensionally linear
Joshua Engels, Eric J Michaud, Isaac Liao, Wes Gurnee, and Max Tegmark. Not all language model features are one-dimensionally linear. InICLR, 2025. 2
work page 2025
-
[26]
Decomposing the dark matter of sparse autoencoders
Joshua Engels, Logan Riggs Smith, and Max Tegmark. Decomposing the dark matter of sparse autoencoders. In TMLR, 2025. 2
work page 2025
-
[27]
Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba E
Thomas Fel, Ekdeep Singh Lubana, Jacob S. Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba E. Ba, and Talia Konkle. Archetypal SAE: Adaptive and stable dictionary learning for concept extraction in large vision models. In FICML, 2025. 2
work page 2025
-
[28]
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. Scaling and evaluating sparse autoencoders. In ICLR, 2025. 2, 3, 13
work page 2025
-
[29]
Uncurated image-text datasets: Shedding light on demographic bias
Noa Garcia, Yusuke Hirota, Yankun Wu, and Yuta Nakashima. Uncurated image-text datasets: Shedding light on demographic bias. InCVPR, 2023. 6
work page 2023
-
[30]
Sparse-coding variational autoencoders
Victor Geadah, Gabriel Barello, Daniel Greenidge, Adam S Charles, and Jonathan W Pillow. Sparse-coding variational autoencoders. InNeural computation, 2024. 2
work page 2024
-
[31]
Bertopic: Neural topic modeling with a class-based tf-idf procedure
Maarten Grootendorst. Bertopic: Neural topic modeling with a class-based tf-idf procedure. InarXiv, 2022. 2 9
work page 2022
-
[32]
Representing mixtures of word embeddings with mixtures of topic embeddings
Dan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou, et al. Representing mixtures of word embeddings with mixtures of topic embeddings. In ICLR, 2022. 2
work page 2022
-
[33]
Apples to apples: A systematic evaluation of topic models
Ismail Harrando, Pasquale Lisena, and Raphael Troncy. Apples to apples: A systematic evaluation of topic models. InRANLP, 2021. 5
work page 2021
-
[34]
Teaching machines to read and comprehend
Karl Moritz Hermann, Tomas Kocisky, Edward Grefen- stette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. InNeurIPS, 2015. 5
work page 2015
-
[35]
Projecting assumptions: The dual- ity between sparse autoencoders and concept geometry
Sai Sumedh R Hindupur, Ekdeep Singh Lubana, Thomas Fel, and Demba Ba. Projecting assumptions: The dual- ity between sparse autoencoders and concept geometry. In arXiv, 2025. 2
work page 2025
-
[36]
Online learning for latent dirichlet allocation
Matthew Hoffman, Francis Bach, and David Blei. Online learning for latent dirichlet allocation. InNeurIPS, 2010. 2
work page 2010
-
[37]
Probabilistic latent semantic indexing
Thomas Hofmann. Probabilistic latent semantic indexing. InSIGIR, 1999. 2
work page 1999
-
[38]
The curious case of neural text degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. In ICLR, 2020. 5
work page 2020
-
[39]
Is au- tomated topic model evaluation broken? the incoherence of coherence
Alexander Hoyle, Pranav Goel, Andrew Hian-Cheong, De- nis Peskov, Jordan Boyd-Graber, and Philip Resnik. Is au- tomated topic model evaluation broken? the incoherence of coherence. InNeurIPS, 2021. 2, 5
work page 2021
-
[40]
Open-set image tagging with multi-grained text su- pervision
Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, and Lei Zhang. Open-set image tagging with multi-grained text su- pervision. InarXiv, 2023. 7
work page 2023
-
[41]
Sparse autoencoders find highly interpretable features in language models
Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InICLR,
- [42]
-
[43]
Brave: Broadening the visual encoding of vision-language models
O ˘guzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, and Federico Tombari. Brave: Broadening the visual encoding of vision-language models. InECCV, 2024. 6
work page 2024
-
[44]
SAEBench: A comprehensive benchmark for sparse autoencoders in language model in- terpretability
Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, and Neel Nanda. SAEBench: A comprehensive benchmark for sparse autoencoders in language model in- terpretability. InICML, 2025. 2
work page 2025
-
[45]
Stylistic multi-task analysis of ukiyo-e woodblock prints
Selina Khan and Nanne van Noord. Stylistic multi-task analysis of ukiyo-e woodblock prints. InBMVC, 2021. 7
work page 2021
-
[46]
Interpret- ing vision transformers via residual replacement model
Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, and Seong Jae Hwang. Interpret- ing vision transformers via residual replacement model. In arXiv, 2025. 1
work page 2025
-
[47]
Auto-encoding vari- ational bayes
Diederik P Kingma and Max Welling. Auto-encoding vari- ational bayes. InICLR, 2014. 2
work page 2014
-
[48]
From superposition to sparse codes: interpretable representations in neural networks
David Klindt, Charles O’Neill, Patrik Reizinger, Harald Maurer, and Nina Miolane. From superposition to sparse codes: interpretable representations in neural networks. In arXiv, 2025. 2
work page 2025
-
[49]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. InTech Report, 2009. 6
work page 2009
-
[50]
From word embeddings to document distances
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Wein- berger. From word embeddings to document distances. In ICML, 2015. 5
work page 2015
-
[51]
Ma- chine reading tea leaves: Automatically evaluating topic co- herence and topic model quality
Jey Han Lau, David Newman, and Timothy Baldwin. Ma- chine reading tea leaves: Automatically evaluating topic co- herence and topic model quality. InEACL, 2014. 5
work page 2014
-
[52]
Unbiased region- language alignment for open-vocabulary dense prediction
Yunheng Li, Yuxuan Li, Quan-Sheng Zeng, Wenhai Wang, Qibin Hou, and Ming-Ming Cheng. Unbiased region- language alignment for open-vocabulary dense prediction. InCVPR, 2025. 6
work page 2025
-
[53]
Sparsemax and re- laxed wasserstein for topic sparsity
Tianyi Lin, Zhiyue Hu, and Xin Guo. Sparsemax and re- laxed wasserstein for topic sparsity. InWDSM, 2019. 2
work page 2019
-
[54]
Sparse autoencoders, again? InICML, 2025
Yin Lu, Xuening Zhu, Tong He, and David Wipf. Sparse autoencoders, again? InICML, 2025. 2
work page 2025
-
[55]
Learning word vectors for sentiment analysis
Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. InACL-HLT, 2011. 5
work page 2011
-
[56]
Alireza Makhzani and Brendan Frey. K-sparse autoen- coders. InICLR, 2014. 2
work page 2014
-
[57]
From softmax to sparsemax: A sparse model of attention and multi-label classification
Andre Martins and Ramon Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. InICML, 2016. 2
work page 2016
-
[58]
Neural variational inference for text processing
Yishu Miao, Lei Yu, and Phil Blunsom. Neural variational inference for text processing. InICML, 2016. 2
work page 2016
-
[59]
Dis- covering discrete latent topics with neural variational infer- ence
Yishu Miao, Edward Grefenstette, and Phil Blunsom. Dis- covering discrete latent topics with neural variational infer- ence. InICML, 2017. 2
work page 2017
-
[60]
Efficient estimation of word representations in vector space
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. InarXiv, 2013. 5
work page 2013
- [61]
-
[62]
Incorporating hierarchical semantics in sparse au- toencoder architectures
Mark Muchane, Sean Richardson, Kiho Park, and Victor Veitch. Incorporating hierarchical semantics in sparse au- toencoder architectures. InarXiv, 2025. 2
work page 2025
-
[63]
Matryoshka sparse autoencoders
Noa Nabeshima. Matryoshka sparse autoencoders. InLess- Wrong AI Alignment Forum, 2024. 2
work page 2024
-
[64]
Topic modeling with wasserstein autoencoders
Feng Nan, Ran Ding, Ramesh Nallapati, and Bing Xiang. Topic modeling with wasserstein autoencoders. InACL,
-
[65]
Automatic evaluation of topic coherence
David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. Automatic evaluation of topic coherence. In NAACL-HLT, 2010. 5
work page 2010
-
[66]
Contrastive learning for neural topic model
Thong Nguyen and Anh Tuan Luu. Contrastive learning for neural topic model. InNeurIPS, 2021. 2
work page 2021
-
[67]
Emergence of simple-cell receptive field properties by learning a sparse code for natural images
Bruno A Olshausen and David J Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. InNature, 1996. 2
work page 1996
-
[68]
The lin- ear representation hypothesis and the geometry of large lan- guage models
Kiho Park, Yo Joong Choe, and Victor Veitch. The lin- ear representation hypothesis and the geometry of large lan- guage models. InICML, 2024. 3 10
work page 2024
-
[69]
Use sparse autoencoders to discover un- known concepts, not to act on known concepts
Kenny Peng, Rajiv Movva, Jon Kleinberg, Emma Pierson, and Nikhil Garg. Use sparse autoencoders to discover un- known concepts, not to act on known concepts. InarXiv,
-
[70]
Glove: Global vectors for word representation
Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. InEMNLP, 2014. 5
work page 2014
-
[71]
Topicgpt: A prompt-based topic modeling framework
Chau Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, and Mohit Iyyer. Topicgpt: A prompt-based topic modeling framework. InNAACL, 2024. 2
work page 2024
-
[72]
Exploring the limits of transfer learning with a unified text-to-text transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. InJMLR, 2020. 5
work page 2020
-
[73]
Contex- tualized topic coherence metrics
Hamed Rahimi, David Mimno, Jacob Hoover Vigly, Hubert Naacke, Camelia Constantin, and Bernd Amann. Contex- tualized topic coherence metrics. InEACL Findings, 2024. 5
work page 2024
-
[74]
Jumping ahead: Improving reconstruction fi- delity with jumprelu sparse autoencoders
Senthooran Rajamanoharan, Tom Lieberum, Nicolas Son- nerat, Arthur Conmy, Vikrant Varma, J ´anos Kram ´ar, and Neel Nanda. Jumping ahead: Improving reconstruction fi- delity with jumprelu sparse autoencoders. InarXiv, 2024. 2
work page 2024
-
[75]
Efficient learning of sparse repre- sentations with an energy-based model
Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann Cun. Efficient learning of sparse repre- sentations with an energy-based model. InNeurIPS, 2006. 2
work page 2006
-
[76]
Sparse feature learning for deep belief networks
Marc’Aurelio Ranzato, Y-Lan Boureau, Yann Cun, et al. Sparse feature learning for deep belief networks. In NeurIPS, 2007. 2
work page 2007
-
[77]
Discover-then-name: Task-agnostic concept bot- tlenecks via automated concept discovery
Sukrut Rao, Sweta Mahajan, Moritz B ¨ohle, and Bernt Schiele. Discover-then-name: Task-agnostic concept bot- tlenecks via automated concept discovery. InECCV, 2024. 5
work page 2024
-
[78]
Laion- 400m: Open dataset of clip-filtered 400 million image-text pairs
Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. Laion- 400m: Open dataset of clip-filtered 400 million image-text pairs. InarXiv, 2021. 6
work page 2021
-
[79]
Large scale vari- ational inference and experimental design for sparse gener- alized linear models
Matthias W Seeger and Hannes Nickisch. Large scale vari- ational inference and experimental design for sparse gener- alized linear models. InarXiv, 2008. 2
work page 2008
-
[80]
Conceptual captions: A cleaned, hypernymed, im- age alt-text dataset for automatic image captioning
Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual captions: A cleaned, hypernymed, im- age alt-text dataset for automatic image captioning. InACL,
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.