The Author-Topic Model for Authors and Documents

Mark Steyvers; Michal Rosen-Zvi; Padhraic Smyth; Thomas Griffiths

arxiv: 1207.4169 · v1 · pith:RPCOPRLTnew · submitted 2012-07-11 · cs.IR · cs.LG · stat.ML

Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

classification cs.IR · cs.LG · stat.ML

keywords model · author · distribution · associated · author-topic · authors · topics · documents

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
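The generative process described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes details the abstract does not state: symmetric Dirichlet priors on both sets of distributions, and a uniform choice among the document's authors for each word. The names `sample_dirichlet`, `generate_document`, `theta`, `phi`, `ALPHA`, and `BETA`, and all sizes, are hypothetical.

```python
import random


def sample_dirichlet(rng, alpha, dim):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension `dim`."""
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, doc_authors, theta, phi, n_words):
    """Generate one document as a list of (author, topic, word) triples.

    theta[a] is author a's distribution over topics.
    phi[t] is topic t's distribution over vocabulary word ids.
    """
    n_topics = len(phi)
    vocab_size = len(phi[0])
    tokens = []
    for _ in range(n_words):
        # Assumption: each word picks one of the document's authors uniformly.
        author = rng.choice(doc_authors)
        # The chosen author's topic distribution picks the topic.
        topic = rng.choices(range(n_topics), weights=theta[author])[0]
        # The chosen topic's word distribution picks the word.
        word = rng.choices(range(vocab_size), weights=phi[topic])[0]
        tokens.append((author, topic, word))
    return tokens


# Toy sizes and hyperparameters, chosen only for illustration.
N_AUTHORS, N_TOPICS, VOCAB_SIZE = 3, 4, 10
ALPHA, BETA = 0.5, 0.5

rng = random.Random(0)
theta = [sample_dirichlet(rng, ALPHA, N_TOPICS) for _ in range(N_AUTHORS)]
phi = [sample_dirichlet(rng, BETA, VOCAB_SIZE) for _ in range(N_TOPICS)]

# A 20-word document written jointly by authors 0 and 2.
doc = generate_document(rng, doc_authors=[0, 2], theta=theta, phi=phi, n_words=20)
```

Because every word draws its topic from one of the listed authors, the document's overall topic distribution is a mixture of those authors' distributions. Inference runs in the opposite direction: given only the words and the author lists, the paper uses Gibbs sampling to estimate `theta` and `phi`.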

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Traditional statistical representations outperform generative AI in identifying expert peer reviewers
cs.IR · 2026-05 · unverdicted · novelty 5.0

TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.