Understanding Latent Dirichlet Allocation (4) Gibbs Sampling

bayesian machine learning natural language processing

In the last article, I explained LDA parameter inference using the variational EM algorithm and implemented it from scratch. In this post, let’s take a look at another algorithm for deriving the approximate posterior distribution, proposed in the original paper that introduced LDA: Gibbs sampling. In addition, I would like to introduce, and implement from scratch, a collapsed Gibbs sampling method that can efficiently fit a topic model to the data (a minimal sketch follows this entry).
» continue reading
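The full derivation lives in the linked post; as a taste of what a collapsed Gibbs sampler for LDA looks like, here is a minimal from-scratch sketch in Python. The function name `collapsed_gibbs_lda`, the hyperparameter defaults, and the count-table names are my own illustration under standard assumptions (symmetric Dirichlet priors, documents as lists of word ids), not code taken from the series:

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, n_vocab, alpha=0.1, beta=0.01,
                        n_iters=100, seed=0):
    """Sketch of collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in [0, n_vocab).
    Returns per-document topic counts and per-topic word counts.
    """
    rng = np.random.default_rng(seed)
    # Count tables: theta and phi are integrated out ("collapsed"),
    # so these counts are all the sampler needs to track.
    n_dk = np.zeros((len(docs), n_topics))  # tokens in doc d with topic k
    n_kw = np.zeros((n_topics, n_vocab))    # tokens of word w with topic k
    n_k = np.zeros(n_topics)                # total tokens per topic
    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's assignment from the counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w); the document-length
                # denominator is constant in k and therefore dropped.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) \
                    / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

Each sweep removes one token's assignment, resamples its topic from the collapsed full conditional, and restores the counts; sampling from the counts alone is what makes this variant cheaper per iteration than the uncollapsed sampler.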


Understanding Latent Dirichlet Allocation (1) Backgrounds

natural language processing

Latent Dirichlet allocation (LDA) is a three-level Bayesian hierarchical model that is frequently used for topic modelling and document classification. First proposed to infer population structure from genotype data, LDA not only allows us to represent topics as mixtures of words, but also to represent documents as mixtures of topics, which makes it a powerful generative probabilistic model (a sketch of this generative process follows this entry).
» continue reading
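To make the "documents as mixtures of topics, topics as mixtures of words" structure concrete, here is a short sketch of LDA's generative process. Everything in it (the function name `generate_corpus`, the toy corpus sizes, the symmetric Dirichlet priors) is an illustrative assumption, not code from the post:

```python
import numpy as np

def generate_corpus(n_docs=5, doc_len=20, n_topics=3, n_vocab=10,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process."""
    rng = np.random.default_rng(seed)
    # Each topic is a distribution over the vocabulary (a mixture of words).
    phi = rng.dirichlet(np.full(n_vocab, beta), size=n_topics)
    docs = []
    for _ in range(n_docs):
        # Each document draws its own distribution over topics
        # (a mixture of topics).
        theta = rng.dirichlet(np.full(n_topics, alpha))
        z = rng.choice(n_topics, size=doc_len, p=theta)          # topic per token
        docs.append([rng.choice(n_vocab, p=phi[k]) for k in z])  # word per token
    return docs, phi
```

The three levels of the hierarchy are visible in the code: corpus-level topic distributions `phi`, document-level topic proportions `theta`, and token-level assignments `z`.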