Perplexity is computed as \exp\left(-\frac{\sum_{w}{\log p(w)}}{N}\right), where N is the number of tokens in the corpus and p(w) sums, over all topics, the product of the word distribution given the topic and the corresponding document distribution over topics:

p(w)=\sum_{k}{p(k|d)\,p(w|k)}=\sum_{k}{\frac{{n}_{kw}+{\beta}_{w}}{{n}_{k}+\bar{\beta}}\cdot\frac{{n}_{kd}+{\alpha}_{k}}{\sum_{k}{{n}_{kd}}+\bar{\alpha}}}=\left(\sum_{k}{\frac{{\alpha}_{k}{\beta}_{w}+{n}_{kw}{\alpha}_{k}+{n}_{kd}{\beta}_{w}+{n}_{kw}{n}_{kd}}{{n}_{k}+\bar{\beta}}}\right)\frac{1}{\sum_{k}{{n}_{kd}}+\bar{\alpha}}
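A minimal sketch of this computation with dense NumPy count arrays; the argument names (`docs`, `n_kw`, `n_kd`, `n_k`, `alpha`, `beta`) are this sketch's assumptions, not identifiers from the code base:

```python
import numpy as np

def perplexity(docs, n_kw, n_kd, n_k, alpha, beta):
    """Corpus perplexity exp(-(sum log p(w)) / N) from trained counts.

    Assumed shapes: docs is a list of token-id lists, one per document;
    n_kw is (K, V) word-topic counts, n_kd is (D, K) doc-topic counts,
    n_k is (K,) topic totals, alpha is (K,), beta is (V,).
    """
    alpha_bar, beta_bar = alpha.sum(), beta.sum()
    log_p, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        # p(k|d) is shared by every token of document d
        p_k_d = (n_kd[d] + alpha) / (n_kd[d].sum() + alpha_bar)
        for w in doc:
            # p(w) = sum_k p(k|d) * p(w|k)
            p_w_k = (n_kw[:, w] + beta[w]) / (n_k + beta_bar)
            log_p += np.log(p_k_d @ p_w_k)
            n_tokens += 1
    return float(np.exp(-log_p / n_tokens))
```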
Save the term-topic model.
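A hedged sketch of what saving could look like, assuming the model state is held in NumPy arrays; `save_term_topic_model`, `load_phi`, and the `.npz` format are placeholders invented for this example:

```python
import numpy as np

def save_term_topic_model(path, n_kw, n_k, beta):
    """Persist the word-topic counts and topic totals; nothing else is
    needed to rebuild p(w|k) at load time."""
    np.savez_compressed(path, n_kw=n_kw, n_k=n_k, beta=beta)

def load_phi(path):
    """Rebuild phi[k, w] = (n_kw + beta_w) / (n_k + beta_bar)."""
    m = np.load(path)
    return (m["n_kw"] + m["beta"]) / (m["n_k"] + m["beta"].sum())[:, None]
```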
The sampler composes a Gibbs sampler with a Metropolis-Hastings sampler; the time complexity of each sampling step is O(1).

1. Sample the word-related part of the standard LDA formula via the Gibbs sampler, formula (6) in the paper "LightLDA: Big Topic Models on Modest Computer Clusters":

\frac{{n}_{kw}^{-di}+{\beta}_{w}}{{n}_{k}^{-di}+\bar{\beta}}

2. Use the probability computed in step 1 as the proposal distribution q in Metropolis-Hastings sampling, with an asymmetric Dirichlet prior as presented in formula (3) of the paper "Rethinking LDA: Why Priors Matter"; the true conditional to accept against is (see the sketch after the definitions below):

\frac{{n}_{kw}^{-di}+{\beta}_{w}}{{n}_{k}^{-di}+\bar{\beta}}\cdot\frac{{n}_{kd}^{-di}+\bar{\alpha}\,\frac{{n}_{k}^{-di}+\acute{\alpha}}{\sum_{k}{{n}_{k}}+\bar{\acute{\alpha}}}}{\sum_{k}{{n}_{kd}^{-di}}+\bar{\alpha}}
where \bar{\beta}=\sum_{w}{{\beta}_{w}}, \bar{\alpha}=\sum_{k}{{\alpha}_{k}}, and \bar{\acute{\alpha}}=\sum_{k}{\acute{\alpha}}; {n}_{kd} is the number of tokens in doc d that belong to topic k, {n}_{kw} is the number of occurrences of word w that belong to topic k, and {n}_{k} is the number of tokens in the corpus that belong to topic k. The superscript -di denotes a count with the current token (token i of doc d) excluded.
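A minimal sketch of one such step, assuming dense NumPy count arrays; all argument names are this sketch's assumptions, `alpha_p` stands for the scalar \acute{\alpha}, and the proposal is drawn by direct normalization rather than the alias tables LightLDA uses to make the draw O(1):

```python
import numpy as np

def mh_step(d, w, s, n_kw, n_kd, n_k, beta, alpha_bar, alpha_p, alpha_p_bar, rng):
    """One word-proposal + Metropolis-Hastings step for a single token.

    Token i of document d has word id w and current topic s. Assumed
    shapes: n_kw (K, V), n_kd (D, K), n_k (K,), beta (V,).
    """
    K = n_k.shape[0]
    beta_bar = beta.sum()

    # Step 1: draw a candidate topic t from the word proposal
    # q(k) ~ (n_kw + beta_w) / (n_k + beta_bar), using full counts.
    q = (n_kw[:, w] + beta[w]) / (n_k + beta_bar)
    t = rng.choice(K, p=q / q.sum())
    if t == s:
        return s

    # "-di" counts: the current token contributes 1 to topic s only.
    def excl(x, k):
        return x[k] - (k == s)

    # Asymmetric prior term:
    # alpha_k = alpha_bar * (n_k^{-di} + alpha') / (sum_k n_k + alpha'_bar)
    def alpha_k(k):
        return alpha_bar * (excl(n_k, k) + alpha_p) / (n_k.sum() + alpha_p_bar)

    # True conditional p(k) up to a constant; the document-length
    # denominator is identical for s and t, so it cancels in the ratio.
    def p(k):
        return ((excl(n_kw[:, w], k) + beta[w]) / (excl(n_k, k) + beta_bar)
                * (excl(n_kd[d], k) + alpha_k(k)))

    # Step 2: accept t with probability min(1, p(t) q(s) / (p(s) q(t))).
    accept = min(1.0, p(t) * q[s] / (p(s) * q[t]))
    return t if rng.random() < accept else s
```

A caller would create `rng = np.random.default_rng()`, invoke `mh_step` for each token, and then decrement the counts for the old topic and increment them for the returned one.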