LDA training
RDD of documents, which are term (word) count vectors paired with IDs. The term count vectors are "bags of words" with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0.
the number of iterations
the number of topics (5000 or more for large datasets)
recommended value: 5.0 / numTopics
recommended range: 0.001 to 0.1
recommended range: 0.01 to 1.0
which LDA sampling algorithm to use; LightLDA is not recommended for short texts
which partition strategy to use when repartitioning the graph
the StorageLevel used by the LDA model RDD
the trained DistributedLDAModel
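The input contract for the documents can be illustrated with a small local sketch. It is not the Spark API: a plain Python list of (ID, counts) pairs stands in for the RDD, and every helper name (build_vocabulary, to_count_vector, make_corpus, validate_corpus, suggested_prior) is hypothetical. It shows how token lists could become fixed-length term count vectors, checks the stated constraints (document IDs unique and >= 0, every vector as long as the vocabulary), and evaluates the 5.0 / numTopics recommendation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical document shape for this sketch: (document ID, dense term count vector).
# A local list of these stands in for the RDD of documents.
Document = Tuple[int, List[int]]


def build_vocabulary(texts: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct token a fixed column index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for tokens in texts:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_count_vector(tokens: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Bag of words: one count per vocabulary entry, so the length equals the vocabulary size."""
    counts = [0] * len(vocab)
    for token in tokens:
        index = vocab.get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            counts[index] += 1
    return counts


def make_corpus(texts: Sequence[Sequence[str]]) -> List[Document]:
    """Pair each count vector with a unique, non-negative ID (its position in the input)."""
    vocab = build_vocabulary(texts)
    return [(doc_id, to_count_vector(tokens, vocab)) for doc_id, tokens in enumerate(texts)]


def validate_corpus(docs: Sequence[Document]) -> int:
    """Check the documented input contract and return the vocabulary size."""
    if not docs:
        raise ValueError("corpus is empty")
    vocab_size = len(docs[0][1])
    seen_ids = set()
    for doc_id, counts in docs:
        if doc_id < 0:
            raise ValueError(f"document ID {doc_id} is negative")
        if doc_id in seen_ids:
            raise ValueError(f"document ID {doc_id} is not unique")
        seen_ids.add(doc_id)
        if len(counts) != vocab_size:
            raise ValueError(
                f"document {doc_id} has length {len(counts)}, expected {vocab_size}"
            )
    return vocab_size


def suggested_prior(num_topics: int) -> float:
    """Evaluate the '5.0 / numTopics' recommendation from the parameter list."""
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    return 5.0 / num_topics


if __name__ == "__main__":
    corpus = make_corpus([["spark", "lda", "spark"], ["topic", "model"]])
    print(corpus)                   # [(0, [2, 1, 0, 0]), (1, [0, 0, 1, 1])]
    print(validate_corpus(corpus))  # 4
    print(suggested_prior(5000))    # 0.001
```

Building the vocabulary over the whole corpus before vectorizing is what keeps every vector the same length; in a distributed setting the vocabulary would typically be built once and shared, but that step is outside what the parameter list describes.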