Load the matrix M_hat, or generate it from A on the first iteration.
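As a rough sketch of what generating M_hat from A might involve, the plain-Scala example below assumes the usual PageRank construction: each row of the edge matrix A is L1-normalized, the result is transposed, and every entry is scaled by the damping factor d. That construction and the names MHatSketch and mHat are assumptions made for illustration; the sparse matrix is modelled as an in-memory Map keyed by (row, column) rather than with the Scalding Matrix API.

```scala
object MHatSketch {
  // Sparse matrix: (row, column) -> weight, mirroring the (row, column, value) TSV layout.
  type SparseMatrix = Map[(Int, Int), Double]

  // Assumed construction (not confirmed by the documentation):
  // M_hat = d * transpose(rowL1Normalize(A)).
  def mHat(a: SparseMatrix, d: Double): SparseMatrix = {
    // Sum of absolute weights for each source row.
    val rowSums: Map[Int, Double] =
      a.groupBy { case ((row, _), _) => row }
        .map { case (row, entries) => row -> entries.values.map(w => math.abs(w)).sum }

    a.collect {
      case ((row, col), weight) if rowSums(row) != 0.0 =>
        // Swap the indices to transpose, then normalize and apply the damping factor.
        (col, row) -> d * weight / rowSums(row)
    }
  }
}
```

Because M_hat depends only on A and d, it only needs to be computed once per graph, which fits the constants directory described in the rootDir layout.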
Measure convergence by summing the absolute differences between the previous and next vectors. The result is stored once it has been calculated.
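The convergence measure described here is the L1 distance between two iteration vectors. A minimal in-memory sketch, assuming vectors are held as a Map from row index to value (the names ConvergenceSketch and l1Diff are hypothetical, and the actual job works on Scalding matrices rather than Maps):

```scala
object ConvergenceSketch {
  // Sparse vector: row index -> value, mirroring the (row, value) TSV layout.
  type SparseVector = Map[Int, Double]

  // Sum of absolute element-wise differences; a row missing from one vector counts as 0.
  def l1Diff(previous: SparseVector, next: SparseVector): Double =
    (previous.keySet ++ next.keySet).toSeq
      .map(row => math.abs(next.getOrElse(row, 0.0) - previous.getOrElse(row, 0.0)))
      .sum
}
```

For example, l1Diff(Map(0 -> 0.5, 1 -> 0.5), Map(0 -> 0.75, 1 -> 0.25)) evaluates to 0.5.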
Recurse and iterate again if and only if the maximum number of iterations has not been reached and the vector has not converged.
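The stopping rule can be sketched as a small recursive driver. This is an illustration only: the names IterationSketch, iterate, step and diff are hypothetical, and the exact comparisons used (currentIteration + 1 < maxIterations, and a difference strictly greater than the threshold) are assumptions about how "under the max" and "not converged" are evaluated.

```scala
import scala.annotation.tailrec

object IterationSketch {
  type SparseVector = Map[Int, Double]

  // `step` computes the next vector; `diff` measures the change between two vectors.
  @tailrec
  def iterate(current: SparseVector, currentIteration: Int, maxIterations: Int,
              convergenceThreshold: Double)(
      step: SparseVector => SparseVector,
      diff: (SparseVector, SparseVector) => Double): SparseVector = {
    val next = step(current)
    val notConverged = diff(current, next) > convergenceThreshold
    // Iterate again only while both conditions hold; otherwise return the latest vector.
    if (currentIteration + 1 < maxIterations && notConverged)
      iterate(next, currentIteration + 1, maxIterations, convergenceThreshold)(step, diff)
    else next
  }
}
```

Either condition alone is enough to stop: a job that never converges still ends at maxIterations, and a job that converges early does not run the remaining iterations.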
Load the prior vector, or generate it from d and n on the first iteration.
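A sketch of how a prior vector could be derived from d and n, assuming the standard uniform teleportation term (1 - d) / n for every node. The formula, the 0-based row numbering and the names PriorVectorSketch and priorVector are assumptions for illustration, not taken from this documentation.

```scala
object PriorVectorSketch {
  // Assumed form: every one of the n rows gets the same value (1 - d) / n.
  // Rows are assumed to be numbered 0 until n; adjust to the ids used in your graph.
  def priorVector(d: Double, n: Int): Map[Int, Double] =
    (0 until n).map(row => row -> (1.0 - d) / n).toMap
}
```

Under this assumption the prior vector is the same as the onesVector input scaled by (1 - d) / n, which would explain why a vector of ones is part of the expected input.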
A weighted PageRank implementation using the Scalding Matrix API. This assumes that all rows and columns are of type Int and that values (edge weights) are of type Double. If you want an unweighted PageRank, simply set the weights on the edges to 1.

Input arguments:
 d -- damping factor
 n -- number of nodes in the graph
 currentIteration -- the iteration to start from, normally 0
 maxIterations -- stop after this many iterations
 convergenceThreshold -- iterating stops once the sum of the absolute differences between successive iteration solutions reaches this threshold
 rootDir -- the root directory holding all starting, intermediate and final data/output
The expected structure of the rootDir is:
 rootDir
  |- iterations
  |   |- 0 <-- a TSV of (row, value) of size n, value can be 1/n (generate this)
  |   |- n <-- holds future iterations/solutions
  |- edges <-- a TSV of (row, column, value) for edges in the graph
  |- onesVector <-- a TSV of (row, 1) of size n (generate this)
  |- diff <-- a single line representing the difference between the last iterations
  |- constants <-- built at iteration 0, these are constant for any given matrix/graph
      |- M_hat
      |- priorVector
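The two inputs marked "generate this" could be produced with a small helper along these lines. This is a sketch under stated assumptions: it writes single local files with java.nio (on a cluster these paths would normally live on HDFS), it numbers nodes 0 until n, and the names InputGeneratorSketch and generate are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object InputGeneratorSketch {
  // Writes rootDir/iterations/0 and rootDir/onesVector as TSV files for n nodes.
  // Assumes nodes are numbered 0 until n; the ids must match those used in the edges file.
  def generate(rootDir: Path, n: Int): Unit = {
    val iterationsDir = rootDir.resolve("iterations")
    Files.createDirectories(iterationsDir)

    // Starting solution: every node gets the value 1/n.
    val initial = (0 until n).map(row => s"$row\t${1.0 / n}")
    // Ones vector: every node gets the value 1 (written as a Double).
    val ones = (0 until n).map(row => s"$row\t1.0")

    def write(target: Path, lines: Seq[String]): Unit = {
      Files.write(target, lines.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8))
      ()
    }

    write(iterationsDir.resolve("0"), initial)
    write(rootDir.resolve("onesVector"), ones)
  }
}
```

Calling InputGeneratorSketch.generate(java.nio.file.Paths.get("/tmp/pagerank"), 4) would leave only the edges file to be supplied by hand.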
Don't forget to set the number of reducers for this job: -D mapred.reduce.tasks=n