Determines if the vertices of a graph are normalized. Assumes a graph with Double vertex attributes.
Counts the number of vertices with no out edges. These are considered "dangling" vertices.
This is implemented as a set operation: the dangling vertices are those whose IDs appear only among the destination vertex IDs and not among the source vertex IDs.
Performance note: edges are iterated over twice, so please consider persisting them first.
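The set operation described above can be sketched with plain Scala collections. This is a minimal in-memory illustration, not the RDD-based implementation: the `Edge` case class and both function names are hypothetical stand-ins introduced here.

```scala
object DanglingSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  // Dangling vertices: IDs seen as a destination but never as a source.
  def danglingVertexIds(edges: Seq[Edge]): Set[Long] = {
    val srcIds = edges.map(_.srcId).toSet // first pass over the edges
    val dstIds = edges.map(_.dstId).toSet // second pass over the edges
    dstIds.diff(srcIds)
  }

  def countDangling(edges: Seq[Edge]): Long =
    danglingVertexIds(edges).size.toLong
}
```

The two passes in the sketch mirror the performance note: each ID set is derived from the edges separately. A vertex with no edges at all never appears in either set, so this approach would not count it.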
Counts the number of vertices that have self-referencing edges.
Counts the number of vertices whose edge weights do not sum to 1.0. Assumes edges with Double weights.
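A minimal sketch of such a check on plain Scala collections follows. It assumes the sum is taken over each vertex's outgoing edges (grouped by source ID) and compares against 1.0 with a small tolerance. The `Edge` type, the function name, and the `eps` parameter are all hypothetical and not part of the documented API.

```scala
object NormalizationCheckSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  // Counts source vertices whose outgoing weights do not sum to 1.0.
  // Assumption: a tolerance (eps) absorbs floating-point rounding.
  def countNotNormalized(edges: Seq[Edge], eps: Double = 1e-9): Long =
    edges
      .groupBy(_.srcId)
      .values
      .count(es => math.abs(es.map(_.weight).sum - 1.0) > eps)
      .toLong
}
```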
Normalizes outgoing edge weights of an EdgeRDD.
Performance note: edges are iterated over twice, so please consider persisting them first.
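One way to picture the two iterations is a sum-then-divide scheme, sketched below on plain Scala collections. This is an assumption about the approach, not the EdgeRDD implementation, and the `Edge` type and function name are hypothetical.

```scala
object NormalizeSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  def normalizeOutEdgeWeights(edges: Seq[Edge]): Seq[Edge] = {
    // Pass 1: total outgoing weight per source vertex.
    val totals: Map[Long, Double] =
      edges.groupBy(_.srcId).map { case (src, es) => src -> es.map(_.weight).sum }

    // Pass 2: divide each weight by its source vertex's total.
    // Note: a total of 0.0 is not handled here and would yield NaN or Infinity.
    edges.map(e => e.copy(weight = e.weight / totals(e.srcId)))
  }
}
```

After this step, the outgoing weights of every source vertex sum to 1.0, up to floating-point rounding.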
Removes any edges that are self-referencing the same vertex. That is, any edges where the source and destination are the same.
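As a rough illustration on plain Scala collections, both removing self-referencing edges and counting the vertices that have them reduce to comparing source and destination IDs. The `Edge` type and function names are hypothetical.

```scala
object SelfReferenceSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  // Keeps only edges whose source and destination differ.
  def removeSelfReferences(edges: Seq[Edge]): Seq[Edge] =
    edges.filter(e => e.srcId != e.dstId)

  // Counts distinct vertices that have at least one self-referencing edge.
  def countSelfReferencingVertices(edges: Seq[Edge]): Long =
    edges.filter(e => e.srcId == e.dstId).map(_.srcId).distinct.size.toLong
}
```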
Given an RDD of source vertex IDs and an RDD of destination vertex IDs (from edges), tag vertex IDs with a flag to indicate if the vertex is dangling (no out edges) or not.
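The tagging can be sketched with in-memory sets in place of RDDs. This is a simplified illustration under that assumption: the function name and the `Map[Long, Boolean]` return shape are hypothetical.

```scala
object TagDanglingSketch {
  // Tags every known vertex ID with true if it is dangling, that is,
  // it never appears as the source of an edge.
  def tagDangling(srcIds: Set[Long], dstIds: Set[Long]): Map[Long, Boolean] =
    (srcIds ++ dstIds).map(id => id -> !srcIds.contains(id)).toMap
}
```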
Given the edges of a graph, this unzips the source and destination vertex IDs. IDs in the resulting RDDs are distinct.
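A minimal in-memory analogue of the unzip step is shown below, with `Seq` standing in for the RDDs. The `Edge` type and function name are hypothetical.

```scala
object UnzipSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  // Returns (distinct source IDs, distinct destination IDs).
  def unzipDistinctIds(edges: Seq[Edge]): (Seq[Long], Seq[Long]) =
    (edges.map(_.srcId).distinct, edges.map(_.dstId).distinct)
}
```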
Validates the structure of the input PageRank graph, according to the requirements to run PageRank. Returns a list of validation errors, if any.
Performance note: edges are iterated over three times, and vertices once, so please consider persisting either or both before running this.
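The "list of validation errors" pattern might look like the sketch below, written on plain Scala collections. The specific checks are assumptions pieced together from the other utilities described here (vertex values summing to 1.0, no self-referencing edges, per-source edge weights summing to 1.0). The `Edge` type, the `eps` tolerance, and the error messages are hypothetical, and the actual PageRank requirements may differ.

```scala
object ValidateSketch {
  // Hypothetical in-memory edge type, used only for this illustration.
  final case class Edge(srcId: Long, dstId: Long, weight: Double)

  // Returns one message per failed check; an empty result means "valid".
  def validate(
      vertices: Map[Long, Double],
      edges: Seq[Edge],
      eps: Double = 1e-9
  ): Seq[String] = {
    val errors = Seq.newBuilder[String]

    // Assumed check: vertex values sum to 1.0 (one pass over vertices).
    if (math.abs(vertices.values.sum - 1.0) > eps)
      errors += "vertex values do not sum to 1.0"

    // Assumed check: no edge points back at its own source.
    if (edges.exists(e => e.srcId == e.dstId))
      errors += "graph contains self-referencing edges"

    // Assumed check: outgoing weights of each source vertex sum to 1.0.
    val badSources = edges
      .groupBy(_.srcId)
      .values
      .count(es => math.abs(es.map(_.weight).sum - 1.0) > eps)
    if (badSources > 0)
      errors += s"$badSources vertices have edge weights that do not sum to 1.0"

    errors.result()
  }
}
```

Collecting all failures rather than stopping at the first lets a caller report every structural problem in one run.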
General-purpose graph operations and utilities. Operations specific to the complete PageRank graph are defined with that graph, not here.