Given a set of source nodes and path types, compute values for a feature matrix where the feature types (or columns) are the path types, the rows are (source node, target node) pairs, and each value is the probability of starting at the source node, following a path of a particular type, and ending at the target node.
This is essentially a simple wrapper around the PathFollower GraphChi program, which computes these features using random walks.
Note that this computes a fixed number of _columns_ of the feature matrix, but the number of rows is not necessarily known in advance (when only the source node of a row is specified, the targets are discovered by the walks).
A list of PathType objects specifying the path types to follow from each source node.
A feature matrix encoded as a list of MatrixRow objects. Note that this feature matrix may not have a row for every source in sourcesMap: if there were no paths from a source to an acceptable target following any of the path types, there will be no row in the matrix for that source.
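As a rough illustration of the computation described above, the following sketch estimates path-following probabilities with random walks over a small in-memory graph. It is not the PathFollower GraphChi program; the graph layout, the function name path_feature_matrix, and the walks_per_path parameter are all assumptions made for this example, and filtering by acceptable targets is left out.

```python
import random
from collections import defaultdict


def path_feature_matrix(graph, sources, path_types, walks_per_path=1000, seed=0):
    """Estimate P(target | source, path type) with constrained random walks.

    graph: {node: {edge_label: [neighbor, ...]}}  (hypothetical layout)
    path_types: list of tuples of edge labels; the list index is the column.
    Returns {(source, target): {column_index: probability}}.
    """
    rng = random.Random(seed)
    rows = defaultdict(dict)
    for source in sources:
        for col, path in enumerate(path_types):
            hits = defaultdict(int)
            for _ in range(walks_per_path):
                node = source
                for label in path:
                    neighbors = graph.get(node, {}).get(label, [])
                    if not neighbors:
                        node = None  # the walk cannot follow this path type
                        break
                    node = rng.choice(neighbors)
                if node is not None:
                    hits[node] += 1
            # Only (source, target) pairs that were actually reached get a row.
            for target, count in hits.items():
                rows[(source, target)][col] = count / walks_per_path
    return dict(rows)


graph = {
    "a": {"r1": ["b", "c"]},
    "b": {"r2": ["d"]},
    "c": {"r2": ["d"]},
    "e": {},
}
matrix = path_feature_matrix(graph, ["a", "e"], [("r1",), ("r1", "r2")])
```

In this toy graph, source "e" has no outgoing edges, so it contributes no rows, which mirrors the note above about sources that reach no target.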
Constructs a MatrixRow for a single instance. This is intended for SGD-style training or online prediction. Note that this could be _really_ inefficient for some kinds of feature generators, and so far is only implemented for SFE.
Constructs a matrix for the test data. In general, if this step is dependent on training (because, for instance, a feature set was selected at training time), the FeatureGenerator should save that state internally, and use it to do this computation. Not all implementations need internal state to do this, but some do.
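One way to picture a generator that keeps training-time state is sketched below. The class SelectingFeatureGenerator, its method names, and the frequency-based selection rule are hypothetical and chosen only for illustration; they are not taken from any actual FeatureGenerator implementation.

```python
class SelectingFeatureGenerator:
    """Hypothetical generator that selects a feature set at training time."""

    def __init__(self, max_features):
        self.max_features = max_features
        self.selected = None  # filled in by create_training_matrix

    def create_training_matrix(self, instances):
        # Count how many instances each feature occurs in, then keep the
        # most frequent ones (ties broken by name for determinism).
        counts = {}
        for features in instances:
            for name in features:
                counts[name] = counts.get(name, 0) + 1
        ranked = sorted(counts, key=lambda n: (-counts[n], n))
        self.selected = ranked[: self.max_features]
        return self._encode(instances)

    def create_test_matrix(self, instances):
        # Reuses the feature set saved during training.
        if self.selected is None:
            raise RuntimeError("create_training_matrix must be called first")
        return self._encode(instances)

    def _encode(self, instances):
        index = {name: i for i, name in enumerate(self.selected)}
        return [
            {index[n]: v for n, v in features.items() if n in index}
            for features in instances
        ]


gen = SelectingFeatureGenerator(max_features=2)
train = gen.create_training_matrix(
    [{"p1": 1.0, "p2": 0.5}, {"p1": 0.2, "p3": 0.1}, {"p2": 0.3}]
)
test = gen.create_test_matrix([{"p3": 0.9, "p1": 0.4}])
```

Here the test instance's "p3" value is dropped because "p3" was not selected at training time, so both matrices share one column indexing.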
Takes the data, probably does some random walks (or maybe some matrix multiplications, or a few other possibilities), and returns a FeatureMatrix.
Returns a string representation of the features in the feature matrix. This need only be defined after createTrainingMatrix is called once, and calling removeZeroWeightFeatures may change the output of this function (because the training and test matrices may have different feature spaces; see comments above).
For efficiency in creating the test matrix, we might drop some features if they have zero weight. In some FeatureGenerator implementations, computing feature values can be very expensive, so this allows us to save some work. The return value is the updated set of weights, with any desired values removed. Yes, this potentially changes the indices and thus the meaning of the feature matrix. Thus the updated weights can't be used anymore on the training matrix, only on the test matrix.
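The index shift described above can be seen in a small sketch. The function below is a hypothetical stand-in, not the real removeZeroWeightFeatures; it assumes features and weights are kept as parallel lists.

```python
def remove_zero_weight_features(feature_names, weights):
    """Drop features whose weight is exactly zero (hypothetical helper).

    Returns the surviving names and weights; surviving features are
    renumbered, so indices no longer match the training matrix.
    """
    kept = [(n, w) for n, w in zip(feature_names, weights) if w != 0.0]
    new_names = [n for n, _ in kept]
    new_weights = [w for _, w in kept]
    return new_names, new_weights


names = ["p0", "p1", "p2", "p3"]
weights = [0.7, 0.0, -1.2, 0.0]
new_names, new_weights = remove_zero_weight_features(names, weights)
```

After the call, index 1 refers to "p2" rather than "p1", which is why the updated weights only make sense against a test matrix built from the reduced feature set.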
Do feature selection for a PRA model, which amounts to finding common paths between sources and targets.
This pretty much just wraps around the PathFinder GraphChi program, which does random walks to find paths between source and target nodes, along with a little bit of post-processing to (for example) collapse paths that are the same but are written differently in the GraphChi output because of inverse relationships.
A Dataset containing source and target nodes from which we start walks.
A ranked list of the numPaths highest ranked path features, encoded as PathType objects.
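The collapsing and ranking steps might look roughly like the sketch below. It assumes, purely for illustration, that each found path is recorded as a half walked from the source and a half walked from the target, that inverse relations are written with a leading "_", and that paths are ranked by how many times they were found. None of these details are confirmed by the text above, and the helper names are hypothetical.

```python
from collections import Counter


def invert(label):
    """Toggle the assumed "_" prefix that marks an inverse relation."""
    return label[1:] if label.startswith("_") else "_" + label


def join_halves(source_half, target_half):
    """Rewrite a path found from both ends as one source-to-target path."""
    return tuple(source_half) + tuple(invert(l) for l in reversed(target_half))


def select_paths(found, num_paths):
    """Count canonical paths and return the num_paths most common ones."""
    counts = Counter()
    for source_half, target_half in found:
        counts[join_halves(source_half, target_half)] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [path for path, _ in ranked[:num_paths]]


found = [
    (["r1", "r2"], []),    # whole path walked from the source
    (["r1"], ["_r2"]),     # same path, walks met in the middle
    ([], ["_r2", "_r1"]),  # same path, walked entirely from the target
    (["r3"], []),
]
top = select_paths(found, num_paths=1)
```

The first three entries are written differently but collapse to the same path ("r1", "r2"), so that path outranks ("r3",).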