Class

com.twitter.scalding.examples

PageRank

class PageRank extends Job

Options:

  --input: the three-column TSV with node, comma-separated out-neighbors, and initial page rank (set to 1.0 on the first run).
  --output: the name of the TSV to write to, in the same format as the input.

Optional arguments:

  --errorOut: where to write the L1 error between the input page rank and the output. If this is omitted, the error is not computed.
  --iterations: how many iterations to run inside this job. The default is 1; 10 is about as much as Cascading can handle.
  --jumpprob: probability of a random jump. The default is 0.15.
  --convergence: if this is set, after every "--iterations" steps we check the error and decide whether to continue. Since the error check is expensive (it involves a join), avoid doing it too frequently; 10 iterations between checks is probably a good number.
  --temp: where to store a temporary output so that it can be compared to the previous output for convergence checking. If --convergence is set, --temp MUST be set too.
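
As an illustration of what --errorOut reports, the L1 error between two page-rank assignments can be sketched in plain Scala, outside Scalding. The object and method names are hypothetical, and treating a node that is missing from one side as rank 0.0 is an assumption of this sketch. Whether the job sums or averages the per-node differences is not stated above; this sketch sums them.

```scala
object L1ErrorSketch {
  /** Sum of absolute per-node differences between two rank assignments.
    * A node missing from one side counts as rank 0.0 (an assumption of this sketch).
    */
  def l1Error(before: Map[Long, Double], after: Map[Long, Double]): Double =
    (before.keySet ++ after.keySet).toSeq // toSeq so equal differences are not collapsed by Set
      .map(id => math.abs(before.getOrElse(id, 0.0) - after.getOrElse(id, 0.0)))
      .sum
}
```

For example, `L1ErrorSketch.l1Error(Map(1L -> 1.0, 2L -> 1.0), Map(1L -> 0.5, 2L -> 1.5))` adds |1.0 - 0.5| and |1.0 - 1.5|.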

Linear Supertypes
Job, Serializable, FieldConversions, LowPriorityFieldConversions, AnyRef, Any

Instance Constructors

  1. new PageRank(args: Args)


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val ALPHA: Double

  5. val EDGE: Int

  6. val JOB_COUNT: Int

  7. val NODESET: Int

  8. val STEPS: Int

  9. implicit def _implicitJobArgs: Args

    Attributes
    protected
    Definition Classes
    Job
  10. def anyToFieldArg(f: Any): Comparable[_]

    Attributes
    protected
    Definition Classes
    LowPriorityFieldConversions
  11. val args: Args

    Definition Classes
    Job
  12. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  13. def asList(f: Fields): List[Comparable[_]]

    Definition Classes
    FieldConversions
  14. def asSet(f: Fields): Set[Comparable[_]]

    Definition Classes
    FieldConversions
  15. def buildFlow: Flow[_]

    Definition Classes
    Job
  16. def classIdentifier: String

    Definition Classes
    Job
  17. def clear: Unit

    Definition Classes
    Job
  18. def clone(nextargs: Args): Job

    Definition Classes
    Job
  19. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. def computeError(pr: RichPipe): RichPipe

  21. def config: Map[AnyRef, AnyRef]

    Definition Classes
    Job
  22. implicit def dateParser: DateParser

    Definition Classes
    Job
  23. def defaultComparator: Option[Class[_ <: Comparator[_]]]

    Definition Classes
    Job
  24. def defaultMode(fromFields: Fields, toFields: Fields): Fields

    Definition Classes
    FieldConversions
  25. def defaultSpillThreshold: Int

    Definition Classes
    Job
  26. final def doPageRank(steps: Int)(pagerank: RichPipe): RichPipe

    The basic idea is to groupBy the dst key with BOTH the nodeset and the edge rows. The nodeset rows carry the old page rank; the edge rows are reversed, so we can get the incoming page rank from the nodes that point to each destination.

    Annotations
    @tailrec()
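
The grouping idea described for doPageRank can be sketched on in-memory collections, without Scalding pipes. This is a minimal sketch under stated assumptions, not the class's implementation: the `Node` and `Row` types and the `step` helper are hypothetical, the tag values 0 and 1 are assumed, the update rule `ALPHA + (1 - ALPHA) * incoming` is the common un-normalized form, and destinations that have no nodeset row are simply dropped.

```scala
object PageRankStepSketch {
  // Row-type tags named after the NODESET and EDGE members; the values 0 and 1 are assumed.
  val NODESET = 0
  val EDGE = 1
  // Jump probability; 0.15 is the documented default for --jumpprob.
  val ALPHA = 0.15

  final case class Node(neighbors: Seq[Long], rank: Double)
  // One row of the grouped stream: keyed by dst and tagged with its row type.
  final case class Row(dst: Long, rowType: Int, neighbors: Seq[Long], rank: Double)

  def step(graph: Map[Long, Node]): Map[Long, Node] = {
    val rows: Seq[Row] = graph.toSeq.flatMap { case (id, node) =>
      // The nodeset row keeps the node's neighbor list and its old rank.
      val nodeRow = Row(id, NODESET, node.neighbors, node.rank)
      // Edge rows are reversed: keyed by destination, carrying an equal share of the source's rank.
      val edgeRows = node.neighbors.map(dst => Row(dst, EDGE, Nil, node.rank / node.neighbors.size))
      nodeRow +: edgeRows
    }
    // Group BOTH kinds of row by dst, then combine them into the new rank.
    rows.groupBy(_.dst).toSeq.flatMap { case (dst, group) =>
      val incoming = group.filter(_.rowType == EDGE).map(_.rank).sum
      group.find(_.rowType == NODESET).map { n =>
        (dst, Node(n.neighbors, ALPHA + (1.0 - ALPHA) * incoming))
      }
    }.toMap
  }

  // Same curried shape as doPageRank(steps)(pagerank), applied to the in-memory graph.
  @annotation.tailrec
  def doPageRank(steps: Int)(graph: Map[Long, Node]): Map[Long, Node] =
    if (steps <= 0) graph else doPageRank(steps - 1)(step(graph))
}
```

With every rank starting at 1.0 and no dangling nodes, each step in this sketch keeps the sum of ranks equal to the number of nodes.
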
  27. final def ensureUniqueFields(left: Fields, right: Fields, rightPipe: Pipe): (Fields, Pipe)

    Definition Classes
    FieldConversions
  28. implicit def enumValueToFields(x: Value): Fields

    Definition Classes
    FieldConversions
  29. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  30. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  31. implicit def fieldFields[T <: TraversableOnce[Field[_]]](f: T): RichFields

    Definition Classes
    FieldConversions
  32. implicit def fieldToFields(f: Field[_]): RichFields

    Definition Classes
    FieldConversions
  33. implicit def fields[T <: TraversableOnce[Symbol]](f: T): Fields

    Definition Classes
    FieldConversions
  34. implicit def fieldsToRichFields(fields: Fields): RichFields

    Definition Classes
    FieldConversions
  35. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  36. implicit val flowDef: FlowDef

    Attributes
    protected
    Definition Classes
    Job
  37. implicit def fromEnum[T <: Enumeration](enumeration: T): Fields

    Definition Classes
    FieldConversions
  38. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  39. def getField(f: Fields, idx: Int): Fields

    Definition Classes
    FieldConversions
  40. def handleStats(statsData: CascadingStats): Unit

    Attributes
    protected
    Definition Classes
    Job
  41. def hasInts(f: Fields): Boolean

    Definition Classes
    FieldConversions
  42. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  43. def initialize(nodeCol: Symbol, neighCol: Symbol, pageRank: Symbol): Pipe

    Override this function to change how you generate a pipe of (Long, String, Double), where the first entry is the node id, the second is the list of neighbors as a comma-separated (no spaces) string of numeric node ids, and the third is the initial page rank (if not starting from a previous run, this should be 1.0).

    NOTE: if you want to run until convergence, the initialize method must read EXACTLY the same format that the output method writes. This is your job!
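
The row shape described for initialize, and the requirement that reading and writing use the same format, can be sketched in plain Scala. All names here (`RowFormatSketch`, `NodeRow`, `parse`, `format`, `neighborIds`) are hypothetical, and the tab-separated layout is assumed from the --input description; this is not how the class itself builds its pipe.

```scala
object RowFormatSketch {
  // (Long, String, Double): node id, comma-separated neighbor ids, page rank.
  final case class NodeRow(nodeId: Long, neighbors: String, rank: Double)

  /** Parse one "node<TAB>neighbors<TAB>rank" line. */
  def parse(line: String): NodeRow = {
    // A limit of -1 keeps trailing empty fields, so an empty column is still counted.
    val cols = line.split("\t", -1)
    require(cols.length == 3, s"expected 3 tab-separated columns, got ${cols.length}")
    NodeRow(cols(0).toLong, cols(1), cols(2).toDouble)
  }

  /** Write a row back in the same layout that parse reads. */
  def format(row: NodeRow): String =
    s"${row.nodeId}\t${row.neighbors}\t${row.rank}"

  /** Split the neighbor column into numeric ids; an empty column means no out-neighbors. */
  def neighborIds(row: NodeRow): Seq[Long] =
    if (row.neighbors.isEmpty) Seq.empty
    else row.neighbors.split(",").toSeq.map(_.toLong)
}
```

A quick check for a custom pair of readers and writers is that `parse(format(row)) == row` holds for representative rows, which is the property the NOTE asks for.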

  44. implicit def intFields[T <: TraversableOnce[Int]](f: T): Fields

    Definition Classes
    FieldConversions
  45. implicit def intToFields(x: Int): Fields

    Definition Classes
    FieldConversions
  46. implicit def integerToFields(x: Integer): Fields

    Definition Classes
    FieldConversions
  47. def ioSerializations: List[Class[_ <: Serialization[_]]]

    Definition Classes
    Job
  48. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  49. implicit def iterableToRichPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): RichPipe

    Definition Classes
    Job
  50. def keepAlive: Unit

    Definition Classes
    Job
  51. def listeners: List[FlowListener]

    Definition Classes
    Job
  52. implicit def mode: Mode

    Definition Classes
    Job
  53. def name: String

    Definition Classes
    Job
  54. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  55. final def newSymbol(avoid: Set[Symbol], guess: Symbol, trial: Int): Symbol

    Definition Classes
    FieldConversions
    Annotations
    @tailrec()
  56. def next: Option[Job]

    This is where we check for convergence and then run the next job if we have not converged.

    Definition Classes
    PageRank → Job
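
The check-then-continue pattern described for next can be sketched as a generic loop in plain Scala. This is an illustration only: `runUntilConverged`, its parameters, and the `maxJobs` safety cap are hypothetical and not part of this class, which chains jobs by returning an Option[Job] rather than looping in memory.

```scala
object ConvergenceSketch {
  /** Run `batch` (standing in for one job of --iterations steps) until the error
    * between consecutive states is at or below `threshold`, or `maxJobs` is reached.
    * Returns the final state and the number of batches that ran.
    */
  @annotation.tailrec
  def runUntilConverged[S](state: S, threshold: Double, maxJobs: Int, jobCount: Int = 0)(
      batch: S => S,
      error: (S, S) => Double
  ): (S, Int) = {
    val next = batch(state)
    val done = jobCount + 1
    // Stop when converged or when the (hypothetical) job cap is hit; otherwise run another batch.
    if (error(state, next) <= threshold || done >= maxJobs) (next, done)
    else runUntilConverged(next, threshold, maxJobs, done)(batch, error)
  }
}
```

Because the error is only evaluated once per batch, a larger batch size means fewer of the expensive comparisons, which matches the advice given for --convergence.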
  57. final def notify(): Unit

    Definition Classes
    AnyRef
  58. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  59. def output(pipe: RichPipe): Pipe

  60. implicit def parseAnySeqToFields[T <: TraversableOnce[Any]](anyf: T): Fields

    Definition Classes
    FieldConversions
  61. implicit def pipeToRichPipe(pipe: Pipe): RichPipe

    Definition Classes
    Job
  62. implicit def productToFields(f: Product): Fields

    Definition Classes
    LowPriorityFieldConversions
  63. implicit def read(src: Source): Pipe

    Definition Classes
    Job
  64. def run: Boolean

    Definition Classes
    Job
  65. implicit def scaldingConfig: Config

    Attributes
    protected
    Definition Classes
    Job
  66. def skipStrategy: Option[FlowSkipStrategy]

    Definition Classes
    Job
  67. implicit def sourceToRichPipe(src: Source): RichPipe

    Definition Classes
    Job
  68. def stepListeners: List[FlowStepListener]

    Definition Classes
    Job
  69. def stepStrategy: Option[FlowStepStrategy[_]]

    Definition Classes
    Job
  70. implicit def strFields[T <: TraversableOnce[String]](f: T): Fields

    Definition Classes
    FieldConversions
  71. implicit def stringToFields(x: String): Fields

    Definition Classes
    FieldConversions
  72. implicit def symbolToFields(x: Symbol): Fields

    Definition Classes
    FieldConversions
  73. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  74. def timeout[T](timeout: AbsoluteDuration)(t: ⇒ T): Option[T]

    Definition Classes
    Job
  75. implicit def toPipe[T](iter: Iterable[T])(implicit set: TupleSetter[T], conv: TupleConverter[T]): Pipe

    Definition Classes
    Job
  76. def toString(): String

    Definition Classes
    AnyRef → Any
  77. implicit def tuple2ToFieldsPair[T, U](pair: (T, U))(implicit tf: (T) ⇒ Fields, uf: (U) ⇒ Fields): (Fields, Fields)

    Definition Classes
    FieldConversions
  78. implicit def unitToFields(u: Unit): Fields

    Definition Classes
    FieldConversions
  79. def validate: Unit

    Definition Classes
    Job
  80. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  81. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  82. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  83. def write(pipe: Pipe, src: Source): Unit

    Definition Classes
    Job
