com.eharmony.aloha.dataset.vw.multilabel
A producer that can produce a VwDownsampledMultilabelRowCreator. The requirement that StatefulRowCreatorProducer implementations have only zero-argument constructors is relaxed for this Producer because there is no way to construct a list of labels generically. If the labels were encoded in the JSON, a JsonReader for the label type would have to be passed to the constructor. Since the labels can't be encoded generically in the JSON, this Producer is treated as a special case and the labels are passed to it directly. As a consequence, this producer relies not only on the dataset specification and the data itself but also on the labels provided to its constructor.
type of input passed to the StatefulRowCreator.
the label type.
Perform the initial scramble. This should be called once on the initial seed prior to the first call to sampleCombination.
an initial seed
a more scrambled seed.
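A minimal sketch of the initial scramble, assuming it mirrors the transformation java.util.Random applies to a seed in its constructor: XOR with the LCG multiplier 0x5DEECE66D, then truncation to 48 bits. The object SeedScramble and the helper next are hypothetical names used for illustration, not Aloha's implementation; next is a pure analogue of java.util.Random.next(bits) that returns the advanced seed instead of storing it.

```scala
// Hypothetical illustration; assumes the java.util.Random LCG constants
// (Knuth, TAOCP Vol. 2, Section 3.2.1).
object SeedScramble {
  val Multiplier: Long = 0x5DEECE66DL
  val Addend: Long = 0xBL
  val Mask: Long = (1L << 48) - 1

  // Same transformation java.util.Random applies to a user-supplied seed.
  def initSeedScramble(seed: Long): Long = (seed ^ Multiplier) & Mask

  // Pure analogue of java.util.Random.next(bits): returns the generated bits
  // together with the advanced seed rather than mutating internal state.
  def next(bits: Int, seed: Long): (Int, Long) = {
    val newSeed = (seed * Multiplier + Addend) & Mask
    ((newSeed >>> (48 - bits)).toInt, newSeed)
  }
}
```

Under these assumptions, next(32, initSeedScramble(s)) yields the same value as new java.util.Random(s).nextInt(), and the returned seed is what the caller passes into the following call.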
Sample a k-combination from a population of n.
This algorithm uses a linear congruential pseudorandom number generator (see Knuth) to perform reservoir sampling via "Algorithm R".
It runs in O(n) time.
If n ≤ k, then 0, ..., n - 1 is returned; otherwise, if k < n, the returned array has length k with values between 0 and n - 1 (inclusive), but it is NOT guaranteed to be sorted.
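The description above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not Aloha's source: it assumes the pure generator reproduces java.util.Random's 48-bit LCG step and its nextInt(bound) rejection loop, and that the n ≤ k case returns the seed unchanged. The names ReservoirSampling, next, and nextInt are hypothetical.

```scala
// Hypothetical sketch of Algorithm R driven by a pure, java.util.Random-style LCG.
object ReservoirSampling {
  private val Multiplier = 0x5DEECE66DL
  private val Addend = 0xBL
  private val Mask = (1L << 48) - 1

  def initSeedScramble(seed: Long): Long = (seed ^ Multiplier) & Mask

  // Pure analogue of java.util.Random.next(bits).
  private def next(bits: Int, seed: Long): (Int, Long) = {
    val s = (seed * Multiplier + Addend) & Mask
    ((s >>> (48 - bits)).toInt, s)
  }

  // Pure analogue of java.util.Random.nextInt(bound): a shortcut for powers
  // of two, otherwise a rejection loop that avoids modulo bias.
  private def nextInt(bound: Int, seed: Long): (Int, Long) = {
    val (r0, s0) = next(31, seed)
    val m = bound - 1
    if ((bound & m) == 0) (((bound * r0.toLong) >> 31).toInt, s0)
    else {
      var u = r0
      var s = s0
      var r = u % bound
      while (u - r + m < 0) { // overflow means u fell in the incomplete last block
        val (u2, s2) = next(31, s)
        u = u2; s = s2; r = u % bound
      }
      (r, s)
    }
  }

  // Algorithm R: start with 0..k-1, then for each later index i draw j
  // uniformly from [0, i] and overwrite slot j when j < k.
  def sampleCombination(n: Int, k: Int, seed: Long): (Array[Int], Long) =
    if (n <= k) (Array.range(0, n), seed) // assumption: seed returned unchanged
    else {
      val reservoir = Array.range(0, k)
      var s = seed
      var i = k
      while (i < n) {
        val (j, s2) = nextInt(i + 1, s)
        s = s2
        if (j < k) reservoir(j) = i
        i += 1
      }
      (reservoir, s)
    }
}
```

The rejection loop in nextInt is what makes the output match java.util.Random; replacing it with a plain modulo would still yield valid k-combinations, but they would diverge from java.util.Random and be slightly biased for bounds that are not powers of two.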
NOTE: This is a pure function. It produces the same results as if java.util.Random were used to perform reservoir sampling, but since it doesn't carry state, it can trivially be run in parallel with no locking or CAS-loop overhead. The consequence is that the seed must be provided on every call and a new seed is returned as part of the output.
To get this function to act like java.util.Random, the seed for the first call should be produced by running the desired seed through initSeedScramble. For instance:
    val (kComb1, newSeed1) = sampleCombination(4, 2, initSeedScramble(0))
    val (kComb2, newSeed2) = sampleCombination(4, 2, newSeed1)
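To illustrate the seed-threading contract and the claim that a pure generator needs no locking, the sketch below uses a smaller hypothetical pure function, SeedThreading.nextInt, in place of sampleCombination; it assumes java.util.Random's LCG constants. Threading the returned seed through successive calls reproduces a single java.util.Random instance, and several threads can evaluate the same call concurrently with identical results. All names here are illustrative, not part of Aloha.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical illustration of seed threading with a pure generator.
object SeedThreading {
  private val Multiplier = 0x5DEECE66DL
  private val Mask = (1L << 48) - 1

  def initSeedScramble(seed: Long): Long = (seed ^ Multiplier) & Mask

  // Pure counterpart of java.util.Random.nextInt(): returns (value, newSeed).
  def nextInt(seed: Long): (Int, Long) = {
    val s = (seed * Multiplier + 0xBL) & Mask
    ((s >>> 16).toInt, s)
  }

  // Thread the seed through `count` successive calls, the way repeated calls
  // on one java.util.Random instance would advance its internal seed.
  def ints(count: Int, seed: Long): (List[Int], Long) = {
    @annotation.tailrec
    def loop(remaining: Int, s: Long, acc: List[Int]): (List[Int], Long) =
      if (remaining == 0) (acc.reverse, s)
      else {
        val (v, s2) = nextInt(s)
        loop(remaining - 1, s2, v :: acc)
      }
    loop(count, seed, Nil)
  }

  // Run the same pure call on several threads at once. No lock or CAS loop
  // is needed because no mutable generator state is shared between callers.
  def concurrently(copies: Int, count: Int, seed: Long): List[List[Int]] = {
    import scala.concurrent.ExecutionContext.Implicits.global
    val futures = List.fill(copies)(Future(ints(count, seed)._1))
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

Splitting one run into two calls, ints(2, seed) followed by ints(3, newSeed), gives the same five values as a single ints(5, seed), which is the same property the two chained sampleCombination calls rely on.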
For more information, see:
population size
combination size
the seed to use for random selection
a Tuple2 containing the array of 0-based indices representing the k-combination and a new random seed.