Class TwoPassDataIndexer

  • All Implemented Interfaces:
    DataIndexer

    public class TwoPassDataIndexer
    extends AbstractDataIndexer
    Collects event and context counts by making two passes over the events. The first pass determines which contexts will be used by the model and writes the events to a temporary file. The second pass reads that file back and creates the in-memory events, keeping only the contexts which will be used. This greatly reduces the amount of memory required for storing the events.
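    The two-pass idea described above can be sketched without the library. This is a minimal illustration, not the class's actual implementation: the class name TwoPassSketch, the String[] event layout (outcome first, then context predicates), the space-separated temporary file format, and the choice to drop events with no surviving context are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the library.
class TwoPassSketch {

    // Each event is a String[]: element 0 is the outcome, the rest are
    // context predicates. Tokens are assumed to contain no spaces.
    static List<String[]> index(Iterable<String[]> events, int cutoff) {
        try {
            Path tmp = Files.createTempFile("events", ".txt");
            try {
                // Pass 1: count every predicate and spool each event to disk,
                // so the full event list never has to be held in memory.
                Map<String, Integer> counts = new HashMap<>();
                try (BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    for (String[] event : events) {
                        for (int i = 1; i < event.length; i++) {
                            counts.merge(event[i], 1, Integer::sum);
                        }
                        out.write(String.join(" ", event));
                        out.newLine();
                    }
                }

                // Pass 2: re-read the spooled events and keep only predicates
                // that were seen at least `cutoff` times.
                List<String[]> kept = new ArrayList<>();
                try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        String[] fields = line.split(" ");
                        List<String> filtered = new ArrayList<>();
                        filtered.add(fields[0]); // outcome
                        for (int i = 1; i < fields.length; i++) {
                            if (counts.getOrDefault(fields[i], 0) >= cutoff) {
                                filtered.add(fields[i]);
                            }
                        }
                        // Sketch-level choice: drop events left with no context.
                        if (filtered.size() > 1) {
                            kept.add(filtered.toArray(new String[0]));
                        }
                    }
                }
                return kept;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[] {"yes", "sunny", "warm"},
            new String[] {"no", "rainy", "warm"},
            new String[] {"yes", "sunny", "windy"});
        for (String[] event : index(events, 2)) {
            System.out.println(String.join(" ", event));
        }
    }
}
```

    With a cutoff of 2, the predicates rainy and windy (each seen once) are filtered out in the second pass, so only sunny and warm remain in the retained events. The memory saving comes from holding just the predicate counts during the first pass and just the filtered events after the second.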
    • Constructor Detail

      • TwoPassDataIndexer

        public TwoPassDataIndexer(ObjectStream<Event> eventStream)
                           throws java.io.IOException
        One-argument constructor for DataIndexer, which calls the two-argument constructor assuming no cutoff.
        Parameters:
        eventStream - An ObjectStream<Event> which supplies all the Events seen in the training data.
        Throws:
        java.io.IOException
      • TwoPassDataIndexer

        public TwoPassDataIndexer(ObjectStream<Event> eventStream,
                                  int cutoff)
                           throws java.io.IOException
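        Two-argument constructor for DataIndexer.
        Parameters:
        eventStream - An ObjectStream<Event> which supplies all the Events seen in the training data.
        cutoff - The minimum number of times a predicate must have been observed in order to be included in the model.
        Throws: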
        java.io.IOException
      • TwoPassDataIndexer

        public TwoPassDataIndexer(ObjectStream<Event> eventStream,
                                  int cutoff,
                                  boolean sort)
                           throws java.io.IOException
        Three-argument constructor for DataIndexer.
        Parameters:
        eventStream - An ObjectStream<Event> which supplies all the Events seen in the training data.
        cutoff - The minimum number of times a predicate must have been observed in order to be included in the model.
        sort - Whether the indexed events should be sorted.
        Throws:
        java.io.IOException