Class SortingCollection<T>

java.lang.Object
htsjdk.samtools.util.SortingCollection<T>
All Implemented Interfaces:
Iterable<T>

public class SortingCollection<T> extends Object implements Iterable<T>
A collection to which many records can be added. After all records have been added, the collection can be iterated, and the records are returned in the order defined by the comparator. Records may be spilled to a temporary directory if more records are added than will fit in memory. As a result, the objects returned may not be identical to the objects added to the collection, but they should be equal as determined by the codec used to write them to disk and read them back.
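
The difference between identical and equal objects can be illustrated with a minimal sketch. It assumes String records and uses a hypothetical roundTrip helper in place of a real SortingCollection.Codec; it is not htsjdk code. It only shows that a record written to bytes and read back is a new object that still compares equal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTripSketch {
    // Hypothetical stand-in for a codec: encodes one record to bytes, then decodes it.
    static String roundTrip(String rec) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(rec);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "read-001";
        String decoded = roundTrip(original);
        System.out.println(decoded.equals(original)); // true: same content
        System.out.println(decoded == original);      // false: a different object
    }
}
```

Code that consumes the sorted records should therefore compare them with equals() or the comparator, not with reference equality.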

When iterating over the collection, the number of file handles required is numRecordsInCollection/maxRecordsInRam. If this becomes a limiting factor, a file handle cache could be added.
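
As a rough worked example of the ratio above, the sketch below estimates the handle count for a given workload. The class and method names are hypothetical, and rounding up is an assumption (that a partially filled final buffer also ends up in its own file); the text only states the plain ratio.

```java
class FileHandleEstimate {
    // Estimates numRecordsInCollection / maxRecordsInRam, rounded up (assumption).
    static long handlesNeeded(long numRecordsInCollection, int maxRecordsInRam) {
        if (maxRecordsInRam <= 0) {
            throw new IllegalArgumentException("maxRecordsInRam must be positive");
        }
        return (numRecordsInCollection + maxRecordsInRam - 1) / maxRecordsInRam;
    }

    public static void main(String[] args) {
        // 10,000,000 records with 500,000 held in RAM at a time.
        System.out.println(handlesNeeded(10_000_000L, 500_000)); // 20
    }
}
```

Raising maxRecordsInRam lowers the estimate proportionally, at the cost of more memory per buffer.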

If the Snappy DLL is available and the snappy.disable system property is not set to true, Snappy is used to compress temporary files.
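
A minimal sketch of how a caller might inspect or set the property named above. It assumes the usual Boolean.getBoolean semantics (the string "true", ignoring case); the text does not say how or when htsjdk reads the property, so setting it before the collection is created, or passing -Dsnappy.disable=true on the command line, is the cautious choice. The class and method names are hypothetical.

```java
class SnappyPropertySketch {
    // True only if the snappy.disable system property is set to "true".
    static boolean snappyDisabled() {
        return Boolean.getBoolean("snappy.disable");
    }

    public static void main(String[] args) {
        System.out.println(snappyDisabled()); // depends on how the JVM was started
        System.setProperty("snappy.disable", "true");
        System.out.println(snappyDisabled()); // true
    }
}
```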

  • Method Details

    • add

      public void add(T rec)
    • doneAdding

      public void doneAdding()
      This method can be called after the caller is done adding to the collection, in order to possibly free up memory. If iterator() is called immediately after the caller is done adding, calling this method is not necessary, because iterator() triggers the same freeing.
    • isDestructiveIteration

      public boolean isDestructiveIteration()
      Returns:
      True if this collection is allowed to discard data during iteration in order to reduce memory footprint, precluding a second iteration over the collection.
    • setDestructiveIteration

      public void setDestructiveIteration(boolean destructiveIteration)
      Tell this collection that it is allowed to discard data during iteration in order to reduce memory footprint, precluding a second iteration. Destructive iteration is enabled by default.
    • spillToDisk

      public void spillToDisk()
      Sort the records in memory, write them to a file, and clear the buffer of records in memory.
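
The three steps described for spillToDisk (sort, write, clear) can be sketched with the standard library alone. This is not the htsjdk implementation: the SpillSketch class, its method names, and the one-record-per-line text format are assumptions made for illustration, and records are assumed to contain no line breaks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SpillSketch {
    // Sorts the buffer, writes it to a new temporary file, and empties the buffer.
    static Path spill(List<String> buffer, Comparator<String> comparator) {
        try {
            buffer.sort(comparator);
            Path file = Files.createTempFile("sorting-sketch", ".tmp");
            Files.write(file, buffer); // one record per line, UTF-8
            buffer.clear();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a spilled file back, then deletes it.
    static List<String> readBack(Path file) {
        try {
            List<String> records = Files.readAllLines(file);
            Files.delete(file);
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> buffer = new ArrayList<>(Arrays.asList("c", "a", "b"));
        Path file = spill(buffer, Comparator.naturalOrder());
        System.out.println(buffer.isEmpty()); // true
        System.out.println(readBack(file));   // [a, b, c]
    }
}
```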
    • iterator

      public CloseableIterator<T> iterator()
      Prepare to iterate through the records in order. This method may be called more than once, but add() may not be called after this method has been called.
      Specified by:
      iterator in interface Iterable<T>
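
Returning records in order from several sorted spill files is commonly done with a k-way merge. The sketch below shows that technique on in-memory lists standing in for files. Whether htsjdk merges this way is an assumption; MergeSketch, Cursor, and mergeSortedRuns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {
    // One cursor per sorted run; head is the next record not yet emitted.
    private static final class Cursor<T> {
        final Iterator<T> rest;
        T head;

        Cursor(Iterator<T> it) {
            rest = it;
            head = it.next();
        }

        // Moves to the next record; returns false when the run is exhausted.
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            head = rest.next();
            return true;
        }
    }

    // Merges already-sorted runs into one sorted list.
    static <T> List<T> mergeSortedRuns(List<List<T>> runs, Comparator<T> comparator) {
        PriorityQueue<Cursor<T>> heap =
            new PriorityQueue<>((a, b) -> comparator.compare(a.head, b.head));
        for (List<T> run : runs) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<T> smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest); // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6, 8));
        Comparator<Integer> order = Comparator.naturalOrder();
        System.out.println(mergeSortedRuns(runs, order)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

In a file-backed version each cursor would hold an open file, which matches the one-handle-per-spill-file cost noted in the class description.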
    • cleanup

      public void cleanup()
      Delete any temporary files. After this method is called, iterator() may not be called.
    • newInstance

      @Deprecated public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, File... tmpDir)
      Deprecated.
      Syntactic sugar around the constructor, to save some typing of type parameters.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      tmpDir - Where to write files of records that will not fit in RAM
    • newInstance

      @Deprecated public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, Collection<File> tmpDirs)
      Syntactic sugar around the constructor, to save some typing of type parameters.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      tmpDirs - Where to write files of records that will not fit in RAM
    • newInstance

      public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, boolean printRecordSizeSampling)
      Syntactic sugar around the constructor, to save some typing of type parameters. Writes files to java.io.tmpdir.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      printRecordSizeSampling - If true, record size will be sampled and logged at the DEBUG level
    • newInstance

      public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, boolean printRecordSizeSampling, Path... tmpDir)
      Syntactic sugar around the constructor, to save some typing of type parameters.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      printRecordSizeSampling - If true, record size will be sampled and logged at the DEBUG level
      tmpDir - Where to write files of records that will not fit in RAM
    • newInstance

      public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM)
      Syntactic sugar around the constructor, to save some typing of type parameters. Writes files to java.io.tmpdir.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
    • newInstance

      public static <T> SortingCollection<T> newInstance(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, Path... tmpDir)
      Syntactic sugar around the constructor, to save some typing of type parameters.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      tmpDir - Where to write files of records that will not fit in RAM
    • newInstanceFromPaths

      public static <T> SortingCollection<T> newInstanceFromPaths(Class<T> componentType, SortingCollection.Codec<T> codec, Comparator<T> comparator, int maxRecordsInRAM, Collection<Path> tmpDirs)
      Syntactic sugar around the constructor, to save some typing of type parameters.
      Parameters:
      componentType - Class of the record to be sorted. Necessary because Java generics are erased at runtime.
      codec - For writing records to file and reading them back into RAM
      comparator - Defines output sort order
      maxRecordsInRAM - how many records to accumulate in memory before spilling to disk
      tmpDirs - Where to write files of records that will not fit in RAM