java.lang.Object
org.broadinstitute.hellbender.tools.copynumber.utils.HDF5Utils

public final class HDF5Utils extends Object
TODO: Move into hdf5-java-bindings.
  • Field Details

  • Method Details

    • readIntervals

      public static List<SimpleInterval> readIntervals(org.broadinstitute.hdf5.HDF5File file, String path)
      Reads a list of intervals from an HDF5 file, following the sub-paths and conventions of writeIntervals(org.broadinstitute.hdf5.HDF5File, java.lang.String, java.util.List<T>).
    • writeIntervals

      public static <T extends SimpleInterval> void writeIntervals(org.broadinstitute.hdf5.HDF5File file, String path, List<T> intervals)
      Given an HDF5 file and an HDF5 path, writes a list of intervals to hard-coded sub-paths under that path. Contig names are stored as a string array, and the intervals are stored as a double matrix in which each contig is represented by its index in that string array.
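      The encoding described above can be illustrated without touching HDF5 at all. The following is a minimal sketch, not the actual implementation: the Interval class is a hypothetical stand-in for SimpleInterval, the class and method names are invented for illustration, and the column order (contig index, start, end) is an assumption, since the description only states that contigs are stored by index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IntervalEncodingSketch {

    /** Hypothetical stand-in for SimpleInterval: contig name plus start and end. */
    public static final class Interval {
        public final String contig;
        public final int start;
        public final int end;

        public Interval(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    /** Collects the distinct contig names in order of first appearance. */
    public static String[] contigNames(List<Interval> intervals) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        for (Interval interval : intervals) {
            seen.putIfAbsent(interval.contig, seen.size());
        }
        return seen.keySet().toArray(new String[0]);
    }

    /** Encodes each interval as a row {contigIndex, start, end} (assumed column order). */
    public static double[][] encode(List<Interval> intervals, String[] contigNames) {
        Map<String, Integer> indexByContig = new HashMap<>();
        for (int i = 0; i < contigNames.length; i++) {
            indexByContig.put(contigNames[i], i);
        }
        double[][] matrix = new double[intervals.size()][3];
        for (int row = 0; row < intervals.size(); row++) {
            Interval interval = intervals.get(row);
            matrix[row][0] = indexByContig.get(interval.contig);
            matrix[row][1] = interval.start;
            matrix[row][2] = interval.end;
        }
        return matrix;
    }

    /** Reverses encode(): looks up each contig name by the index stored in column 0. */
    public static List<Interval> decode(String[] contigNames, double[][] matrix) {
        List<Interval> intervals = new ArrayList<>();
        for (double[] row : matrix) {
            intervals.add(new Interval(contigNames[(int) row[0]], (int) row[1], (int) row[2]));
        }
        return intervals;
    }

    public static void main(String[] args) {
        List<Interval> intervals = Arrays.asList(
                new Interval("chr1", 100, 200),
                new Interval("chr2", 1, 50),
                new Interval("chr1", 300, 400));
        String[] names = contigNames(intervals);
        double[][] matrix = encode(intervals, names);
        System.out.println(Arrays.toString(names)); // prints [chr1, chr2]
        System.out.println(Arrays.deepToString(matrix));
        System.out.println(decode(names, matrix).size() + " intervals decoded");
    }
}
```

      Storing the contig as an index keeps the interval data purely numeric, so it fits in a single double matrix, while the names themselves are stored once in the string array.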
    • readChunkedDoubleMatrix

      public static double[][] readChunkedDoubleMatrix(org.broadinstitute.hdf5.HDF5File file, String path)
      Reads a large matrix stored as a set of chunks (submatrices), following the sub-paths and conventions of writeChunkedDoubleMatrix(org.broadinstitute.hdf5.HDF5File, java.lang.String, double[][], int).
    • writeChunkedDoubleMatrix

      public static void writeChunkedDoubleMatrix(org.broadinstitute.hdf5.HDF5File file, String path, double[][] matrix, int maxChunkSize)
      Given a large matrix, splits it into equally sized subsets of rows (plus a subset containing the remainder, if necessary) and writes these submatrices to indexed sub-paths. This avoids a hard limit in Java HDF5 on the number of elements in a matrix, given by MAX_NUMBER_OF_VALUES_PER_HDF5_MATRIX. The number of chunks is determined by maxChunkSize, which should be set appropriately for the desired number of columns.
      Parameters:
      maxChunkSize - The maximum number of values in each chunk. Decreasing this number reduces heap usage when writing chunks, because writing requires subarrays to be copied. However, because a single row cannot be split across multiple chunks, the number of columns must be less than the maximum number of values in each chunk.
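      The row-wise chunking arithmetic can be sketched in plain Java. This is an illustrative sketch under stated assumptions, not the method's actual code: the class and method names are hypothetical, the chunks are kept in an in-memory list rather than written to indexed HDF5 sub-paths, and the sketch only rejects matrices whose rows cannot fit in one chunk at all. Whether the real method also rejects the boundary case where the number of columns equals maxChunkSize is not something this sketch settles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkingSketch {

    /**
     * Splits a matrix into consecutive groups of whole rows so that each group
     * holds at most maxChunkSize values. The last group holds the remainder.
     */
    public static List<double[][]> splitIntoChunks(double[][] matrix, int maxChunkSize) {
        int numRows = matrix.length;
        int numColumns = matrix[0].length;
        if (numColumns > maxChunkSize) {
            // A row is never split across chunks, so it must fit in a single chunk.
            throw new IllegalArgumentException("A single row does not fit in one chunk.");
        }
        // Whole rows per full chunk: integer division discards any partial row.
        int rowsPerFullChunk = maxChunkSize / numColumns;
        List<double[][]> chunks = new ArrayList<>();
        for (int start = 0; start < numRows; start += rowsPerFullChunk) {
            int end = Math.min(start + rowsPerFullChunk, numRows);
            // Copies the row references for this chunk into a new outer array.
            chunks.add(Arrays.copyOfRange(matrix, start, end));
        }
        return chunks;
    }

    /** Stacks the chunks back together in order to recover the original row layout. */
    public static double[][] reassemble(List<double[][]> chunks) {
        int numRows = 0;
        for (double[][] chunk : chunks) {
            numRows += chunk.length;
        }
        double[][] matrix = new double[numRows][];
        int row = 0;
        for (double[][] chunk : chunks) {
            for (double[] chunkRow : chunk) {
                matrix[row++] = chunkRow;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] matrix = new double[10][3];
        // 12 values per chunk and 3 columns give 4 rows per full chunk: 4 + 4 + 2 rows.
        List<double[][]> chunks = splitIntoChunks(matrix, 12);
        System.out.println(chunks.size()); // prints 3
    }
}
```

      In this sketch a smaller maxChunkSize yields more, smaller chunks, which mirrors the trade-off described for the parameter: less memory per copied subarray at the cost of more sub-paths.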