Class BlockCompressedInputStream

java.lang.Object
java.io.InputStream
htsjdk.samtools.util.BlockCompressedInputStream
All Implemented Interfaces:
LocationAware, Closeable, AutoCloseable
Direct Known Subclasses:
AsyncBlockCompressedInputStream

public class BlockCompressedInputStream extends InputStream implements LocationAware
Utility class for reading BGZF block-compressed files. The caller can treat this stream like any other InputStream. Wrapping it in a buffering stream is probably unnecessary, because it buffers internally. The advantage of BGZF over conventional GZip is that BGZF allows seeking without reading the entire file up to the location being sought. Seeking is only possible if the underlying input is seekable. This implementation is not synchronized: if multiple threads access an instance concurrently, it must be synchronized externally. See http://samtools.sourceforge.net/SAM1.pdf for details of the BGZF format.
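Each BGZF block is an ordinary gzip member whose header carries an extra "BC" subfield recording the block's total size. Any gzip reader can therefore decompress a BGZF file sequentially, while the recorded size lets a reader hop from block to block without inflating anything. The standalone sketch below uses only java.util.zip, not htsjdk, to build one BGZF-style block by hand following the layout in the SAM specification. The class and method names, and the 0xFF00-byte input limit, are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: one BGZF-style block is a gzip member with a "BC" extra subfield. */
final class BgzfBlockSketch {
    static final int HEADER_LENGTH = 18;  // 12-byte gzip header + 6-byte "BC" subfield
    static final int TRAILER_LENGTH = 8;  // CRC32 + ISIZE
    static final int MAX_INPUT = 0xFF00;  // assumption: keeps the whole block within 64 KiB

    /** Compresses {@code data} into a single BGZF-style block. */
    static byte[] buildBlock(byte[] data) {
        if (data.length > MAX_INPUT) {
            throw new IllegalArgumentException("too much data for one block");
        }
        // Raw DEFLATE (nowrap = true); the gzip header and trailer are written by hand below.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] cdata = new byte[data.length + 1024];  // ample room even for incompressible input
        int cdataLength = 0;
        while (!deflater.finished()) {
            cdataLength += deflater.deflate(cdata, cdataLength, cdata.length - cdataLength);
        }
        deflater.end();

        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        int blockSize = HEADER_LENGTH + cdataLength + TRAILER_LENGTH;
        ByteBuffer out = ByteBuffer.allocate(blockSize).order(ByteOrder.LITTLE_ENDIAN);
        out.put((byte) 31).put((byte) 139);    // ID1, ID2: gzip magic
        out.put((byte) 8);                     // CM: DEFLATE
        out.put((byte) 4);                     // FLG: FEXTRA set
        out.putInt(0);                         // MTIME
        out.put((byte) 0);                     // XFL
        out.put((byte) 255);                   // OS: unknown
        out.putShort((short) 6);               // XLEN: one 6-byte subfield
        out.put((byte) 'B').put((byte) 'C');   // SI1, SI2: subfield ID "BC"
        out.putShort((short) 2);               // SLEN: 2-byte payload
        out.putShort((short) (blockSize - 1)); // BSIZE: total block size minus 1
        out.put(cdata, 0, cdataLength);        // compressed payload
        out.putInt((int) crc.getValue());      // CRC32 of the uncompressed data
        out.putInt(data.length);               // ISIZE: uncompressed length
        return out.array();
    }

    /** Reads BSIZE of the block starting at {@code offset}; the next block starts that many bytes later. */
    static int blockSize(byte[] bytes, int offset) {
        return ((bytes[offset + 16] & 0xFF) | (bytes[offset + 17] & 0xFF) << 8) + 1;
    }

    /** Decompresses one or more concatenated gzip members with the JDK's GZIPInputStream. */
    static byte[] gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; ) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading BSIZE at the start of a block is enough to find the next block, which is the property that makes block-level random access possible.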
  • Constructor Details

    • BlockCompressedInputStream

      public BlockCompressedInputStream(InputStream stream)
      Note that seek() is not supported if this ctor is used.
      Parameters:
      stream - source of bytes
    • BlockCompressedInputStream

      public BlockCompressedInputStream(InputStream stream, InflaterFactory inflaterFactory)
      Note that seek() is not supported if this ctor is used.
      Parameters:
      stream - source of bytes
      inflaterFactory - InflaterFactory used by BlockGunzipper
    • BlockCompressedInputStream

      public BlockCompressedInputStream(InputStream stream, boolean allowBuffering)
      Note that seek() is not supported if this ctor is used.
      Parameters:
      stream - source of bytes
      allowBuffering - if true, allow buffering
    • BlockCompressedInputStream

      public BlockCompressedInputStream(InputStream stream, boolean allowBuffering, InflaterFactory inflaterFactory)
      Note that seek() is not supported if this ctor is used.
      Parameters:
      stream - source of bytes
      allowBuffering - if true, allow buffering
      inflaterFactory - InflaterFactory used by BlockGunzipper
    • BlockCompressedInputStream

      public BlockCompressedInputStream(File file) throws IOException
      Use this ctor if you wish to call seek()
      Parameters:
      file - source of bytes
      Throws:
      IOException
    • BlockCompressedInputStream

      public BlockCompressedInputStream(Path file) throws IOException
      Equivalent to the constructor that takes a File, but accepts a Path. Supports seeking.
      Throws:
      IOException
    • BlockCompressedInputStream

      public BlockCompressedInputStream(File file, InflaterFactory inflaterFactory) throws IOException
      Use this ctor if you wish to call seek()
      Parameters:
      file - source of bytes
      inflaterFactory - InflaterFactory used by BlockGunzipper
      Throws:
      IOException
    • BlockCompressedInputStream

      public BlockCompressedInputStream(URL url)
      Parameters:
      url - source of bytes
    • BlockCompressedInputStream

      public BlockCompressedInputStream(URL url, InflaterFactory inflaterFactory)
      Parameters:
      url - source of bytes
      inflaterFactory - InflaterFactory used by BlockGunzipper
    • BlockCompressedInputStream

      public BlockCompressedInputStream(SeekableStream strm)
      For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
      Parameters:
      strm - source of bytes
    • BlockCompressedInputStream

      public BlockCompressedInputStream(SeekableStream strm, InflaterFactory inflaterFactory)
      For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
      Parameters:
      strm - source of bytes
      inflaterFactory - InflaterFactory used by BlockGunzipper
  • Method Details

    • setCheckCrcs

      public void setCheckCrcs(boolean check)
      Determines whether the inflater recalculates the CRC of the decompressed data and checks it against the value stored in the GZIP trailer. CRC checking is an expensive operation, so enable it only when that cost is acceptable.
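The check that setCheckCrcs enables amounts to recomputing a CRC32 over a block's decompressed bytes and comparing it with the CRC32 stored in that gzip member's 8-byte trailer. A standalone sketch of that comparison using java.util.zip.CRC32; it is not htsjdk's code, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

/** Hypothetical sketch of per-member CRC verification for gzip/BGZF data. */
final class CrcCheckSketch {

    /**
     * Returns true if the CRC32 of {@code decompressed} equals the value stored in the
     * gzip trailer at the end of {@code member} (CRC32 then ISIZE, both little-endian).
     */
    static boolean crcMatches(byte[] member, byte[] decompressed) {
        ByteBuffer trailer = ByteBuffer.wrap(member, member.length - 8, 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        long storedCrc = trailer.getInt() & 0xFFFFFFFFL;  // stored as an unsigned 32-bit value
        CRC32 crc = new CRC32();
        crc.update(decompressed, 0, decompressed.length);
        return crc.getValue() == storedCrc;
    }
}
```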
    • available

      public int available() throws IOException
      Overrides:
      available in class InputStream
      Returns:
      the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. Note that although the next caller can read this many bytes without blocking, the available() method call itself may block in order to fill an internal buffer if it has been exhausted.
      Throws:
      IOException
    • endOfBlock

      public boolean endOfBlock()
      Returns:
      true if the stream is at the end of a BGZF block, false otherwise.
    • close

      public void close() throws IOException
      Closes the underlying InputStream or RandomAccessFile
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • read

      public int read() throws IOException
      Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
      Specified by:
      read in class InputStream
      Returns:
      the next byte of data, or -1 if the end of the stream is reached.
      Throws:
      IOException
    • read

      public int read(byte[] buffer) throws IOException
      Reads some number of bytes from the input stream and stores them into the buffer array. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. read(buffer) has the same effect as read(buffer, 0, buffer.length).
      Overrides:
      read in class InputStream
      Parameters:
      buffer - the buffer into which the data is read.
      Returns:
      the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
      Throws:
      IOException
    • readLine

      public String readLine() throws IOException
      Reads a whole line. A line is considered to be terminated by either a line feed ('\n'), carriage return ('\r') or carriage return followed by a line feed ("\r\n").
      Returns:
      A String containing the contents of the line, excluding the line terminating character, or null if the end of the stream has been reached
      Throws:
      IOException - If an I/O error occurs
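The three terminators readLine() accepts can be told apart with one byte of lookahead after a '\r'. The standalone sketch below works over a PushbackInputStream and is not htsjdk's implementation. The class name is hypothetical, and it assumes (for simplicity) that bytes are decoded as ISO-8859-1 and that a final unterminated line is returned rather than dropped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of line reading with "\n", "\r", and "\r\n" terminators. */
final class ReadLineSketch {

    /** Reads one line without its terminator; returns null at end of stream. */
    static String readLine(PushbackInputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
            }
            if (b == '\r') {
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next);  // lone '\r': keep the following byte for the next line
                }
                return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
            }
            line.write(b);
        }
        // End of stream: return a final unterminated line, or null if nothing was read.
        return line.size() == 0 ? null : new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```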
    • read

      public int read(byte[] buffer, int offset, int length) throws IOException
      Reads up to length bytes of data from the input stream into an array of bytes. An attempt is made to read as many as length bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
      Overrides:
      read in class InputStream
      Parameters:
      buffer - buffer into which data is read.
      offset - the start offset in array buffer at which the data is written.
      length - the maximum number of bytes to read.
      Returns:
      the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
      Throws:
      IOException
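Because the contract of read(buffer, offset, length) allows fewer bytes than requested to be returned, a caller that needs an exact byte count should loop. A generic sketch that works with any InputStream, including this one; the helper name is hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: keeps calling read() until the requested range is filled. */
final class ReadFullySketch {

    /** Fills buffer[offset, offset + length) or throws EOFException if the stream ends first. */
    static void readFully(InputStream in, byte[] buffer, int offset, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = in.read(buffer, offset + done, length - done);
            if (n == -1) {
                throw new EOFException("stream ended after " + done + " of " + length + " bytes");
            }
            done += n;
        }
    }
}
```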
    • seek

      public void seek(long pos) throws IOException
      Seek to the given position in the file. Note that pos is a special virtual file pointer, not an actual byte offset.
      Parameters:
      pos - virtual file pointer position
      Throws:
      IOException - if stream is closed or not a file based stream
    • prepareForSeek

      protected void prepareForSeek()
      Performs cleanup required before seek is called on the underlying stream
    • getFilePointer

      public long getFilePointer()
      Returns:
      virtual file pointer that can be passed to seek() to return to the current position. This is not an actual byte offset, so subtracting two file pointers does not give the distance between the corresponding positions.
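As described in the SAM specification, a BGZF virtual file pointer packs the byte offset of a compressed block's start into its upper 48 bits and the offset within that block's uncompressed data into its lower 16 bits, which is why subtracting two pointers does not yield a byte distance. A standalone sketch of the packing; the class and method names are hypothetical, not htsjdk's API.

```java
/** Hypothetical sketch of BGZF virtual file pointer packing (48-bit block address, 16-bit offset). */
final class VirtualFilePointerSketch {

    /** Packs a compressed-block address (< 2^48) and an offset within its uncompressed data (< 2^16). */
    static long make(long blockAddress, int offsetInBlock) {
        if (blockAddress < 0 || blockAddress >= (1L << 48)) {
            throw new IllegalArgumentException("blockAddress out of range: " + blockAddress);
        }
        if (offsetInBlock < 0 || offsetInBlock > 0xFFFF) {
            throw new IllegalArgumentException("offsetInBlock out of range: " + offsetInBlock);
        }
        return (blockAddress << 16) | offsetInBlock;
    }

    /** Byte offset in the compressed file where the block begins. */
    static long blockAddress(long virtualPointer) {
        return virtualPointer >>> 16;
    }

    /** Offset within the block's uncompressed data. */
    static int offsetInBlock(long virtualPointer) {
        return (int) (virtualPointer & 0xFFFF);
    }
}
```

For example, make(1000, 0) - make(0, 10) is 65,535,990, a number that says nothing about how many uncompressed bytes lie between the two positions.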
    • getPosition

      public long getPosition()
      Description copied from interface: LocationAware
      The current offset, in bytes, of this stream/writer/file. Or, if this is an iterator/producer, the offset (in bytes) of the END of the most recently returned record (since a produced record corresponds to something that has been read already). See class javadoc for more. Note that for BGZF files this is not an actual file position but a virtual file pointer.
      Specified by:
      getPosition in interface LocationAware
    • getFileBlock

      public static long getFileBlock(long bgzfOffset)
    • isValidFile

      public static boolean isValidFile(InputStream stream) throws IOException
      Parameters:
      stream - Must be at start of file. Throws RuntimeException if !stream.markSupported().
      Returns:
      true if the given file looks like a valid BGZF file.
      Throws:
      IOException
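A check along the lines of isValidFile can peek at the first gzip header with mark()/reset(), which is why the stream must support marking. The standalone sketch below is hypothetical, not htsjdk's code: it tests the gzip magic bytes, the DEFLATE method, the FEXTRA flag, and a "BC" subfield with SLEN 2 at the fixed offsets BGZF writers use. It assumes "BC" is the first extra subfield and does not handle headers that place other subfields first.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: does the stream start with a BGZF-style gzip header? */
final class BgzfMagicSketch {
    private static final int HEADER_LENGTH = 18;

    /** Returns true for a BGZF-style header; the stream position is restored either way. */
    static boolean looksLikeBgzf(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(HEADER_LENGTH);
        byte[] h = new byte[HEADER_LENGTH];
        int n = 0;
        try {
            while (n < HEADER_LENGTH) {
                int r = in.read(h, n, HEADER_LENGTH - n);
                if (r == -1) {
                    break;
                }
                n += r;
            }
        } finally {
            in.reset();  // put the peeked bytes back
        }
        return n == HEADER_LENGTH
                && (h[0] & 0xFF) == 31 && (h[1] & 0xFF) == 139  // gzip magic
                && (h[2] & 0xFF) == 8                           // CM: DEFLATE
                && (h[3] & 0x04) != 0                           // FLG.FEXTRA set
                && h[12] == 'B' && h[13] == 'C'                 // subfield ID "BC"
                && (h[14] & 0xFF) == 2 && h[15] == 0;           // SLEN = 2
    }
}
```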
    • nextBlock

      protected BlockCompressedInputStream.DecompressedBlock nextBlock(byte[] bufferAvailableForReuse)
      Reads and decompresses the next block
      Parameters:
      bufferAvailableForReuse - decompression buffer available for reuse
      Returns:
      next block in the decompressed stream
    • processNextBlock

      protected BlockCompressedInputStream.DecompressedBlock processNextBlock(byte[] bufferAvailableForReuse)
      Decompress the next block from the input stream. When using asynchronous IO, this will be called by the background thread.
      Parameters:
      bufferAvailableForReuse - buffer in which to place decompressed block. A null or incorrectly sized buffer will result in the buffer being ignored and a new buffer allocated for decompression.
      Returns:
      next block in input stream
    • checkTermination

      public static BlockCompressedInputStream.FileTermination checkTermination(File file) throws IOException
      Parameters:
      file - the file to check
      Returns:
      status of the last compressed block
      Throws:
      IOException
    • checkTermination

      public static BlockCompressedInputStream.FileTermination checkTermination(Path path) throws IOException
      Parameters:
      path - to the file to check
      Returns:
      status of the last compressed block
      Throws:
      IOException
    • checkTermination

      public static BlockCompressedInputStream.FileTermination checkTermination(SeekableByteChannel channel) throws IOException
      Checks the status of the final BGZF block of the given BGZF-compressed resource.
      Parameters:
      channel - an open channel to read from. The channel remains open, and its initial position is restored when the operation completes. No guarantee is made about the state of the channel if an exception is thrown during reading.
      Returns:
      the status of the last compressed block
      Throws:
      IOException
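One ingredient of a termination check is comparing the last 28 bytes of the resource with the empty BGZF end-of-file block defined in the SAM specification. The standalone sketch below does that comparison over a SeekableByteChannel and, like the channel overload above, restores the initial position. The class and method are hypothetical, and the boolean result is a simplification, not the FileTermination status htsjdk returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.Arrays;

/** Hypothetical sketch: does a channel end with the standard empty BGZF block? */
final class EofMarkerSketch {
    /** The 28-byte empty BGZF block that conventionally terminates a BGZF file. */
    static final byte[] EOF_BLOCK = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0, 0, 0, 0, 0, (byte) 0xff, 0x06, 0, 0x42, 0x43,
        0x02, 0, 0x1b, 0, 0x03, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };

    /** Returns true if the channel's last 28 bytes equal EOF_BLOCK; the position is restored. */
    static boolean endsWithEofBlock(SeekableByteChannel channel) throws IOException {
        long initialPosition = channel.position();
        try {
            long size = channel.size();
            if (size < EOF_BLOCK.length) {
                return false;
            }
            ByteBuffer tail = ByteBuffer.allocate(EOF_BLOCK.length);
            channel.position(size - EOF_BLOCK.length);
            while (tail.hasRemaining()) {
                if (channel.read(tail) == -1) {
                    return false;
                }
            }
            return Arrays.equals(tail.array(), EOF_BLOCK);
        } finally {
            channel.position(initialPosition);
        }
    }
}
```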
    • assertNonDefectiveFile

      @Deprecated public static void assertNonDefectiveFile(File file) throws IOException
      Deprecated.
      Throws:
      IOException
    • assertNonDefectivePath

      public static void assertNonDefectivePath(Path file) throws IOException
      Throws:
      IOException