Package htsjdk.samtools.util
Class BlockCompressedInputStream
java.lang.Object
java.io.InputStream
htsjdk.samtools.util.BlockCompressedInputStream
- All Implemented Interfaces:
LocationAware, Closeable, AutoCloseable
- Direct Known Subclasses:
AsyncBlockCompressedInputStream
Utility class for reading BGZF block compressed files. The caller can treat this file like any other InputStream.
Wrapping this stream in a buffering stream is probably unnecessary, because it buffers internally.
The advantage of BGZF over conventional GZip format is that BGZF allows for seeking without having to read the
entire file up to the location being sought. Note that seeking is only possible if the input stream is seekable.
Note that this implementation is not synchronized. If multiple threads access an instance concurrently, it must be synchronized externally.
See http://samtools.sourceforge.net/SAM1.pdf for details of the BGZF format.
-
Nested Class Summary
protected static class BlockCompressedInputStream.DecompressedBlock
static enum BlockCompressedInputStream.FileTermination
-
Field Summary
-
Constructor Summary
BlockCompressedInputStream(SeekableStream strm)
    For providing some arbitrary data source.
BlockCompressedInputStream(SeekableStream strm, InflaterFactory inflaterFactory)
    For providing some arbitrary data source.
BlockCompressedInputStream(File file)
    Use this ctor if you wish to call seek().
BlockCompressedInputStream(File file, InflaterFactory inflaterFactory)
    Use this ctor if you wish to call seek().
BlockCompressedInputStream(InputStream stream)
    Note that seek() is not supported if this ctor is used.
BlockCompressedInputStream(InputStream stream, boolean allowBuffering)
    Note that seek() is not supported if this ctor is used.
BlockCompressedInputStream(InputStream stream, boolean allowBuffering, InflaterFactory inflaterFactory)
    Note that seek() is not supported if this ctor is used.
BlockCompressedInputStream(InputStream stream, InflaterFactory inflaterFactory)
    Note that seek() is not supported if this ctor is used.
BlockCompressedInputStream(URL url)
BlockCompressedInputStream(URL url, InflaterFactory inflaterFactory)
BlockCompressedInputStream (Path-based overload)
    Equivalent constructor for Path as the one that takes a File.
-
Method Summary
static void assertNonDefectiveFile(File file)
    Deprecated.
static void assertNonDefectivePath(Path file)
int available()
static BlockCompressedInputStream.FileTermination checkTermination(File file)
static BlockCompressedInputStream.FileTermination checkTermination(SeekableByteChannel channel)
    Check the status of the final bgzipped block for the given bgzipped resource.
static BlockCompressedInputStream.FileTermination checkTermination(Path path)
void close()
    Closes the underlying InputStream or RandomAccessFile.
boolean endOfBlock()
static long getFileBlock(long bgzfOffset)
long getFilePointer()
long getPosition()
    The current offset, in bytes, of this stream/writer/file.
static boolean isValidFile(InputStream stream)
protected BlockCompressedInputStream.DecompressedBlock nextBlock(byte[] bufferAvailableForReuse)
    Reads and decompresses the next block.
protected void prepareForSeek()
    Performs cleanup required before seek is called on the underlying stream.
protected BlockCompressedInputStream.DecompressedBlock processNextBlock(byte[] bufferAvailableForReuse)
    Decompress the next block from the input stream.
int read()
    Reads the next byte of data from the input stream.
int read(byte[] buffer)
    Reads some number of bytes from the input stream and stores them into the array buffer.
int read(byte[] buffer, int offset, int length)
    Reads up to length bytes of data from the input stream into an array of bytes.
String readLine()
    Reads a whole line.
void seek(long pos)
    Seek to the given position in the file.
void setCheckCrcs(boolean check)
    Determines whether or not the inflater will recalculate the CRC on the decompressed data and check it against the value stored in the GZIP header.
Methods inherited from class java.io.InputStream
mark, markSupported, nullInputStream, readAllBytes, readNBytes, readNBytes, reset, skip, skipNBytes, transferTo
-
Field Details
-
INCORRECT_HEADER_SIZE_MSG
-
UNEXPECTED_BLOCK_LENGTH_MSG
-
PREMATURE_END_MSG
-
CANNOT_SEEK_STREAM_MSG
-
CANNOT_SEEK_CLOSED_STREAM_MSG
-
INVALID_FILE_PTR_MSG
-
-
Constructor Details
-
BlockCompressedInputStream
public BlockCompressedInputStream(InputStream stream)
Note that seek() is not supported if this ctor is used.
- Parameters:
stream
- source of bytes
-
BlockCompressedInputStream
public BlockCompressedInputStream(InputStream stream, InflaterFactory inflaterFactory)
Note that seek() is not supported if this ctor is used.
- Parameters:
stream
- source of bytes
inflaterFactory
- InflaterFactory used by BlockGunzipper
-
BlockCompressedInputStream
public BlockCompressedInputStream(InputStream stream, boolean allowBuffering)
Note that seek() is not supported if this ctor is used.
- Parameters:
stream
- source of bytes
allowBuffering
- if true, allow buffering
-
BlockCompressedInputStream
public BlockCompressedInputStream(InputStream stream, boolean allowBuffering, InflaterFactory inflaterFactory)
Note that seek() is not supported if this ctor is used.
- Parameters:
stream
- source of bytes
allowBuffering
- if true, allow buffering
inflaterFactory
- InflaterFactory used by BlockGunzipper
-
BlockCompressedInputStream
public BlockCompressedInputStream(File file) throws IOException
Use this ctor if you wish to call seek().
- Parameters:
file
- source of bytes
- Throws:
IOException
-
BlockCompressedInputStream
Equivalent constructor for Path as the one that takes a File. Supports seeking.
- Throws:
IOException
-
BlockCompressedInputStream
public BlockCompressedInputStream(File file, InflaterFactory inflaterFactory) throws IOException
Use this ctor if you wish to call seek().
- Parameters:
file
- source of bytes
inflaterFactory
- InflaterFactory used by BlockGunzipper
- Throws:
IOException
-
BlockCompressedInputStream
public BlockCompressedInputStream(URL url)
- Parameters:
url
- source of bytes
-
BlockCompressedInputStream
public BlockCompressedInputStream(URL url, InflaterFactory inflaterFactory)
- Parameters:
url
- source of bytes
inflaterFactory
- InflaterFactory used by BlockGunzipper
-
BlockCompressedInputStream
public BlockCompressedInputStream(SeekableStream strm)
For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
- Parameters:
strm
- source of bytes
-
BlockCompressedInputStream
public BlockCompressedInputStream(SeekableStream strm, InflaterFactory inflaterFactory)
For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
- Parameters:
strm
- source of bytes
inflaterFactory
- InflaterFactory used by BlockGunzipper
-
-
Method Details
-
setCheckCrcs
public void setCheckCrcs(boolean check)
Determines whether or not the inflater will recalculate the CRC on the decompressed data and check it against the value stored in the GZIP header. CRC checking is an expensive operation and should be used accordingly.
-
available
- Overrides:
available in class InputStream
- Returns:
- the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. Note that although the next caller can read this many bytes without blocking, the available() method call itself may block in order to fill an internal buffer if it has been exhausted.
- Throws:
IOException
-
endOfBlock
public boolean endOfBlock()
- Returns:
true if the stream is at the end of a BGZF block, false otherwise.
-
close
Closes the underlying InputStream or RandomAccessFile.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class InputStream
- Throws:
IOException
-
read
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
- Specified by:
read in class InputStream
- Returns:
- the next byte of data, or -1 if the end of the stream is reached.
- Throws:
IOException
-
read
Reads some number of bytes from the input stream and stores them into the array buffer. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. read(buffer) has the same effect as read(buffer, 0, buffer.length).
- Overrides:
read in class InputStream
- Parameters:
buffer
- the buffer into which the data is read.
- Returns:
- the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
- Throws:
IOException
-
readLine
Reads a whole line. A line is considered to be terminated by either a line feed ('\n'), carriage return ('\r') or carriage return followed by a line feed ("\r\n").
- Returns:
- A String containing the contents of the line, excluding the line-terminating characters, or null if the end of the stream has been reached
- Throws:
IOException
- If an I/O error occurs
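The terminator rule above can be illustrated with a small JDK-only sketch. It is not the library's implementation: the class name LineTerminatorSketch and the use of PushbackInputStream are assumptions made for illustration, and bytes are treated as single-byte characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing the '\n', '\r' and "\r\n" terminator rule. */
final class LineTerminatorSketch {

    /** Returns the next line without its terminator, or null at end of stream. */
    static String readLine(PushbackInputStream in) {
        try {
            int c = in.read();
            if (c == -1) {
                return null; // nothing left to read
            }
            StringBuilder line = new StringBuilder();
            while (c != -1 && c != '\n' && c != '\r') {
                line.append((char) c);
                c = in.read();
            }
            if (c == '\r') {
                // A '\n' directly after '\r' belongs to the same terminator.
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next);
                }
            }
            return line.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "first\r\nsecond\rthird\n".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        for (String line = readLine(in); line != null; line = readLine(in)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

All three terminator styles yield the same kind of result: the line content without any terminator characters.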
-
read
Reads up to length bytes of data from the input stream into an array of bytes. An attempt is made to read as many as length bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
- Overrides:
read in class InputStream
- Parameters:
buffer
- buffer into which data is read.
offset
- the start offset in the array buffer at which the data is written.
length
- the maximum number of bytes to read.
- Returns:
- the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
- Throws:
IOException
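Because a single call may return fewer bytes than requested, callers that need a full buffer usually loop. The following is a minimal sketch of that pattern against the plain java.io.InputStream contract; ReadFullySketch, readFully and trickle are hypothetical names, and the trickle stream exists only to simulate short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: loops over read(buffer, offset, length) until done. */
final class ReadFullySketch {

    /** Fills the buffer as far as the stream allows; returns the bytes read. */
    static int readFully(InputStream in, byte[] buffer) {
        int total = 0;
        try {
            while (total < buffer.length) {
                int n = in.read(buffer, total, buffer.length - total);
                if (n < 0) {
                    break; // end of stream before the buffer was full
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Test stream that hands out at most one byte per bulk read call. */
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[4];
        int n = readFully(trickle(new byte[] {1, 2, 3, 4, 5}), buffer);
        System.out.println("bytes read: " + n);
    }
}
```

The same loop works for any InputStream, including one that decompresses block by block.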
-
seek
Seek to the given position in the file. Note that pos is a special virtual file pointer, not an actual byte offset.
- Parameters:
pos
- virtual file pointer position
- Throws:
IOException
- if stream is closed or not a file based stream
-
prepareForSeek
protected void prepareForSeek()
Performs cleanup required before seek is called on the underlying stream.
-
getFilePointer
public long getFilePointer()
- Returns:
- virtual file pointer that can be passed to seek() to return to the current position. This is not an actual byte offset, so arithmetic on file pointers cannot be used to determine the distance between two positions.
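To make the virtual file pointer concrete: the BGZF section of the SAM specification defines it as the compressed block's file offset shifted left by 16 bits, combined with the offset inside the uncompressed block. The sketch below shows that packing with plain long arithmetic. The class and method names are hypothetical and this is not the library's own utility code.

```java
/** Hypothetical helper illustrating the BGZF virtual file pointer layout. */
final class VirtualFilePointerSketch {
    private static final int SHIFT = 16;                      // low 16 bits: offset within block
    private static final long OFFSET_MASK = 0xFFFFL;
    private static final long ADDRESS_MASK = 0xFFFFFFFFFFFFL; // high 48 bits: block address

    /** Packs a compressed-file block address and an in-block offset. */
    static long makePointer(long blockAddress, int offsetInBlock) {
        if (blockAddress < 0 || blockAddress > ADDRESS_MASK) {
            throw new IllegalArgumentException("block address out of range: " + blockAddress);
        }
        if (offsetInBlock < 0 || offsetInBlock > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range: " + offsetInBlock);
        }
        return (blockAddress << SHIFT) | offsetInBlock;
    }

    /** Byte offset of the block start in the compressed file. */
    static long blockAddress(long pointer) {
        return (pointer >>> SHIFT) & ADDRESS_MASK;
    }

    /** Offset into the block's uncompressed data. */
    static int blockOffset(long pointer) {
        return (int) (pointer & OFFSET_MASK);
    }

    public static void main(String[] args) {
        long a = makePointer(1000L, 10);
        long b = makePointer(2000L, 10);
        // The difference mixes two coordinate systems, so it is not a byte distance.
        System.out.println("pointer difference: " + (b - a));
        System.out.println("block address of b: " + blockAddress(b));
    }
}
```

This layout is why subtracting two pointers does not yield a distance: the high bits count compressed bytes while the low bits count uncompressed bytes.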
-
getPosition
public long getPosition()
Description copied from interface: LocationAware
The current offset, in bytes, of this stream/writer/file. Or, if this is an iterator/producer, the offset (in bytes) of the END of the most recently returned record (since a produced record corresponds to something that has been read already). See class javadoc for more. Note that for BGZF files, this does not represent an actual file position, but a virtual file pointer.
- Specified by:
getPosition in interface LocationAware
-
getFileBlock
public static long getFileBlock(long bgzfOffset)
-
isValidFile
- Parameters:
stream
- Must be at start of file. Throws RuntimeException if !stream.markSupported().
- Returns:
- true if the given file looks like a valid BGZF file.
- Throws:
IOException
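As a rough illustration of what "looks like a valid BGZF file" can mean, the sketch below checks the fixed header fields that the SAM specification describes for a BGZF block: the gzip magic bytes, the deflate method, the FEXTRA flag, and a 'B','C' extra subfield of length 2. It assumes the BC subfield is the first extra subfield, which the specification does not require, and it is not the library's implementation. The names BgzfHeaderSketch and looksLikeBgzf are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: inspects the first 18 bytes of a stream, then resets it. */
final class BgzfHeaderSketch {
    static final int HEADER_LENGTH = 18;

    static boolean looksLikeBgzf(InputStream stream) {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] h = new byte[HEADER_LENGTH];
        int total = 0;
        try {
            stream.mark(HEADER_LENGTH);
            while (total < h.length) {
                int n = stream.read(h, total, h.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            stream.reset(); // leave the stream at the start of the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (total < HEADER_LENGTH) {
            return false;
        }
        return (h[0] & 0xFF) == 0x1F && (h[1] & 0xFF) == 0x8B // gzip magic
                && h[2] == 8                                  // CM: deflate
                && (h[3] & 4) != 0                            // FLG: FEXTRA set
                && h[12] == 'B' && h[13] == 'C'               // BGZF subfield id (assumed first)
                && h[14] == 2 && h[15] == 0;                  // subfield length 2
    }

    public static void main(String[] args) {
        byte[] plainGzipLike = new byte[HEADER_LENGTH];
        plainGzipLike[0] = 0x1F;
        plainGzipLike[1] = (byte) 0x8B;
        plainGzipLike[2] = 8; // FLG stays 0, so no extra field is present
        System.out.println(looksLikeBgzf(new ByteArrayInputStream(plainGzipLike)));
    }
}
```

The mark/reset pair is what lets a caller hand the same stream to a decompressor afterwards without reopening it.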
-
nextBlock
Reads and decompresses the next block.
- Parameters:
bufferAvailableForReuse
- decompression buffer available for reuse
- Returns:
- next block in the decompressed stream
-
processNextBlock
protected BlockCompressedInputStream.DecompressedBlock processNextBlock(byte[] bufferAvailableForReuse)
Decompress the next block from the input stream. When using asynchronous IO, this will be called by the background thread.
- Parameters:
bufferAvailableForReuse
- buffer in which to place decompressed block. A null or incorrectly sized buffer will result in the buffer being ignored and a new buffer allocated for decompression.
- Returns:
- next block in input stream
-
checkTermination
public static BlockCompressedInputStream.FileTermination checkTermination(File file) throws IOException
- Parameters:
file
- the file to check
- Returns:
- status of the last compressed block
- Throws:
IOException
-
checkTermination
public static BlockCompressedInputStream.FileTermination checkTermination(Path path) throws IOException
- Parameters:
path
- path to the file to check
- Returns:
- status of the last compressed block
- Throws:
IOException
-
checkTermination
public static BlockCompressedInputStream.FileTermination checkTermination(SeekableByteChannel channel) throws IOException
Check the status of the final bgzipped block for the given bgzipped resource.
- Parameters:
channel
- an open channel to read from. The channel will remain open and the initial position will be restored when the operation completes. This makes no guarantee about the state of the channel if an exception is thrown during reading.
- Returns:
- the status of the last compressed block
- Throws:
IOException
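One part of such a check can be sketched with the JDK alone. The SAM specification defines a 28-byte empty BGZF block that marks end of file, and the code below tests whether a channel's content ends with exactly those bytes, restoring the channel position afterwards. It is a simplified illustration, not the library's implementation: it returns a boolean and does not attempt to classify a file that lacks the marker, which FileTermination may distinguish further. BgzfTerminatorSketch and endsWithEmptyBlock are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: detects the empty BGZF end-of-file block. */
final class BgzfTerminatorSketch {
    /** The 28-byte empty block given as the EOF marker in the SAM specification. */
    static final byte[] EMPTY_BLOCK = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, (byte) 0xff,
        0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    };

    static boolean endsWithEmptyBlock(SeekableByteChannel channel) throws IOException {
        long initialPosition = channel.position();
        try {
            long size = channel.size();
            if (size < EMPTY_BLOCK.length) {
                return false;
            }
            channel.position(size - EMPTY_BLOCK.length);
            ByteBuffer tail = ByteBuffer.allocate(EMPTY_BLOCK.length);
            while (tail.hasRemaining()) {
                if (channel.read(tail) < 0) {
                    break;
                }
            }
            return !tail.hasRemaining() && Arrays.equals(tail.array(), EMPTY_BLOCK);
        } finally {
            channel.position(initialPosition); // put the caller's position back
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: BgzfTerminatorSketch <file>");
            return;
        }
        try (SeekableByteChannel channel = Files.newByteChannel(Paths.get(args[0]))) {
            System.out.println(endsWithEmptyBlock(channel));
        }
    }
}
```

A missing marker does not by itself prove corruption, so a false result here only means the file does not end with the empty block.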
-
assertNonDefectiveFile
Deprecated.
- Throws:
IOException
-
assertNonDefectivePath
- Throws:
IOException
-