Class BlockCompressedInputStream

  • All Implemented Interfaces:
    LocationAware, Closeable, AutoCloseable
    Direct Known Subclasses:
    AsyncBlockCompressedInputStream

    public class BlockCompressedInputStream
    extends InputStream
    implements LocationAware
    Utility class for reading BGZF block-compressed files. The caller can treat this stream like any other InputStream. Wrapping it in a buffering stream is probably unnecessary, because it buffers internally. The advantage of BGZF over the conventional GZip format is that BGZF allows seeking without reading the entire file up to the location being sought. Seeking is only possible if the input stream is seekable. This implementation is not synchronized: if multiple threads access an instance concurrently, it must be synchronized externally. See http://samtools.sourceforge.net/SAM1.pdf for details of the BGZF format.
    • Constructor Detail

      • BlockCompressedInputStream

        public BlockCompressedInputStream​(InputStream stream)
        Note that seek() is not supported if this ctor is used.
        Parameters:
        stream - source of bytes
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(InputStream stream,
                                          boolean allowBuffering)
        Note that seek() is not supported if this ctor is used.
        Parameters:
        stream - source of bytes
        allowBuffering - if true, allow buffering
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(InputStream stream,
                                          boolean allowBuffering,
                                          InflaterFactory inflaterFactory)
        Note that seek() is not supported if this ctor is used.
        Parameters:
        stream - source of bytes
        allowBuffering - if true, allow buffering
        inflaterFactory - InflaterFactory used by BlockGunzipper
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(File file)
                                   throws IOException
        Use this ctor if you wish to call seek()
        Parameters:
        file - source of bytes
        Throws:
        IOException
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(URL url)
        Parameters:
        url - source of bytes
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(SeekableStream strm)
        For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
        Parameters:
        strm - source of bytes
      • BlockCompressedInputStream

        public BlockCompressedInputStream​(SeekableStream strm,
                                          InflaterFactory inflaterFactory)
        For providing some arbitrary data source. No additional buffering is provided, so if the underlying source is not buffered, wrap it in a SeekableBufferedStream before passing to this ctor.
        Parameters:
        strm - source of bytes
        inflaterFactory - InflaterFactory used by BlockGunzipper
    • Method Detail

      • setCheckCrcs

        public void setCheckCrcs​(boolean check)
        Determines whether the inflater recalculates the CRC on the decompressed data and checks it against the CRC value stored in the GZIP block. CRC checking is an expensive operation, so enable it only when the integrity check is worth the cost.
      • available

        public int available()
                      throws IOException
        Overrides:
        available in class InputStream
        Returns:
        the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. Note that although the next caller can read this many bytes without blocking, the available() method call itself may block in order to fill an internal buffer if it has been exhausted.
        Throws:
        IOException
      • endOfBlock

        public boolean endOfBlock()
        Returns:
        true if the stream is at the end of a BGZF block, false otherwise.
      • read

        public int read()
                 throws IOException
        Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
        Specified by:
        read in class InputStream
        Returns:
        the next byte of data, or -1 if the end of the stream is reached.
        Throws:
        IOException
      • read

        public int read​(byte[] buffer)
                 throws IOException
        Reads some number of bytes from the input stream and stores them into the array buffer. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. read(buffer) has the same effect as read(buffer, 0, buffer.length).
        Overrides:
        read in class InputStream
        Parameters:
        buffer - the buffer into which the data is read.
        Returns:
        the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
        Throws:
        IOException
      • readLine

        public String readLine()
                        throws IOException
        Reads a whole line. A line is considered to be terminated by either a line feed ('\n'), carriage return ('\r') or carriage return followed by a line feed ("\r\n").
        Returns:
        A String containing the contents of the line, excluding the line terminating character, or null if the end of the stream has been reached
        Throws:
        IOException - If an I/O error occurs
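        The terminator rules above can be illustrated with a small standalone sketch. It does not use this class: LineReaderSketch and its readLine helper are hypothetical names, the one-byte lookahead through a PushbackInputStream is just one possible approach, and the single-byte charset is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented class.
final class LineReaderSketch {
    // Returns the next line without its terminator, or null at end of stream.
    static String readLine(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // nothing left to read
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n' && b != '\r') {
            line.write(b);
            b = in.read();
        }
        if (b == '\r') {
            // A carriage return may be followed by a line feed ("\r\n").
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next); // not part of the terminator, put it back
            }
        }
        // Assumption for this sketch: bytes map one-to-one to characters.
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nb\r\nc\rd".getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        String line;
        while ((line = readLine(in)) != null) {
            System.out.println(line);
        }
    }
}
```

        An empty line is returned as an empty String, so only null signals the end of the stream.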
      • read

        public int read​(byte[] buffer,
                        int offset,
                        int length)
                 throws IOException
        Reads up to length bytes of data from the input stream into an array of bytes. An attempt is made to read as many as length bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
        Overrides:
        read in class InputStream
        Parameters:
        buffer - buffer into which data is read.
        offset - the start offset in the array buffer at which the data is written.
        length - the maximum number of bytes to read.
        Returns:
        the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
        Throws:
        IOException
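        Because a single call may return fewer bytes than requested, callers that need an exact count usually loop. The following is a minimal sketch against the plain InputStream contract; ReadFullySketch and readFully are hypothetical names, not methods of this class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the documented class.
final class ReadFullySketch {
    // Keeps calling read until 'length' bytes are stored or the stream ends.
    // Returns the number of bytes actually stored, which is less than
    // 'length' only if the end of the stream was reached first.
    static int readFully(InputStream in, byte[] buffer, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int n = in.read(buffer, offset + total, length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] buffer = new byte[8];
        int stored = readFully(in, buffer, 0, buffer.length);
        System.out.println("bytes stored: " + stored);
    }
}
```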
      • seek

        public void seek​(long pos)
                  throws IOException
        Seek to the given position in the file. Note that pos is a special virtual file pointer, not an actual byte offset.
        Parameters:
        pos - virtual file pointer position
        Throws:
        IOException - if stream is closed or not a file based stream
      • prepareForSeek

        protected void prepareForSeek()
        Performs cleanup required before seek is called on the underlying stream
      • getFilePointer

        public long getFilePointer()
        Returns:
        virtual file pointer that can be passed to seek() to return to the current position. This is not an actual byte offset, so arithmetic on two file pointers cannot be used to determine the distance between them.
      • getPosition

        public long getPosition()
        Description copied from interface: LocationAware
        The current offset, in bytes, of this stream/writer/file. Or, if this is an iterator/producer, the offset (in bytes) of the END of the most recently returned record (since a produced record corresponds to something that has been read already). See class javadoc for more. Note that for BGZF files, this does not represent an actual file position, but a virtual file pointer.
        Specified by:
        getPosition in interface LocationAware
      • getFileBlock

        public static long getFileBlock​(long bgzfOffset)
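        This method is undocumented here. As background, the BGZF section of the SAM specification defines a virtual file offset as the compressed block's byte offset shifted left by 16 bits, combined with the offset inside the uncompressed block. The sketch below shows that packing under the assumption that getFileBlock extracts the block-address part; VirtualFilePointerSketch and its method names are hypothetical and not part of this class.

```java
// Hypothetical helper illustrating the BGZF virtual file offset layout.
final class VirtualFilePointerSketch {
    private static final int SHIFT_AMOUNT = 16;            // low 16 bits hold the in-block offset
    private static final long OFFSET_MASK = 0xFFFFL;       // 16-bit offset within the uncompressed block
    private static final long ADDRESS_MASK = 0xFFFFFFFFFFFFL; // 48-bit compressed block address

    // Packs a block address and an in-block offset into one virtual pointer.
    static long makeFilePointer(long blockAddress, int blockOffset) {
        if (blockOffset < 0 || blockOffset > OFFSET_MASK) {
            throw new IllegalArgumentException("block offset out of range: " + blockOffset);
        }
        if (blockAddress < 0 || blockAddress > ADDRESS_MASK) {
            throw new IllegalArgumentException("block address out of range: " + blockAddress);
        }
        return (blockAddress << SHIFT_AMOUNT) | blockOffset;
    }

    // Byte offset of the start of the compressed block in the file.
    static long blockAddress(long virtualFilePointer) {
        return (virtualFilePointer >> SHIFT_AMOUNT) & ADDRESS_MASK;
    }

    // Offset of the position within the uncompressed block.
    static int blockOffset(long virtualFilePointer) {
        return (int) (virtualFilePointer & OFFSET_MASK);
    }

    public static void main(String[] args) {
        long pointer = makeFilePointer(1_000_000L, 42);
        System.out.println("block address: " + blockAddress(pointer));
        System.out.println("block offset:  " + blockOffset(pointer));
    }
}
```

        The layout also shows why subtracting two virtual file pointers is not meaningful: the upper bits count compressed bytes and the lower bits count uncompressed bytes.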
      • isValidFile

        public static boolean isValidFile​(InputStream stream)
                                   throws IOException
        Parameters:
        stream - Must be at start of file. Throws RuntimeException if !stream.markSupported().
        Returns:
        true if the given file looks like a valid BGZF file.
        Throws:
        IOException
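        To make "looks like a valid BGZF file" concrete, here is a sketch of one plausible header check based on the BGZF block header layout in the SAM specification. It is not necessarily what this method does. BgzfHeaderSketch and looksLikeBgzf are hypothetical names, and the sketch assumes the "BC" extra subfield comes first in the gzip extra field.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the documented class.
final class BgzfHeaderSketch {
    static final int HEADER_LENGTH = 18; // fixed gzip header plus the BC subfield

    // Peeks at the first block header and restores the stream position.
    static boolean looksLikeBgzf(InputStream stream) throws IOException {
        if (!stream.markSupported()) {
            throw new RuntimeException("stream must support mark/reset");
        }
        stream.mark(HEADER_LENGTH);
        byte[] header = new byte[HEADER_LENGTH];
        int total = 0;
        while (total < HEADER_LENGTH) {
            int n = stream.read(header, total, HEADER_LENGTH - total);
            if (n < 0) {
                break; // stream shorter than one header
            }
            total += n;
        }
        stream.reset();
        if (total < HEADER_LENGTH) {
            return false;
        }
        return (header[0] & 0xFF) == 0x1f   // gzip ID1
            && (header[1] & 0xFF) == 0x8b   // gzip ID2
            && header[2] == 8               // CM: deflate
            && (header[3] & 0x04) != 0      // FLG: FEXTRA bit set
            && header[12] == 'B'            // subfield SI1
            && header[13] == 'C'            // subfield SI2
            && header[14] == 2              // SLEN low byte
            && header[15] == 0;             // SLEN high byte
    }

    public static void main(String[] args) throws IOException {
        byte[] plainGzip = new byte[HEADER_LENGTH];
        plainGzip[0] = 0x1f;
        plainGzip[1] = (byte) 0x8b;
        plainGzip[2] = 8; // FLG stays 0, so there is no extra field
        System.out.println(looksLikeBgzf(new ByteArrayInputStream(plainGzip)));
    }
}
```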
      • nextBlock

        protected BlockCompressedInputStream.DecompressedBlock nextBlock​(byte[] bufferAvailableForReuse)
        Reads and decompresses the next block
        Parameters:
        bufferAvailableForReuse - decompression buffer available for reuse
        Returns:
        next block in the decompressed stream
      • processNextBlock

        protected BlockCompressedInputStream.DecompressedBlock processNextBlock​(byte[] bufferAvailableForReuse)
        Decompress the next block from the input stream. When using asynchronous IO, this will be called by the background thread.
        Parameters:
        bufferAvailableForReuse - buffer in which to place decompressed block. A null or incorrectly sized buffer will result in the buffer being ignored and a new buffer allocated for decompression.
        Returns:
        next block in input stream
      • checkTermination

        public static BlockCompressedInputStream.FileTermination checkTermination​(SeekableByteChannel channel)
                                                                           throws IOException
        Checks the status of the final bgzipped block for the given bgzipped resource.
        Parameters:
        channel - an open channel to read from. The channel remains open, and its initial position is restored when the operation completes. No guarantee is made about the state of the channel if an exception is thrown during reading.
        Returns:
        the status of the last compressed block
        Throws:
        IOException
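        As an illustration of one thing a termination check can look for, the SAM specification defines a 28-byte empty BGZF block that serves as an end-of-file marker. The sketch below only tests whether a channel ends with that marker and then restores the channel position. It is a simplification, not this method's implementation, and BgzfTerminatorSketch and endsWithEmptyBlock are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of the documented class.
final class BgzfTerminatorSketch {
    // The empty BGZF block defined by the SAM specification as the EOF marker.
    static final byte[] EMPTY_BLOCK = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, (byte) 0xff,
        0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00,
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    };

    // True if the last 28 bytes of the channel equal the EOF marker.
    static boolean endsWithEmptyBlock(SeekableByteChannel channel) throws IOException {
        long initialPosition = channel.position();
        try {
            long size = channel.size();
            if (size < EMPTY_BLOCK.length) {
                return false; // too short to contain the marker
            }
            channel.position(size - EMPTY_BLOCK.length);
            ByteBuffer tail = ByteBuffer.allocate(EMPTY_BLOCK.length);
            while (tail.hasRemaining()) {
                if (channel.read(tail) < 0) {
                    return false; // unexpected end of channel
                }
            }
            return Arrays.equals(tail.array(), EMPTY_BLOCK);
        } finally {
            channel.position(initialPosition); // put the caller's position back
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path of a bgzipped file as the first argument.
        try (SeekableByteChannel channel = Files.newByteChannel(Paths.get(args[0]))) {
            System.out.println(endsWithEmptyBlock(channel));
        }
    }
}
```

        A file that lacks the marker is not necessarily corrupt, which is presumably why the real method reports a FileTermination status rather than a boolean.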