S3AInputStream (Apache Hadoop Amazon Web Services support 3.4.0 API)

java.lang.Object
- java.io.InputStream
  - org.apache.hadoop.fs.FSInputStream
    - org.apache.hadoop.fs.s3a.S3AInputStream

All Implemented Interfaces:

Closeable, AutoCloseable, org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.PositionedReadable, org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.statistics.IOStatisticsSource, org.apache.hadoop.fs.StreamCapabilities
```
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class S3AInputStream
extends org.apache.hadoop.fs.FSInputStream
implements org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.StreamCapabilities, org.apache.hadoop.fs.statistics.IOStatisticsSource
```
The input stream for an S3A object. As this stream seeks within an object, it may close and then re-open the stream. When this happens, any updated stream data may be retrieved and, given the consistency model of Amazon S3, outdated data may in fact be picked up. As a result, the outcome of reading from a stream of an object which is actively manipulated during the read process is "undefined". The class is marked as private, as code should not create instances itself. Any extra feature (e.g., instrumentation) should be considered unstable. Because it prints some of the state of the instrumentation, the output of toString() must also be considered unstable.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static interface`	`S3AInputStream.InputStreamCallbacks` Callbacks for input stream IO.

Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
org.apache.hadoop.fs.StreamCapabilities.StreamCapability

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`E_NEGATIVE_READAHEAD_VALUE`
`static String`	`OPERATION_OPEN`
`static String`	`OPERATION_REOPEN`

Fields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO

Constructor Summary

Constructors
Constructor and Description
`S3AInputStream(S3AReadOpContext ctx, S3ObjectAttributes s3Attributes, S3AInputStream.InputStreamCallbacks client, S3AInputStreamStatistics streamStatistics, ExecutorService boundedThreadPool)` Create the stream.

Method Summary

Modifier and Type	Method and Description
`int`	`available()`
`void`	`close()` Close the stream.
`long`	`getContentRangeFinish()`
`long`	`getContentRangeStart()`
`S3AInputPolicy`	`getInputPolicy()` Get the current input policy.
`org.apache.hadoop.fs.statistics.IOStatistics`	`getIOStatistics()`
`long`	`getPos()`
`long`	`getReadahead()` Get the current readahead value.
`S3AInputStreamStatistics`	`getS3AStreamStatistics()` Access the input stream statistics.
`software.amazon.awssdk.core.ResponseInputStream<software.amazon.awssdk.services.s3.model.GetObjectResponse>`	`getWrappedStream()` Get the wrapped stream.
`boolean`	`hasCapability(String capability)`
`boolean`	`isObjectStreamOpen()` Is the inner object stream open?
`boolean`	`markSupported()`
`int`	`maxReadSizeForVectorReads()`
`int`	`minSeekForVectorReads()`
`int`	`read()`
`int`	`read(byte[] buf, int off, int len)` This updates the statistics on read operations started and whether or not the read operation "completed", that is: returned the exact number of bytes requested.
`void`	`readFully(long position, byte[] buffer, int offset, int length)` Subclass `readFully()` operation which only seeks at the start of the series of operations; seeking back at the end.
`void`	`readVectored(List<? extends org.apache.hadoop.fs.FileRange> ranges, IntFunction<ByteBuffer> allocate)` Vectored read implementation for S3AInputStream.
`long`	`remainingInCurrentRequest()` Bytes left in the current request.
`long`	`remainingInFile()` Bytes left in stream.
`boolean`	`resetConnection()` Forcibly reset the stream, by aborting the connection.
`void`	`seek(long targetPos)`
`boolean`	`seekToNewSource(long targetPos)`
`void`	`setReadahead(Long readahead)`
`String`	`toString()` String value includes statistics as well as stream state.
`void`	`unbuffer()` Closes the underlying S3 stream, and merges the `streamStatistics` instance associated with the stream.
`static long`	`validateReadahead(Long readahead)` From a possibly null Long value, return a valid readahead.

Methods inherited from class org.apache.hadoop.fs.FSInputStream
read, readFully, validatePositionedReadArgs

Methods inherited from class java.io.InputStream
mark, read, reset, skip

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - E_NEGATIVE_READAHEAD_VALUE
```
public static final String E_NEGATIVE_READAHEAD_VALUE
```
    See Also:
    
    Constant Field Values
  - OPERATION_OPEN
```
public static final String OPERATION_OPEN
```
    See Also:
    
    Constant Field Values
  - OPERATION_REOPEN
```
public static final String OPERATION_REOPEN
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - S3AInputStream
```
public S3AInputStream(S3AReadOpContext ctx,
                      S3ObjectAttributes s3Attributes,
                      S3AInputStream.InputStreamCallbacks client,
                      S3AInputStreamStatistics streamStatistics,
                      ExecutorService boundedThreadPool)
```
    Create the stream. This does not attempt to open it; that is only done on the first actual read() operation.
    
    Parameters:
    
    ctx - operation context
    
    s3Attributes - object attributes
    
    client - S3 client to use
    
    streamStatistics - stream io stats.
    
    boundedThreadPool - thread pool to use.
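
    The deferred open described above can be illustrated with a minimal sketch. This is not the S3A implementation: `LazyOpenStream`, its `opener` supplier, and the `isOpen()`/`getOpenCount()` helpers are hypothetical names that stand in for "issue the request on the first read() rather than in the constructor".

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical stand-in, not the S3A code: defers opening until the first read(). */
final class LazyOpenStream extends InputStream {
  // Hypothetical: stands in for whatever actually opens the remote object.
  private final Supplier<InputStream> opener;
  private InputStream wrapped;   // stays null until the first read()
  private int openCount;         // how many times the opener was invoked

  LazyOpenStream(Supplier<InputStream> opener) {
    this.opener = opener;        // constructing the stream performs no IO
  }

  @Override
  public synchronized int read() throws IOException {
    if (wrapped == null) {       // open lazily, on first use only
      wrapped = opener.get();
      openCount++;
    }
    return wrapped.read();
  }

  synchronized boolean isOpen() {
    return wrapped != null;
  }

  synchronized int getOpenCount() {
    return openCount;
  }
}
```

    With this shape, constructing a stream that is never read costs no request at all, which is the property the constructor description relies on.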
- Method Detail
  - getInputPolicy
```
@VisibleForTesting
public S3AInputPolicy getInputPolicy()
```
    Get the current input policy.
    
    Returns:
    
    input policy.
  - getPos
```
public long getPos()
            throws IOException
```
    Specified by:
    
    getPos in interface org.apache.hadoop.fs.Seekable
    
    Specified by:
    
    getPos in class org.apache.hadoop.fs.FSInputStream
    
    Throws:
    
    IOException
  - seek
```
public void seek(long targetPos)
          throws IOException
```
    Specified by:
    
    seek in interface org.apache.hadoop.fs.Seekable
    
    Specified by:
    
    seek in class org.apache.hadoop.fs.FSInputStream
    
    Throws:
    
    IOException
  - seekToNewSource
```
public boolean seekToNewSource(long targetPos)
                        throws IOException
```
    Specified by:
    
    seekToNewSource in interface org.apache.hadoop.fs.Seekable
    
    Specified by:
    
    seekToNewSource in class org.apache.hadoop.fs.FSInputStream
    
    Throws:
    
    IOException
  - read
```
@Retries.RetryTranslated
public int read()
         throws IOException
```
    Specified by:
    
    read in class InputStream
    
    Throws:
    
    IOException
  - read
```
@Retries.RetryTranslated
public int read(byte[] buf,
                int off,
                int len)
         throws IOException
```
    This updates the statistics on read operations started and whether or not the read operation "completed", that is: returned the exact number of bytes requested.
    
    Overrides:
    
    read in class InputStream
    
    Throws:
    
    IOException - if there are other problems
  - close
```
public void close()
           throws IOException
```
    Close the stream. This triggers publishing of the stream statistics back to the filesystem statistics. This operation is synchronized, so that only one thread can attempt to close the connection; all later/blocked calls are no-ops.
    
    Specified by:
    
    close in interface Closeable
    
    Specified by:
    
    close in interface AutoCloseable
    
    Overrides:
    
    close in class InputStream
    
    Throws:
    
    IOException - on any problem
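
    The "synchronized, later calls are no-ops" contract can be sketched as follows. `CloseOnceStream` and its `publishCount` counter are hypothetical; the counter merely stands in for publishing the stream statistics, and nothing here is taken from the S3A source.

```java
import java.io.Closeable;

/** Hypothetical stand-in, not the S3A code: a close() that only acts once. */
final class CloseOnceStream implements Closeable {
  private boolean closed;
  private int publishCount;      // stands in for "statistics were published"

  @Override
  public synchronized void close() {
    if (closed) {
      return;                    // later or previously blocked callers do nothing
    }
    closed = true;
    publishCount++;              // the one-off work happens exactly once
  }

  synchronized boolean isClosed() {
    return closed;
  }

  synchronized int getPublishCount() {
    return publishCount;
  }
}
```

    Because the check and the state change happen under the same lock, two threads racing to close cannot both run the one-off work.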
  - resetConnection
```
@InterfaceStability.Unstable
public boolean resetConnection()
                        throws IOException
```
    Forcibly reset the stream, by aborting the connection. The next read() operation will trigger the opening of a new HTTPS connection. This is potentially very inefficient, and should only be invoked in extreme circumstances. It logs at info for this reason. Blocks until the abort is completed.
    
    Returns:
    
    true if the connection was actually reset.
    
    Throws:
    
    IOException - if invoked on a closed stream.
  - available
```
public int available()
              throws IOException
```
    Overrides:
    
    available in class InputStream
    
    Throws:
    
    IOException
  - remainingInFile
```
@InterfaceAudience.Private
@InterfaceStability.Unstable
public long remainingInFile()
```
    Bytes left in stream.
    
    Returns:
    
    how many bytes are left to read
  - remainingInCurrentRequest
```
@InterfaceAudience.Private
@InterfaceStability.Unstable
public long remainingInCurrentRequest()
```
    Bytes left in the current request. Only valid if there is an active request.
    
    Returns:
    
    how many bytes are left to read in the current GET.
  - getContentRangeFinish
```
@InterfaceAudience.Private
@InterfaceStability.Unstable
public long getContentRangeFinish()
```
  - getContentRangeStart
```
@InterfaceAudience.Private
@InterfaceStability.Unstable
public long getContentRangeStart()
```
  - markSupported
```
public boolean markSupported()
```
    Overrides:
    
    markSupported in class InputStream
  - toString
```
@InterfaceStability.Unstable
public String toString()
```
    String value includes statistics as well as stream state. Important: there are no guarantees as to the stability of this value.
    
    Overrides:
    
    toString in class org.apache.hadoop.fs.FSInputStream
    
    Returns:
    
    a string value for printing in logs/diagnostics
  - readFully
```
@Retries.RetryTranslated
public void readFully(long position,
                      byte[] buffer,
                      int offset,
                      int length)
               throws IOException
```
    Subclass readFully() operation which only seeks at the start of the series of operations; seeking back at the end. This is significantly higher performance if multiple read attempts are needed to fetch the data, as it does not break the HTTP connection. To maintain thread safety requirements, this operation is synchronized for the duration of the sequence.
    
    Specified by:
    
    readFully in interface org.apache.hadoop.fs.PositionedReadable
    
    Overrides:
    
    readFully in class org.apache.hadoop.fs.FSInputStream
    
    Throws:
    
    IOException
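
    The "seek once, loop, seek back" pattern described for readFully() can be sketched over an in-memory source. `SeekableBytes` is a hypothetical class, not the S3A code; its `read` deliberately returns at most two bytes per call so that the loop needs several attempts, which is the case the description says benefits from not re-seeking between reads.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stand-in used to illustrate the readFully() pattern. */
final class SeekableBytes {
  private final byte[] data;
  private long pos;

  SeekableBytes(byte[] data) {
    this.data = data;
  }

  long getPos() {
    return pos;
  }

  void seek(long target) throws IOException {
    if (target < 0 || target > data.length) {
      throw new EOFException("Cannot seek to " + target);
    }
    pos = target;
  }

  /** Returns at most 2 bytes per call, to mimic short reads. */
  int read(byte[] buf, int off, int len) {
    if (len == 0) {
      return 0;
    }
    if (pos >= data.length) {
      return -1;
    }
    int n = (int) Math.min(Math.min(len, 2), data.length - pos);
    System.arraycopy(data, (int) pos, buf, off, n);
    pos += n;
    return n;
  }

  /** Seek once, read until the buffer is full, then restore the old position. */
  synchronized void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException {
    if (length == 0) {
      return;
    }
    long oldPos = getPos();
    try {
      seek(position);            // single seek at the start of the sequence
      int nread = 0;
      while (nread < length) {   // repeat reads without seeking in between
        int n = read(buffer, offset + nread, length - nread);
        if (n < 0) {
          throw new EOFException("EOF before " + length + " bytes were read");
        }
        nread += n;
      }
    } finally {
      seek(oldPos);              // seek back, even when the read failed
    }
  }
}
```

    The `finally` block is what keeps the caller-visible position unchanged whether the read succeeds or ends in an `EOFException`.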
  - minSeekForVectorReads
```
public int minSeekForVectorReads()
```
    Specified by:
    
    minSeekForVectorReads in interface org.apache.hadoop.fs.PositionedReadable
  - maxReadSizeForVectorReads
```
public int maxReadSizeForVectorReads()
```
    Specified by:
    
    maxReadSizeForVectorReads in interface org.apache.hadoop.fs.PositionedReadable
  - readVectored
```
public void readVectored(List<? extends org.apache.hadoop.fs.FileRange> ranges,
                         IntFunction<ByteBuffer> allocate)
                  throws IOException
```
    Vectored read implementation for S3AInputStream.
    
    Specified by:
    
    readVectored in interface org.apache.hadoop.fs.PositionedReadable
    
    Parameters:
    
    ranges - the byte ranges to read.
    
    allocate - the function to allocate ByteBuffer.
    
    Throws:
    
    IOException - IOE if any.
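
    One way a vectored read can make use of a minimum seek distance and a maximum read size is to coalesce nearby ranges into fewer, larger requests. The sketch below is a simplified model under stated assumptions, not the algorithm S3AInputStream uses: `RangeMergeSketch`, the `{offset, length}` array encoding, and the exact merge conditions are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of range coalescing; each range is a long[] {offset, length}. */
final class RangeMergeSketch {

  static List<long[]> merge(List<long[]> ranges, long minSeek, long maxSize) {
    List<long[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));

    List<long[]> out = new ArrayList<>();
    long[] current = null;
    for (long[] r : sorted) {
      if (current != null) {
        long currentEnd = current[0] + current[1];
        long mergedEnd = Math.max(currentEnd, r[0] + r[1]);
        boolean close = r[0] - currentEnd < minSeek;           // gap is cheaper to read than to seek
        boolean fits = mergedEnd - current[0] <= maxSize;      // combined read stays within the cap
        if (close && fits) {
          current = new long[] {current[0], mergedEnd - current[0]};
          continue;
        }
        out.add(current);
      }
      current = new long[] {r[0], r[1]};
    }
    if (current != null) {
      out.add(current);
    }
    return out;
  }
}
```

    In this model a small gap between two ranges is read and discarded rather than triggering a separate request, while the size cap stops a merged request from growing without bound.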
  - getS3AStreamStatistics
```
@InterfaceAudience.Private
@InterfaceStability.Unstable
@VisibleForTesting
public S3AInputStreamStatistics getS3AStreamStatistics()
```
    Access the input stream statistics. This is for internal testing and may be removed without warning.
    
    Returns:
    
    the statistics for this input stream
  - setReadahead
```
public void setReadahead(Long readahead)
```
    Specified by:
    
    setReadahead in interface org.apache.hadoop.fs.CanSetReadahead
  - getReadahead
```
public long getReadahead()
```
    Get the current readahead value.
    
    Returns:
    
    a non-negative readahead value
  - validateReadahead
```
public static long validateReadahead(@Nullable Long readahead)
```
    From a possibly null Long value, return a valid readahead.
    
    Parameters:
    
    readahead - new readahead
    
    Returns:
    
    a natural number.
    
    Throws:
    
    IllegalArgumentException - if the range is invalid.
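
    The contract above (nullable input, non-negative result, IllegalArgumentException on an invalid value) can be sketched as below. This is an illustration only: the class name `ReadaheadSketch`, the 64 KiB fallback used for null, and the message text are assumptions, not values taken from this page.

```java
/** Hypothetical sketch of validating a nullable readahead value. */
final class ReadaheadSketch {
  // Assumption: the fallback used when the caller passes null.
  static final long ASSUMED_DEFAULT_READAHEAD = 64 * 1024;
  // Assumption: illustrative message text, not the real constant's value.
  static final String ASSUMED_NEGATIVE_MESSAGE = "Negative readahead value";

  static long validateReadahead(Long readahead) {
    if (readahead == null) {
      return ASSUMED_DEFAULT_READAHEAD;    // null means "use the default"
    }
    if (readahead < 0) {
      throw new IllegalArgumentException(ASSUMED_NEGATIVE_MESSAGE);
    }
    return readahead;                      // zero and positive values pass through
  }
}
```

    Returning a primitive `long` means callers never need to repeat the null check after validation.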
  - unbuffer
```
public void unbuffer()
```
    Closes the underlying S3 stream, and merges the streamStatistics instance associated with the stream. Also sets the stopVectoredIOOperations flag to true so that active vectored read operations are terminated. However, termination of old vectored reads is not guaranteed if a new vectored read operation is initiated after unbuffer is called.
    
    Specified by:
    
    unbuffer in interface org.apache.hadoop.fs.CanUnbuffer
  - hasCapability
```
public boolean hasCapability(String capability)
```
    Specified by:
    
    hasCapability in interface org.apache.hadoop.fs.StreamCapabilities
  - isObjectStreamOpen
```
@VisibleForTesting
public boolean isObjectStreamOpen()
```
    Is the inner object stream open?
    
    Returns:
    
    true if there is an active HTTP request to S3.
  - getIOStatistics
```
public org.apache.hadoop.fs.statistics.IOStatistics getIOStatistics()
```
    Specified by:
    
    getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource
  - getWrappedStream
```
@VisibleForTesting
public software.amazon.awssdk.core.ResponseInputStream<software.amazon.awssdk.services.s3.model.GetObjectResponse> getWrappedStream()
```
    Get the wrapped stream. This is for testing only.
    
    Returns:
    
    the wrapped stream, or null if there is none.
