Class OffsetBasedSource<T>

    • Constructor Detail

      • OffsetBasedSource

        public OffsetBasedSource​(long startOffset,
                                 long endOffset,
                                 long minBundleSize)
        Parameters:
        startOffset - starting offset (inclusive) of the source. Must be non-negative.
        endOffset - ending offset (exclusive) of the source. Use Long.MAX_VALUE to indicate that the entire source after startOffset should be read. Must be > startOffset.
        minBundleSize - minimum bundle size in offset units that should be used when splitting the source into sub-sources. This value may not be respected if the total range of the source is smaller than the specified minBundleSize. Must be non-negative.
    • Method Detail

      • getStartOffset

        public long getStartOffset()
        Returns the starting offset of the source.
      • getMinBundleSize

        public long getMinBundleSize()
        Returns the minimum bundle size that should be used when splitting the source into sub-sources. This value may not be respected if the total range of the source is smaller than the specified minBundleSize.
      • getEstimatedSizeBytes

        public long getEstimatedSizeBytes​(PipelineOptions options)
                                   throws java.lang.Exception
        Description copied from class: BoundedSource
        An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.

        If there is no way to estimate the size of the source implementations MAY return 0L.

        Specified by:
        getEstimatedSizeBytes in class BoundedSource<T>
        Throws:
        java.lang.Exception
      • split

        public java.util.List<? extends OffsetBasedSource<T>> split​(long desiredBundleSizeBytes,
                                                                    PipelineOptions options)
                                                             throws java.lang.Exception
        Description copied from class: BoundedSource
        Splits the source into bundles of approximately desiredBundleSizeBytes.
        Specified by:
        split in class BoundedSource<T>
        Throws:
        java.lang.Exception
      • validate

        public void validate()
        Description copied from class: Source
        Checks that this source is valid, before it can be used in a pipeline.

        It is recommended to use Preconditions for implementing this method.

        Overrides:
        validate in class Source<T>
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • getMaxEndOffset

        public abstract long getMaxEndOffset​(PipelineOptions options)
                                      throws java.lang.Exception
        Returns the actual ending offset of the current source. The value returned by this function will be used to clip the end of the range [startOffset, endOffset) such that the range used is [startOffset, min(endOffset, maxEndOffset)).

        As an example in which OffsetBasedSource is used to implement a file source, suppose that this source was constructed with an endOffset of Long.MAX_VALUE to indicate that a file should be read to the end. Then this function should determine the actual, exact size of the file in bytes and return it.

        Throws:
        java.lang.Exception
      • createSourceForSubrange

        public abstract OffsetBasedSource<T> createSourceForSubrange​(long start,
                                                                     long end)
        Returns an OffsetBasedSource for a subrange of the current source. The subrange [start, end) must be within the range [startOffset, endOffset) of the current source, i.e. startOffset <= start < end <= endOffset.
      • populateDisplayData

        public void populateDisplayData​(DisplayData.Builder builder)
        Description copied from class: Source
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Overrides:
        populateDisplayData in class Source<T>
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData