Class TextRowCountEstimator


  • public abstract class TextRowCountEstimator
    extends java.lang.Object
    Returns an estimated row count for the files that match a file pattern.
    • Constructor Detail

      • TextRowCountEstimator

        public TextRowCountEstimator()
    • Method Detail

      • getNumSampledBytesPerFile

        public abstract long getNumSampledBytesPerFile()
      • getDelimiters

        public abstract byte @Nullable [] getDelimiters()
      • getFilePattern

        public abstract java.lang.String getFilePattern()
      • getCompression

        public abstract Compression getCompression()
      • estimateRowCount

        public java.lang.Double estimateRowCount(PipelineOptions pipelineOptions)
                                          throws java.io.IOException,
                                                 TextRowCountEstimator.NoEstimationException
        Estimates the number of non-empty rows. It samples NumSampledBytesPerFile bytes from each file until the stopping condition of the sampling strategy is met. It then computes the average line size of the sampled rows and divides the total size of the matched files by that average. If all the sampled rows are empty and the sampling strategy stopped before all the lines were read, it throws TextRowCountEstimator.NoEstimationException.
        Returns:
        The estimated number of rows.
        Throws:
        TextRowCountEstimator.NoEstimationException - if all the sampled lines are empty and we have not read all the lines in the matched files.
        java.io.IOException
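
        The arithmetic described for estimateRowCount can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the class RowCountEstimateSketch and the method estimateRows are hypothetical names, the sample is a single in-memory byte array rather than bytes read from matched files, the delimiter is assumed to be '\n', and IllegalStateException stands in for TextRowCountEstimator.NoEstimationException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "total size / average sampled line size".
// Not Beam's code; names and the '\n' delimiter are assumptions.
class RowCountEstimateSketch {

    /**
     * Estimates the row count of a data set of totalBytes bytes from a sample.
     * Empty lines are skipped. Throws if the sample has no non-empty line.
     */
    static double estimateRows(byte[] sample, long totalBytes) {
        long nonEmptyLines = 0;
        long sampledBytes = 0;
        int lineStart = 0;

        // Walk one position past the end so a final unterminated line is counted.
        for (int i = 0; i <= sample.length; i++) {
            boolean atEnd = (i == sample.length);
            if (atEnd || sample[i] == '\n') {
                int lineLength = i - lineStart;
                if (lineLength > 0) {
                    nonEmptyLines++;
                    // Count the delimiter byte only when one was actually read.
                    sampledBytes += lineLength + (atEnd ? 0 : 1);
                }
                lineStart = i + 1;
            }
        }

        if (nonEmptyLines == 0) {
            // Stand-in for the "no estimation possible" case.
            throw new IllegalStateException("All sampled lines are empty");
        }

        double averageLineSize = (double) sampledBytes / nonEmptyLines;
        return totalBytes / averageLineSize;
    }

    public static void main(String[] args) {
        // Three non-empty lines of 4 bytes each (including '\n'), one empty line.
        byte[] sample = "aaa\nbbb\n\nccc\n".getBytes(StandardCharsets.UTF_8);
        // Average line size is 4.0, so 400 total bytes gives 100.0 rows.
        System.out.println(estimateRows(sample, 400L)); // prints 100.0
    }
}
```

        In this sketch the estimate degrades when line lengths in the sample are not representative of the full data set, which is why the sampled byte count per file and the sampling strategy matter.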