org.apache.spark.sql.execution.streaming
FileStreamSourceLog
Companion object FileStreamSourceLog

class FileStreamSourceLog extends CompactibleFileStreamLog[FileEntry]
- FileStreamSourceLog
- CompactibleFileStreamLog
- HDFSMetadataLog
- Logging
- MetadataLog
- AnyRef
- Any
Instance Constructors
-  new FileStreamSourceLog(metadataLogVersion: Int, sparkSession: SparkSession, path: String)
Value Members
-  final def !=(arg0: Any): Boolean
   Definition Classes: AnyRef → Any

-  final def ##(): Int
   Definition Classes: AnyRef → Any

-  final def ==(arg0: Any): Boolean
   Definition Classes: AnyRef → Any
 
-  def add(batchId: Long, logs: Array[FileEntry]): Boolean
   Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.
   Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog
 
-  def addNewBatchByStream(batchId: Long)(fn: (OutputStream) ⇒ Unit): Boolean
   Store the metadata for the specified batchId and return true if successful. This method fills in the content of the metadata by executing the given function. If the function throws an exception, writing is automatically cancelled and this method propagates the exception. If the batchId's metadata has already been stored, this method returns false.
   The metadata is written by writing the batch to a temp file and then renaming it to the batch file. Multiple HDFSMetadataLog instances may use the same metadata path; although that is not valid behavior, it must still be prevented from destroying the files.
   Definition Classes: HDFSMetadataLog
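
The temp-file-then-rename pattern described for addNewBatchByStream can be sketched with plain java.nio calls. This is a minimal illustration on a local file system, not the HDFSMetadataLog implementation: the object name `TempThenRenameSketch`, the helper `addBatch`, and the `<dir>/<batchId>` file layout are assumptions made for the example.

```scala
import java.io.OutputStream
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object TempThenRenameSketch {
  // Hypothetical helper, not Spark's API: stores one batch as the file <dir>/<batchId>.
  // Returns false when that batch file already exists.
  def addBatch(dir: Path, batchId: Long)(fn: OutputStream => Unit): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) {
      false
    } else {
      // Write everything to a temp file in the same directory first.
      val tmp = Files.createTempFile(dir, "." + batchId + ".", ".tmp")
      try {
        val out = Files.newOutputStream(tmp)
        try fn(out) finally out.close()
        // Without REPLACE_EXISTING, move fails if another writer already created the target.
        Files.move(tmp, target)
        true
      } catch {
        case _: FileAlreadyExistsException => false
      } finally {
        // Cancels the write: removes the temp file if it was never renamed.
        Files.deleteIfExists(tmp)
      }
    }
  }
}
```

If `fn` throws, the exception propagates to the caller and the temp file is removed, so a partially written batch never becomes visible under the batch file name.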
 
-  def allFiles(): Array[FileEntry]
   Returns all files except the deleted ones.
   Definition Classes: CompactibleFileStreamLog
 
-  def applyFnToBatchByStream[RET](batchId: Long, skipExistingCheck: Boolean = false)(fn: (InputStream) ⇒ RET): RET
   Apply the provided function to each entry in the specified batch's metadata log. Unlike get, which materializes all entries into memory, this method streamlines the process via READ-AND-PROCESS. This helps to avoid memory issues on huge metadata log files.
   NOTE: This no longer fails early on corruption. The caller should handle the exception properly and make sure the logic is not affected by failing in the middle.
   Definition Classes: HDFSMetadataLog
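
The difference between materializing a whole batch and processing it while reading can be sketched as follows. The one-entry-per-line text format and the names `ReadAndProcessSketch` and `countMatching` are assumptions for illustration only; they say nothing about the actual on-disk format of the metadata log.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object ReadAndProcessSketch {
  // Counts the lines that satisfy `p`, holding only one line in memory at a time.
  // The stream is not closed here; the caller that opened it stays responsible for it.
  def countMatching(in: InputStream)(p: String => Boolean): Long = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var count = 0L
    var line = reader.readLine()
    while (line != null) {
      if (p(line)) count += 1
      line = reader.readLine()
    }
    count
  }
}
```

A function of this shape could be passed as `fn`, since it takes an InputStream and returns a result. Because processing happens as the stream is read, a failure on a corrupt line surfaces only after earlier lines have already been handled, which is why the caller has to tolerate failing in the middle.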
 
-  final def asInstanceOf[T0]: T0
   Definition Classes: Any
 
-  val batchCache: Map[Long, Array[FileEntry]]
   Cache the latest two batches. StreamExecution usually accesses only the latest two batches when committing offsets, so this cache saves some file system operations.
   Attributes: protected[sql]
   Definition Classes: HDFSMetadataLog
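
One common way to keep only the most recent N entries is a java.util.LinkedHashMap with removeEldestEntry overridden. The sketch below shows that technique under the assumption that batches are inserted in increasing batch id order; the names `LatestBatchesCacheSketch` and `boundedCache` are hypothetical, and the page does not state how batchCache itself is built.

```scala
object LatestBatchesCacheSketch {
  // Builds an insertion-ordered map that drops its eldest entry once it
  // holds more than `maxEntries` entries.
  def boundedCache[V](maxEntries: Int): java.util.Map[java.lang.Long, V] =
    new java.util.LinkedHashMap[java.lang.Long, V]() {
      override protected def removeEldestEntry(
          eldest: java.util.Map.Entry[java.lang.Long, V]): Boolean =
        size() > maxEntries
    }
}
```

With `maxEntries = 2`, inserting batches 1, 2 and 3 in that order leaves only batches 2 and 3 in the map. The map is not thread-safe on its own; wrapping it, for example with java.util.Collections.synchronizedMap, would be needed for concurrent access.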
 
-  val batchFilesFilter: PathFilter
   A PathFilter to filter only batch files.
   Attributes: protected
   Definition Classes: HDFSMetadataLog
 
-  def batchIdToPath(batchId: Long): Path
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog

-  def clone(): AnyRef
   Attributes: protected[lang]
   Definition Classes: AnyRef
   Annotations: @throws( ... ) @native()

-  final lazy val compactInterval: Int
   Attributes: protected
   Definition Classes: CompactibleFileStreamLog

-  val defaultCompactInterval: Int
   Attributes: protected
   Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog
 
-  def deserialize(in: InputStream): Array[FileEntry]
   Read and deserialize the metadata from the input stream. If this method is overridden in a subclass, the overriding method should not close the given input stream, as it will be closed by the caller.
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
 
-  final def eq(arg0: AnyRef): Boolean
   Definition Classes: AnyRef

-  def equals(arg0: Any): Boolean
   Definition Classes: AnyRef → Any
 
-  val fileCleanupDelayMs: Long
   If the old files are deleted immediately after compaction, there is a race condition in S3: other processes may see that the old files are deleted but still cannot see the compaction file using "list". allFiles handles this by looking for the next compaction file directly; however, a live lock may happen if compaction happens too frequently: one process keeps deleting old files while another keeps retrying. Setting a reasonable cleanup delay can avoid this.
   Attributes: protected
   Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog
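
The effect of a cleanup delay can be illustrated with a pure function that selects which old log files are eligible for deletion. This is a sketch under assumptions: the `LogFile` case class, the `deletable` helper, and the use of modification time as the age signal are hypothetical and are not taken from this page.

```scala
object CleanupDelaySketch {
  // Hypothetical stand-in for a log file's name and modification time.
  final case class LogFile(name: String, modificationTimeMs: Long)

  // Returns only the files that have existed for at least `cleanupDelayMs`,
  // so files written shortly before a compaction stay readable for a while.
  def deletable(files: Seq[LogFile], nowMs: Long, cleanupDelayMs: Long): Seq[LogFile] =
    files.filter(f => nowMs - f.modificationTimeMs >= cleanupDelayMs)
}
```

A longer delay gives slow readers more time to find the compaction file before the files it replaces disappear, at the cost of keeping more files around.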
 
-  val fileManager: CheckpointFileManager
   Attributes: protected
   Definition Classes: HDFSMetadataLog

-  def filterInBatch(batchId: Long)(predicate: (FileEntry) ⇒ Boolean): Option[Array[FileEntry]]
   Apply filter on all entries in the specific batch.
   Definition Classes: CompactibleFileStreamLog

-  def finalize(): Unit
   Attributes: protected[lang]
   Definition Classes: AnyRef
   Annotations: @throws( classOf[java.lang.Throwable] )

-  def foreachInBatch(batchId: Long)(fn: (FileEntry) ⇒ Unit): Unit
   Apply function on all entries in the specific batch. The method will throw FileNotFoundException if the metadata log file doesn't exist.
   NOTE: This doesn't fail early on corruption. The caller should handle the exception properly and make sure the logic is not affected by failing in the middle.
   Definition Classes: CompactibleFileStreamLog
 
-  def get(startId: Option[Long], endId: Option[Long]): Array[(Long, Array[FileEntry])]
   Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, just return all batches before endId (inclusive).
   Definition Classes: FileStreamSourceLog → HDFSMetadataLog → MetadataLog
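
The inclusive range semantics can be expressed as a filter over batch ids. In this sketch the names `BatchRangeSketch` and `inRange` are hypothetical, and treating an endId of None as "no upper bound" is an assumption, since the description above only spells out the startId case.

```scala
object BatchRangeSketch {
  // Keeps the batch ids within [startId, endId], both bounds inclusive.
  // A bound of None is treated as absent (assumption for endId).
  def inRange(batchIds: Seq[Long], startId: Option[Long], endId: Option[Long]): Seq[Long] =
    batchIds.sorted.filter { id =>
      startId.forall(id >= _) && endId.forall(id <= _)
    }
}
```

For example, with batch ids 0 to 3, `inRange(ids, Some(1L), Some(2L))` keeps batches 1 and 2, while `inRange(ids, None, Some(1L))` keeps batches 0 and 1.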
 
-  def get(batchId: Long): Option[Array[FileEntry]]
   Return the metadata for the specified batchId if it's stored. Otherwise, return None.
   Definition Classes: HDFSMetadataLog → MetadataLog

-  final def getClass(): Class[_]
   Definition Classes: AnyRef → Any
   Annotations: @native()

-  def getLatest(): Option[(Long, Array[FileEntry])]
   Return the latest batch id and its metadata, if they exist.
   Definition Classes: HDFSMetadataLog → MetadataLog

-  def getLatestBatchId(): Option[Long]
   Return the latest batch id without reading the file.
   Definition Classes: HDFSMetadataLog

-  def getOrderedBatchFiles(): Array[FileStatus]
   Get an array of FileStatus referencing batch files. The array is sorted from the most recent batch file to the oldest.
   Definition Classes: HDFSMetadataLog

-  def getPrevBatchFromStorage(batchId: Long): Option[Long]
   Get the id of the previous batch from storage.
   batchId: the batch whose previous batch id is looked up
   Definition Classes: HDFSMetadataLog
 
-  def hashCode(): Int
   Definition Classes: AnyRef → Any
   Annotations: @native()

-  def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
   Attributes: protected
   Definition Classes: Logging

-  def initializeLogIfNecessary(isInterpreter: Boolean): Unit
   Attributes: protected
   Definition Classes: Logging

-  def isBatchFile(path: Path): Boolean
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog

-  val isDeletingExpiredLog: Boolean
   Attributes: protected
   Definition Classes: FileStreamSourceLog → CompactibleFileStreamLog

-  final def isInstanceOf[T0]: Boolean
   Definition Classes: Any

-  def isTraceEnabled(): Boolean
   Attributes: protected
   Definition Classes: Logging
 
-  def listBatches: Array[Long]
   List the available batches on file system.
   Attributes: protected
   Definition Classes: HDFSMetadataLog

-  def listBatchesOnDisk: Array[Long]
   List the batches persisted to storage
-  def log: Logger
   Attributes: protected
   Definition Classes: Logging

-  def logDebug(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logDebug(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logError(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logError(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logInfo(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logInfo(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logName: String
   Attributes: protected
   Definition Classes: Logging

-  def logTrace(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logTrace(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logWarning(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging

-  def logWarning(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
 
-  val metadataCacheEnabled: Boolean
   Attributes: protected
   Definition Classes: HDFSMetadataLog

-  val metadataPath: Path
   Definition Classes: HDFSMetadataLog

-  val minBatchesToRetain: Int
   Attributes: protected
   Definition Classes: CompactibleFileStreamLog

-  final def ne(arg0: AnyRef): Boolean
   Definition Classes: AnyRef

-  final def notify(): Unit
   Definition Classes: AnyRef
   Annotations: @native()

-  final def notifyAll(): Unit
   Definition Classes: AnyRef
   Annotations: @native()

-  def pathToBatchId(path: Path): Long
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
 
-  def purge(thresholdBatchId: Long): Unit
   CompactibleFileStreamLog maintains logs by itself, and manual purging might break internal state, specifically which latest compaction batch is purged. To simplify the situation, this method just throws UnsupportedOperationException regardless of the given parameter, and lets CompactibleFileStreamLog handle purging by itself.
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog

-  def purgeAfter(thresholdBatchId: Long): Unit
   Removes all log entries later than thresholdBatchId (exclusive).
   Definition Classes: HDFSMetadataLog
 
-  def restore(): Array[FileEntry]
-  def serialize(logData: Array[FileEntry], out: OutputStream): Unit
   Serialize the metadata and write it to the output stream. If this method is overridden in a subclass, the overriding method should not close the given output stream, as it will be closed by the caller.
   Definition Classes: CompactibleFileStreamLog → HDFSMetadataLog
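
The "do not close the stream" contract can be illustrated with a serializer that writes and flushes but leaves the stream open. The line-per-entry UTF-8 format, the use of plain strings in place of FileEntry, and the name `SerializeSketch` are assumptions for this sketch; the real serialization format is not described on this page.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets

object SerializeSketch {
  // Hypothetical format: one entry per line, UTF-8 encoded.
  def serialize(entries: Array[String], out: OutputStream): Unit = {
    entries.foreach { entry =>
      out.write((entry + "\n").getBytes(StandardCharsets.UTF_8))
    }
    // Flush, but leave closing to whoever opened the stream.
    out.flush()
  }
}
```

Because the stream stays open, the caller can keep writing to it after `serialize` returns and remains the single place where the stream is closed.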
 
-  def shouldRetain(log: FileEntry, currentTime: Long): Boolean
   Determine whether the log should be retained or not. The default implementation retains all log entries. Implementations should override the method to change the behavior.
   Definition Classes: CompactibleFileStreamLog
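
The override pattern described for shouldRetain can be sketched with stand-in classes. Everything here is hypothetical: `Entry` is a simplified placeholder for FileEntry, and `RetainAll` and `RetainRecent` only mimic the shape of a default implementation and an age-based override.

```scala
object RetentionSketch {
  // Simplified, hypothetical stand-in for FileEntry.
  final case class Entry(path: String, timestamp: Long)

  // Mirrors the documented default: every entry is retained.
  class RetainAll {
    def shouldRetain(log: Entry, currentTime: Long): Boolean = true
  }

  // Example override: drop entries older than `maxAgeMs`.
  class RetainRecent(maxAgeMs: Long) extends RetainAll {
    override def shouldRetain(log: Entry, currentTime: Long): Boolean =
      currentTime - log.timestamp <= maxAgeMs
  }
}
```

Passing currentTime as a parameter rather than reading the clock inside the method keeps the decision deterministic and easy to test.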
 
-  final def synchronized[T0](arg0: ⇒ T0): T0
   Definition Classes: AnyRef

-  def toString(): String
   Definition Classes: AnyRef → Any

-  final def wait(): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... )

-  final def wait(arg0: Long, arg1: Int): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... )

-  final def wait(arg0: Long): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... ) @native()

-  def write(batchMetadataFile: Path, fn: (OutputStream) ⇒ Unit): Unit
   Attributes: protected
   Definition Classes: HDFSMetadataLog