class HDFSMetadataLog[T <: AnyRef] extends MetadataLog[T] with Logging
A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified path as the metadata storage.
When writing a new batch, HDFSMetadataLog first writes to a temp file and then renames it to the final batch file. If the rename step fails, there must be multiple writers; only one of them will succeed and the others will fail.
Note: HDFSMetadataLog doesn't support S3-like file systems as they don't guarantee listing files in a directory always shows the latest files.
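The write-then-rename protocol described above can be illustrated with a minimal sketch on the local file system. This is not the HDFSMetadataLog implementation: `TempThenRename`, `writeBatch`, and the choice to name the batch file after the batch id are assumptions made for illustration, and `java.nio.file.Files.move` stands in for the rename step.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Hypothetical helper, not part of Spark.
object TempThenRename {
  /** Writes `content` for `batchId` under `dir`; returns false if the batch file already exists. */
  def writeBatch(dir: Path, batchId: Long, content: String): Boolean = {
    // Assumption: the final batch file is named after the batch id.
    val target = dir.resolve(batchId.toString)
    // Write the full content to a uniquely named temp file first.
    val tmp = Files.createTempFile(dir, s".$batchId.", ".tmp")
    try {
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
      // Without REPLACE_EXISTING, move fails when the target already exists,
      // so a second writer of the same batch does not overwrite the first.
      Files.move(tmp, target)
      true
    } catch {
      case _: FileAlreadyExistsException => false
    } finally {
      // Clean up the temp file if the move did not consume it.
      Files.deleteIfExists(tmp)
    }
  }
}
```

Calling `writeBatch` twice with the same batch id leaves the first writer's content in place and returns `false` for the second call.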
Instance Constructors
-  new HDFSMetadataLog(sparkSession: SparkSession, path: String)(implicit arg0: ClassTag[T])
Value Members
-  final def !=(arg0: Any): Boolean
   Definition Classes: AnyRef → Any
-  final def ##(): Int
   Definition Classes: AnyRef → Any
-  final def ==(arg0: Any): Boolean
   Definition Classes: AnyRef → Any
-  def add(batchId: Long, metadata: T): Boolean
   Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.
   Definition Classes: HDFSMetadataLog → MetadataLog
-  def addNewBatchByStream(batchId: Long)(fn: (OutputStream) ⇒ Unit): Boolean
   Store the metadata for the specified batchId and return true if successful. This method fills in the content of the metadata by executing the given function. If the function throws an exception, writing is automatically cancelled and this method propagates the exception. If the batchId's metadata has already been stored, this method returns false.
   Writing the metadata is done by writing a batch to a temp file and then renaming it to the batch file. There may be multiple HDFSMetadataLog instances using the same metadata path. Although this is not valid behavior, we still need to prevent it from destroying the files.
-  def applyFnToBatchByStream[RET](batchId: Long, skipExistingCheck: Boolean = false)(fn: (InputStream) ⇒ RET): RET
   Apply the provided function to each entry in the specified batch metadata log. Unlike get, which materializes all entries into memory, this method processes entries as they are read (READ-AND-PROCESS). This helps to avoid memory issues on a huge metadata log file.
   NOTE: This no longer fails early on corruption. The caller should handle the exception properly and make sure its logic is not affected by a failure in the middle of processing.
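As a rough illustration of the read-and-process style, the sketch below shows a function of shape `(InputStream) ⇒ RET` that handles one entry at a time instead of loading the whole file. `ReadAndProcess` and `countEntries` are hypothetical names, and the one-entry-per-line UTF-8 layout is an assumption; the real layout depends on how the log serializes its metadata.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark.
object ReadAndProcess {
  /** Counts non-empty lines while holding only one line in memory at a time. */
  def countEntries(in: InputStream): Long = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var count = 0L
    var line = reader.readLine()
    while (line != null) {
      if (line.nonEmpty) count += 1
      line = reader.readLine()
    }
    // The stream is deliberately left open; the caller owns it.
    count
  }
}
```

A function like this could be supplied as `fn`. Because a failure can surface midway through the stream, any side effects performed per entry should be safe to abandon or repeat.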
-  final def asInstanceOf[T0]: T0
   Definition Classes: Any
-  val batchCache: Map[Long, T]
   Cache the latest two batches. StreamExecution usually accesses only the latest two batches when committing offsets, so this cache saves some file system operations.
   Attributes: protected[sql]
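One common way to keep only the most recent entries is a `java.util.LinkedHashMap` that evicts its eldest entry once a size limit is exceeded. The sketch below shows that technique under stated assumptions: `LatestTwoCache` and `create` are hypothetical names, and the page does not say how `batchCache` itself is built.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical helper, not part of Spark.
object LatestTwoCache {
  /** Builds an insertion-ordered map that evicts its eldest entry once it holds more than `capacity` entries. */
  def create[T](capacity: Int = 2): JMap[Long, T] =
    new JLinkedHashMap[Long, T](capacity) {
      // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
      override def removeEldestEntry(eldest: JMap.Entry[Long, T]): Boolean = size() > capacity
    }
}
```

This map is not thread-safe on its own; wrapping it with `java.util.Collections.synchronizedMap` is one option if it is shared across threads.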
 
-  val batchFilesFilter: PathFilter
   A PathFilter to filter only batch files.
   Attributes: protected
-  def batchIdToPath(batchId: Long): Path
   Attributes: protected
-  def clone(): AnyRef
   Attributes: protected[lang]
   Definition Classes: AnyRef
   Annotations: @throws( ... ) @native()
-  def deserialize(in: InputStream): T
   Read and deserialize the metadata from input stream. If this method is overridden in a subclass, the overriding method should not close the given input stream, as it will be closed in the caller.
   Attributes: protected
-  final def eq(arg0: AnyRef): Boolean
   Definition Classes: AnyRef
-  def equals(arg0: Any): Boolean
   Definition Classes: AnyRef → Any
-  val fileManager: CheckpointFileManager
   Attributes: protected
-  def finalize(): Unit
   Attributes: protected[lang]
   Definition Classes: AnyRef
   Annotations: @throws( classOf[java.lang.Throwable] )
-  def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)]
   Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, just return all batches before endId (inclusive).
   Definition Classes: HDFSMetadataLog → MetadataLog
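The inclusive bounds can be illustrated on plain batch ids. This is a sketch only: `RangeSemantics` and `select` are hypothetical names, and treating a `None` endId as "no upper bound" is an assumption, since the description above only spells out the `None` case for startId.

```scala
// Hypothetical helper, not part of Spark.
object RangeSemantics {
  /** Selects ids with startId <= id <= endId; a None bound is treated as unbounded (assumption for endId). */
  def select(ids: Seq[Long], startId: Option[Long], endId: Option[Long]): Seq[Long] =
    ids.filter(id => startId.forall(_ <= id) && endId.forall(id <= _)).sorted
}
```

For ids 0 through 4, `select(ids, Some(1L), Some(3L))` keeps 1, 2, and 3, while `select(ids, None, Some(2L))` keeps 0, 1, and 2.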
 
-  def get(batchId: Long): Option[T]
   Return the metadata for the specified batchId if it's stored. Otherwise, return None.
   Definition Classes: HDFSMetadataLog → MetadataLog
-  final def getClass(): Class[_]
   Definition Classes: AnyRef → Any
   Annotations: @native()
-  def getLatest(): Option[(Long, T)]
   Return the latest batch id and its metadata, if any exist.
   Definition Classes: HDFSMetadataLog → MetadataLog
-  def getLatestBatchId(): Option[Long]
   Return the latest batch id without reading the file.
-  def getOrderedBatchFiles(): Array[FileStatus]
   Get an array of FileStatus referencing batch files. The array is sorted from the most recent batch file to the oldest.
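The newest-first ordering can be sketched on file names alone. The example assumes that a batch file is named after its non-negative batch id and that anything else (temp files, other metadata) should be skipped; `OrderedBatchFiles` and `newestFirst` are hypothetical names, and the actual check performed by `isBatchFile` is not shown on this page.

```scala
import scala.util.Try

// Hypothetical helper, not part of Spark.
object OrderedBatchFiles {
  /** Keeps names that parse as non-negative batch ids and orders them from highest id to lowest. */
  def newestFirst(fileNames: Seq[String]): Seq[String] =
    fileNames
      .flatMap(name => Try(name.toLong).toOption.filter(_ >= 0L).map(id => (id, name)))
      .sortBy(_._1)(Ordering[Long].reverse)
      .map(_._2)
}
```

Sorting by the parsed id rather than by the raw name matters here: as strings, "10" would sort before "2".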
-  def getPrevBatchFromStorage(batchId: Long): Option[Long]
   Get the id of the previous batch from storage.
   batchId: get the previous batch id of the batch with this batchId
-  def hashCode(): Int
   Definition Classes: AnyRef → Any
   Annotations: @native()
-  def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
   Attributes: protected
   Definition Classes: Logging
-  def initializeLogIfNecessary(isInterpreter: Boolean): Unit
   Attributes: protected
   Definition Classes: Logging
-  def isBatchFile(path: Path): Boolean
   Attributes: protected
-  final def isInstanceOf[T0]: Boolean
   Definition Classes: Any
-  def isTraceEnabled(): Boolean
   Attributes: protected
   Definition Classes: Logging
-  def listBatches: Array[Long]
   List the available batches on file system.
   Attributes: protected
-  def listBatchesOnDisk: Array[Long]
   List the batches persisted to storage.
   returns: array of batch ids
-  def log: Logger
   Attributes: protected
   Definition Classes: Logging
-  def logDebug(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logDebug(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logError(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logError(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logInfo(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logInfo(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logName: String
   Attributes: protected
   Definition Classes: Logging
-  def logTrace(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logTrace(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logWarning(msg: ⇒ String, throwable: Throwable): Unit
   Attributes: protected
   Definition Classes: Logging
-  def logWarning(msg: ⇒ String): Unit
   Attributes: protected
   Definition Classes: Logging
-  val metadataCacheEnabled: Boolean
   Attributes: protected
-  val metadataPath: Path
-  final def ne(arg0: AnyRef): Boolean
   Definition Classes: AnyRef
-  final def notify(): Unit
   Definition Classes: AnyRef
   Annotations: @native()
-  final def notifyAll(): Unit
   Definition Classes: AnyRef
   Annotations: @native()
-  def pathToBatchId(path: Path): Long
   Attributes: protected
-  def purge(thresholdBatchId: Long): Unit
   Removes all log entries earlier than thresholdBatchId (exclusive).
   Definition Classes: HDFSMetadataLog → MetadataLog
-  def purgeAfter(thresholdBatchId: Long): Unit
   Removes all log entries later than thresholdBatchId (exclusive).
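The two exclusive thresholds are easy to mix up, so here is a small sketch of which batch ids survive each operation. It models the log as a plain set of ids; `PurgeSemantics` and its methods are hypothetical and only mirror the descriptions of purge and purgeAfter, not their implementation.

```scala
// Hypothetical model, not part of Spark.
object PurgeSemantics {
  /** Ids remaining after removing everything earlier than the threshold; the threshold itself is kept. */
  def purge(ids: Set[Long], thresholdBatchId: Long): Set[Long] =
    ids.filter(_ >= thresholdBatchId)

  /** Ids remaining after removing everything later than the threshold; the threshold itself is kept. */
  def purgeAfter(ids: Set[Long], thresholdBatchId: Long): Set[Long] =
    ids.filter(_ <= thresholdBatchId)
}
```

In both cases "exclusive" means the batch with id thresholdBatchId is not removed.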
-  def serialize(metadata: T, out: OutputStream): Unit
   Serialize the metadata and write to the output stream. If this method is overridden in a subclass, the overriding method should not close the given output stream, as it will be closed in the caller.
   Attributes: protected
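The "do not close the stream" rule for serialize and deserialize can be shown with a minimal codec for String metadata. This is a sketch under stated assumptions: `StringCodec` is a hypothetical stand-alone object rather than an actual subclass override, and the UTF-8 text format is chosen only for illustration.

```scala
import java.io.{ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical codec, not part of Spark.
object StringCodec {
  /** Writes the value as UTF-8 and flushes, leaving `out` open for the caller to close. */
  def serialize(metadata: String, out: OutputStream): Unit = {
    out.write(metadata.getBytes(UTF_8))
    out.flush()
  }

  /** Reads all remaining bytes as UTF-8, leaving `in` open for the caller to close. */
  def deserialize(in: InputStream): String = {
    val buffer = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buffer.write(chunk, 0, n)
      n = in.read(chunk)
    }
    new String(buffer.toByteArray, UTF_8)
  }
}
```

Neither method wraps the stream in a resource block or calls `close()`, which keeps stream ownership with the caller as the descriptions above require.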
 
-  final def synchronized[T0](arg0: ⇒ T0): T0
   Definition Classes: AnyRef
-  def toString(): String
   Definition Classes: AnyRef → Any
-  final def wait(): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... )
-  final def wait(arg0: Long, arg1: Int): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... )
-  final def wait(arg0: Long): Unit
   Definition Classes: AnyRef
   Annotations: @throws( ... ) @native()
-  def write(batchMetadataFile: Path, fn: (OutputStream) ⇒ Unit): Unit
   Attributes: protected