org.apache.spark.sql.execution.streaming

FileStreamSinkLog

class FileStreamSinkLog extends CompactibleFileStreamLog[SinkFileStatus]

A special log for FileStreamSink. It writes one log file per batch. The first line of each log file is the version number, followed by multiple JSON lines. Each JSON line is the JSON representation of one SinkFileStatus.
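
The file layout described above (a version line followed by one JSON object per line) can be sketched as follows. This is a minimal illustration, not the actual serializer: `FileEntry` is a hypothetical, reduced stand-in for SinkFileStatus, the `"v1"` version marker and the `"add"` action label are assumptions, and the JSON is built by hand so the sketch has no dependencies.

```scala
// Hypothetical, simplified stand-in for SinkFileStatus (the real class is not reproduced here).
final case class FileEntry(path: String, size: Long, action: String)

object SinkLogFormatSketch {
  // Assumed version marker for illustration only.
  val Version = "v1"

  // Minimal JSON string escaping: handles backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render one entry as a single-line JSON object.
  def toJsonLine(e: FileEntry): String =
    s"""{"path":${quote(e.path)},"size":${e.size},"action":${quote(e.action)}}"""

  // One batch becomes: the version line, then one JSON object per line.
  def serialize(entries: Seq[FileEntry]): String =
    (Version +: entries.map(toJsonLine)).mkString("\n")

  // Split the content of a batch file back into its version line and its JSON lines.
  def split(content: String): (String, Seq[String]) = {
    val lines = content.split("\n").toSeq
    (lines.head, lines.tail)
  }
}
```

Keeping one JSON object per line means a reader can process a batch file line by line without parsing the whole file as a single document.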

Because reading many small files is usually slow, FileStreamSinkLog compacts the log files into one large file every "spark.sql.sink.file.log.compactLen" batches. A compaction reads all old log files and merges them with the new batch. During compaction, it also drops the entries for files that have been deleted (as marked by SinkFileStatus.action). When a reader calls allFiles to list all files, only the visible files are returned; deleted files are omitted.
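
The compaction behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not the class's implementation: `Entry` is a hypothetical stand-in for SinkFileStatus, the `"add"` and `"delete"` action labels are assumed, and the rule that batch `n` is a compaction batch when `(n + 1)` is a multiple of the interval is an assumption chosen to match "every N batches".

```scala
object CompactionSketch {
  // Hypothetical, simplified stand-in for SinkFileStatus.
  final case class Entry(path: String, action: String)

  // Assumed action labels, for illustration only.
  val Add = "add"
  val Delete = "delete"

  // Assumption: with interval n, batches n-1, 2n-1, 3n-1, ... are compaction batches.
  def isCompactionBatch(batchId: Long, interval: Int): Boolean =
    (batchId + 1) % interval == 0

  // Drop every entry whose path has been marked as deleted,
  // including the delete marker itself.
  def dropDeleted(entries: Seq[Entry]): Seq[Entry] = {
    val deleted = entries.filter(_.action == Delete).map(_.path).toSet
    entries.filterNot(e => deleted.contains(e.path))
  }

  // A compaction merges all earlier entries with the new batch,
  // then keeps only the files that are still visible.
  def compact(old: Seq[Entry], newBatch: Seq[Entry]): Seq[Entry] =
    dropDeleted(old ++ newBatch)
}
```

In this model the result of `compact` is also what a reader would expect from allFiles after the compaction: every file that was added and not later deleted.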

Linear Supertypes
  1. CompactibleFileStreamLog
  2. HDFSMetadataLog
  3. Logging
  4. MetadataLog
  5. AnyRef
  6. Any

Instance Constructors

  1. new FileStreamSinkLog(metadataLogVersion: Int, sparkSession: SparkSession, path: String)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def add(batchId: Long, logs: Array[SinkFileStatus]): Boolean

    Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.

    Definition Classes
    CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog
  7. def allFiles(): Array[SinkFileStatus]

    Returns all files except the deleted ones.

    Definition Classes
    CompactibleFileStreamLog
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. val batchFilesFilter: PathFilter

    A PathFilter to filter only batch files

    Attributes
    protected
    Definition Classes
    HDFSMetadataLog
  10. def batchIdToPath(batchId: Long): Path

  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. final lazy val compactInterval: Int

    Attributes
    protected
    Definition Classes
    CompactibleFileStreamLog
  13. def compactLogs(logs: Seq[SinkFileStatus]): Seq[SinkFileStatus]

    Filter out the obsolete logs.

    Definition Classes
    FileStreamSinkLog → CompactibleFileStreamLog
  14. val defaultCompactInterval: Int

    Attributes
    protected
    Definition Classes
    FileStreamSinkLog → CompactibleFileStreamLog
  15. def deserialize(in: InputStream): Array[SinkFileStatus]

  16. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  18. val fileCleanupDelayMs: Long

    If the old files were deleted immediately after compaction, a race condition could occur on S3: other processes may see that the old files are gone but still be unable to see the compaction file using "list". allFiles handles this by looking for the next compaction file directly. However, a livelock may occur if compaction happens too frequently: one process keeps deleting old files while another keeps retrying. Setting a reasonable cleanup delay avoids this.

    Attributes
    protected
    Definition Classes
    FileStreamSinkLog → CompactibleFileStreamLog
  19. val fileManager: FileManager

    Attributes
    protected
    Definition Classes
    HDFSMetadataLog
  20. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. def get(startId: Option[Long], endId: Option[Long]): Array[(Long, Array[SinkFileStatus])]

    Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, just return all batches before endId (inclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
  22. def get(batchId: Long): Option[Array[SinkFileStatus]]

    Return the metadata for the specified batchId if it's stored. Otherwise, return None.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  23. def get(batchFile: Path): Option[Array[SinkFileStatus]]

    returns

    the deserialized metadata in a batch file, or None if the file does not exist.

    Definition Classes
    HDFSMetadataLog
    Exceptions thrown
    IllegalArgumentException

    when path does not point to a batch file.

  24. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  25. def getLatest(): Option[(Long, Array[SinkFileStatus])]

    Return the latest batch id and its metadata, if any exist.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  26. def getOrderedBatchFiles(): Array[FileStatus]

    Get an array of FileStatus objects referencing batch files. The array is sorted from the most recent batch file to the oldest.

    Definition Classes
    HDFSMetadataLog
  27. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  28. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def isBatchFile(path: Path): Boolean

  30. val isDeletingExpiredLog: Boolean

    Attributes
    protected
    Definition Classes
    FileStreamSinkLog → CompactibleFileStreamLog
  31. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  32. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  33. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  34. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  38. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  40. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  41. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  43. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  45. val metadataPath: Path

    Definition Classes
    HDFSMetadataLog
  46. val minBatchesToRetain: Int

    Attributes
    protected
    Definition Classes
    CompactibleFileStreamLog
  47. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  48. final def notify(): Unit

    Definition Classes
    AnyRef
  49. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  50. def pathToBatchId(path: Path): Long

  51. def purge(thresholdBatchId: Long): Unit

    Removes all log entries earlier than thresholdBatchId (exclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
  52. def serialize(logData: Array[SinkFileStatus], out: OutputStream): Unit

  53. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  54. def toString(): String

    Definition Classes
    AnyRef → Any
  55. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  56. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
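
The add, get, getLatest, and purge members listed above describe a small contract that is easier to see in an in-memory model. The sketch below is an illustration only: `InMemoryLog` is a hypothetical class, it keeps batches in a sorted map rather than in files, and treating a `None` endId as "no upper bound" is an assumption (the descriptions above only state the `None` case for startId).

```scala
import scala.collection.mutable

// Hypothetical in-memory model of the documented contract; not the file-backed implementation.
final class InMemoryLog[T] {
  // Sorted by batch id so that range queries and "latest" are straightforward.
  private val batches = mutable.TreeMap.empty[Long, T]

  // Returns false, leaving the stored value untouched, if batchId is already present.
  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) false
    else { batches.put(batchId, metadata); true }

  // Metadata for a single batch, or None if it was never stored.
  def get(batchId: Long): Option[T] = batches.get(batchId)

  // Both bounds are inclusive; None means unbounded on that side (assumed for endId).
  def get(startId: Option[Long], endId: Option[Long]): Seq[(Long, T)] =
    batches.toSeq.filter { case (id, _) =>
      startId.forall(id >= _) && endId.forall(id <= _)
    }

  // The highest batch id and its metadata, if any batch has been stored.
  def getLatest(): Option[(Long, T)] = batches.lastOption

  // Removes entries strictly below thresholdBatchId; the threshold batch itself is kept.
  def purge(thresholdBatchId: Long): Unit = {
    val stale = batches.keys.filter(_ < thresholdBatchId).toList
    stale.foreach(id => batches.remove(id))
  }
}
```

For example, after adding batches 0, 1, and 2, a second `add(1, ...)` returns false, `get(None, Some(1))` yields batches 0 and 1, and `purge(2)` leaves only batch 2.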
