Object

com.coxautodata.waimak.filesystem

FSUtils

object FSUtils extends Logging

Created by Alexei Perelighin on 23/10/17.

Linear Supertypes
Logging, AnyRef, Any

Type Members

  1. implicit class ScalaRemoteIterator[T] extends Iterator[T]

    Implicit class to convert a Hadoop RemoteIterator object to a Scala Iterator.

    T

    Type of the elements in the iterator
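
    The following is a minimal sketch of the adapter idea. It assumes nothing about the real class beyond its description. Hadoop is not on the classpath here, so RemoteIteratorLike is a hypothetical stand-in for org.apache.hadoop.fs.RemoteIterator, the type that FileSystem.listFiles(path, recursive) returns, for example. fromList is a hypothetical helper that fakes a remote iterator from a list.

```scala
object RemoteIteratorSketch {
  // Hypothetical stand-in for org.apache.hadoop.fs.RemoteIterator[E], whose
  // hasNext() and next() may throw IOException. Defined here so the sketch
  // compiles without Hadoop on the classpath.
  trait RemoteIteratorLike[E] {
    def hasNext: Boolean
    def next(): E
  }

  // Adapter in the spirit of FSUtils.ScalaRemoteIterator: once wrapped, the
  // remote iterator gains the whole Scala Iterator API (map, filter, toList, ...).
  implicit class ScalaRemoteIterator[T](underlying: RemoteIteratorLike[T]) extends Iterator[T] {
    override def hasNext: Boolean = underlying.hasNext
    override def next(): T = underlying.next()
  }

  // Hypothetical helper: a RemoteIteratorLike backed by an in-memory list,
  // standing in for e.g. the result of a file listing.
  def fromList[E](items: List[E]): RemoteIteratorLike[E] = new RemoteIteratorLike[E] {
    private var rest = items
    def hasNext: Boolean = rest.nonEmpty
    def next(): E = {
      val head = rest.head
      rest = rest.tail
      head
    }
  }
}
```

    With import RemoteIteratorSketch._ in scope, fromList(List("a", "b")).map(_.toUpperCase).toList evaluates to List("A", "B"). The implicit conversion is what makes map and toList available on the remote iterator.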

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  13. def keepNotPresent[O](fs: FileSystem, inParentFolder: Path, toTest: Seq[O])(getObjectPath: (O) ⇒ Path): Seq[O]

    Checks whether the objects in the toTest collection map to existing folders in HDFS, and returns the objects that do not yet have a folder under inParentFolder.

    The implementation is efficient: it uses an HDFS PathFilter rather than globs or full directory listings, which could be very large.

    For example, given these inputs:
    1) the HDFS folder inParentFolder contains partition folders, one per day from 2017/01/01 to 2017/03/15;
    2) toTest is a suggested range of dates from 2017/03/10 to 2017/03/20.

    The output is the list of dates from 2017/03/16 to 2017/03/20.

    inParentFolder

    - HDFS folder that contains the folders that could be mapped to tested objects

    getObjectPath

    - maps a tested object to its HDFS path

    returns

    objects from toTest that could not be mapped to a folder in inParentFolder via getObjectPath
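
    The example above can be reproduced with a pure-Scala sketch. This is not the FSUtils implementation. The sketch makes the following substitutions:
    - the existing folders under inParentFolder are modelled as a Seq of names (existingChildren);
    - getObjectPath is replaced by a hypothetical getObjectName that returns a folder name;
    - the day folders are assumed to be named with LocalDate.toString (for example 2017-03-16).
    The filter step mirrors the idea of a PathFilter that only accepts folders matching a tested object.

```scala
import java.time.LocalDate

object KeepNotPresentSketch {
  // Pure-Scala sketch of the keepNotPresent contract. `existingChildren` stands
  // in for the names of the folders under inParentFolder; FSUtils itself gets
  // this information from HDFS via a PathFilter, not from an in-memory Seq.
  def keepNotPresent[O](existingChildren: Seq[String], toTest: Seq[O])(getObjectName: O => String): Seq[O] = {
    // Names that the tested objects would map to.
    val candidateNames: Set[String] = toTest.map(getObjectName).toSet
    // Filter step: only keep existing children that match a candidate,
    // so the intermediate result never grows beyond toTest.
    val present: Set[String] = existingChildren.filter(candidateNames.contains).toSet
    // Objects without a folder yet, in their original order.
    toTest.filterNot(o => present.contains(getObjectName(o)))
  }

  // Hypothetical helper: every day from `from` to `to`, inclusive.
  def days(from: LocalDate, to: LocalDate): List[LocalDate] =
    Iterator.iterate(from)(_.plusDays(1)).takeWhile(!_.isAfter(to)).toList
}
```

    Suppose existingChildren holds the days 2017-01-01 to 2017-03-15 and toTest holds the days 2017-03-10 to 2017-03-20. The result is then the five days 2017-03-16 to 2017-03-20, matching the output in the example.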

  14. def listPartitions(fs: FileSystem, folder: String): Seq[(String, String)]

    Lists the Hive partition column names and their values by looking into the folder.

    returns

    (PARTITION COLUMN NAME, VALUE)
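
    The following sketch covers the parsing half of this method. It assumes partition folders follow the Hive column=value naming convention and sit one level below folder. The child folder names are passed in directly instead of being listed from a FileSystem. Skipping names without an = (such as a _SUCCESS marker) is an assumption, not documented behaviour.

```scala
object ListPartitionsSketch {
  // Sketch: turn Hive-style partition folder names ("column=value") into
  // (column, value) pairs. The child folder names are passed in directly;
  // FSUtils.listPartitions reads them from the FileSystem instead.
  def parsePartitions(childFolderNames: Seq[String]): Seq[(String, String)] =
    childFolderNames.flatMap { name =>
      val i = name.indexOf('=')
      // Split at the first "="; entries without one (e.g. _SUCCESS) are skipped.
      if (i > 0) Seq(name.substring(0, i) -> name.substring(i + 1)) else Seq.empty
    }
}
```

    For instance, parsePartitions(Seq("dt=2017-01-01", "_SUCCESS")) returns Seq(("dt", "2017-01-01")).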

  15. def logAndReturn[A](a: A, msg: String, level: Level): A

    Takes a value a of type A and a message msg, logs msg at the given level and returns a unchanged.

    returns

    a

    Definition Classes
    Logging
  16. def logAndReturn[A](a: A, message: (A) ⇒ String, level: Level): A

    Takes a value a of type A and a function message from A to String, logs the result of message(a) at the given level and returns a unchanged.

    returns

    a

    Definition Classes
    Logging
    Example:
    1. logAndReturn(1, (num: Int) => s"number: $num", Info)
      // logs "number: 1" at Info level and returns 1
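
    The following minimal sketch shows how the two logAndReturn overloads relate. It assumes a hypothetical Level with Debug and Info values and a hypothetical in-memory buffer in place of the real logger. The function-based overload computes message(a) and delegates to the String overload; both return a unchanged.

```scala
import scala.collection.mutable.ListBuffer

object LogAndReturnSketch {
  // Hypothetical stand-ins for the Level values used by Waimak's Logging trait.
  sealed trait Level
  case object Debug extends Level
  case object Info extends Level

  // Hypothetical in-memory "log" so the effect can be inspected.
  val logged: ListBuffer[(Level, String)] = ListBuffer.empty

  // Fixed message: record it, then hand the value back unchanged.
  def logAndReturn[A](a: A, msg: String, level: Level): A = {
    logged += ((level, msg))
    a
  }

  // Message derived from the value; delegates to the overload above.
  def logAndReturn[A](a: A, message: A => String, level: Level): A =
    logAndReturn(a, message(a), level)
}
```

    Because the value is returned, logging can be inserted into an expression without a temporary val. For example, logAndReturn(Seq(1, 2, 3).sum, (s: Int) => s"sum is $s", Debug) evaluates to 6 and records "sum is 6". The explicit parameter type on the lambda lets the compiler choose between the overloads.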
  17. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  19. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def mergeMoveFiles(fs: FileSystem, sourceFolder: Path, destinationFolder: Path, pathFilter: (Path) ⇒ Boolean): Unit

    Moves all files from a source directory into a destination directory. The source folder must exist; the destination folder is created if it does not exist. Every entry that satisfies isFile and matches the given pathFilter is moved. An exception is thrown if a file is already present in the destination directory.

    fs

    - FileSystem object for the given paths

    sourceFolder

    - Folder to move files from

    destinationFolder

    - Folder to move files to

    pathFilter

    - Only move files that match the given filter
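
    The following is a local-filesystem sketch of the same contract. It uses java.nio.file instead of Hadoop's FileSystem so it runs without a cluster, and MergeMoveSketch is a hypothetical name. Four behaviours follow the description above: the source existence check, destination creation, selection by isFile plus pathFilter, and refusal to overwrite. Everything else is an assumption.

```scala
import java.nio.file.{Files, Path}

object MergeMoveSketch {
  // Local-filesystem sketch of the mergeMoveFiles contract, written against
  // java.nio.file rather than Hadoop's FileSystem.
  def mergeMoveFiles(sourceFolder: Path, destinationFolder: Path, pathFilter: Path => Boolean): Unit = {
    // The source folder must exist.
    require(Files.isDirectory(sourceFolder), s"Source folder does not exist: $sourceFolder")
    // Create the destination (and any missing parents); no-op if present.
    Files.createDirectories(destinationFolder)
    val listing = Files.list(sourceFolder)
    try {
      listing.toArray()
        .map(_.asInstanceOf[Path])
        // Only regular files that pass the caller's filter are moved.
        .filter(p => Files.isRegularFile(p) && pathFilter(p))
        .foreach { p =>
          // Without REPLACE_EXISTING, Files.move throws
          // FileAlreadyExistsException if the target file already exists.
          Files.move(p, destinationFolder.resolve(p.getFileName))
        }
    } finally listing.close()
  }
}
```

    Files.move is called without REPLACE_EXISTING, so a name clash surfaces as a java.nio.file.FileAlreadyExistsException, which plays the role of the exception described above. Files moved before the clash stay moved, because this sketch does no rollback.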

  29. def moveAll(fs: FileSystem, subs: Seq[String], fromPath: Path, toPath: Path): Boolean

    Moves all sub-folders listed in subs from fromPath into toPath. If a folder already exists in the destination, it is overwritten. The method minimises the number of calls to HDFS for checks and validations, which could otherwise add a significant amount of time to the end-to-end execution.

    fs

    - current Hadoop file system

    subs

    - sub-folders to move; usually these are folders in the staging folder

    fromPath

    - parent folder that contains the sub-folders

    toPath

    - folder into which the sub-folders are moved; any that already exist are overwritten
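
    The following is a local-filesystem sketch of moveAll using java.io.File. It assumes the names in subs are plain folder names directly under fromPath. It illustrates the "few calls" idea: it lists toPath once and intersects the result with subs, instead of checking each destination separately. The mkdirs call and the meaning of the Boolean (every delete and rename succeeded) are assumptions.

```scala
import java.io.File

object MoveAllSketch {
  // Delete a file or directory tree, children first; false if anything remained.
  private def deleteRecursively(f: File): Boolean =
    Option(f.listFiles()).getOrElse(Array.empty[File]).map(deleteRecursively).forall(identity) && f.delete()

  // Local-filesystem sketch of the moveAll contract: move each folder named in
  // `subs` from fromPath into toPath, overwriting existing destination folders.
  def moveAll(subs: Seq[String], fromPath: File, toPath: File): Boolean = {
    // Assumption: make sure the destination exists before renaming into it.
    toPath.mkdirs()
    // One listing of the destination instead of one existence check per sub-folder.
    val existing: Set[String] = Option(toPath.list()).map(_.toSet).getOrElse(Set.empty[String])
    // Remove destination folders that would otherwise block the rename.
    val cleared = subs.filter(existing.contains).map(s => deleteRecursively(new File(toPath, s))).forall(identity)
    // Move every requested sub-folder; false if any step failed.
    cleared && subs.map(s => new File(fromPath, s).renameTo(new File(toPath, s))).forall(identity)
  }
}
```

    In a typical staging flow, fromPath would be the staging folder and toPath the published table folder. Sub-folders already present in toPath are deleted and replaced by the staged ones.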

  30. def moveOverwriteFolder(fs: FileSystem, toMove: Path, toPath: Path): Boolean

    Moves toMove into toPath. The parent folder of toPath is created if it does not exist.

    fs

    - FileSystem which can be HDFS or Local.

    toMove

    - full path to the folder to be moved.

    toPath

    - full path to be moved into, includes the folder name itself.

    returns

    true if move was successful.
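
    The following is a local-filesystem sketch of moveOverwriteFolder with java.io.File. Creating the parent of toPath follows the description above. Deleting an existing toPath before the rename is an assumption drawn from the method name. File.renameTo stands in for the FileSystem rename.

```scala
import java.io.File

object MoveOverwriteSketch {
  // Delete a file or directory tree, children first; false if anything remained.
  private def deleteRecursively(f: File): Boolean =
    Option(f.listFiles()).getOrElse(Array.empty[File]).map(deleteRecursively).forall(identity) && f.delete()

  // Local-filesystem sketch: move the folder `toMove` so that it becomes `toPath`.
  def moveOverwriteFolder(toMove: File, toPath: File): Boolean = {
    // As described: create the parent folder of toPath if it does not exist.
    Option(toPath.getParentFile).foreach(_.mkdirs())
    // Assumption based on the method name: an existing toPath is replaced.
    if (toPath.exists()) deleteRecursively(toPath)
    // true if the move succeeded.
    toMove.renameTo(toPath)
  }
}
```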

  31. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  32. final def notify(): Unit
    Definition Classes
    AnyRef
  33. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  34. def removeFolder(fs: FileSystem, folder: String): Unit

    Deletes the folder and all of its content; if the folder does not exist, it does nothing.
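
    On Hadoop, this behaviour is commonly written as FileSystem.exists followed by FileSystem.delete(path, true); whether FSUtils does exactly that is not stated here. The following sketch shows the same contract on the local filesystem with java.io.File, deleting children before their parent. RemoveFolderSketch is a hypothetical name.

```scala
import java.io.File

object RemoveFolderSketch {
  // Delete a file or directory tree, children first.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }

  // Local-filesystem sketch of removeFolder: delete the folder and everything in
  // it, or do nothing if it does not exist.
  def removeFolder(folder: String): Unit = {
    val f = new File(folder)
    if (f.exists()) deleteRecursively(f)
  }
}
```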

  35. def removeSubFoldersPresentInList(fs: FileSystem, folder: Path, subs: Seq[String]): Boolean

    Checks whether any of the folders named in subs already exist under folder, and removes those that do. The main benefit is that it performs the check in a single round-trip to HDFS, which in day-zero scenarios could otherwise take a long time.

    folder

    - parent folder in which to check for existing sub-folders

    subs

    - names to check; names that are not present are ignored, names that are present are removed

    returns

    - true if everything was fine
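
    The following is a local-filesystem sketch of removeSubFoldersPresentInList with java.io.File. It assumes "present" means a directory directly under folder whose name is in subs. The parent is listed once and the matches are deleted, which mirrors the single round-trip described above. Returning true when every deletion succeeded is an assumed reading of "everything was fine".

```scala
import java.io.File

object RemoveSubFoldersSketch {
  // Delete a file or directory tree, children first; false if anything remained.
  private def deleteRecursively(f: File): Boolean =
    Option(f.listFiles()).getOrElse(Array.empty[File]).map(deleteRecursively).forall(identity) && f.delete()

  // Local-filesystem sketch: remove the sub-folders of `folder` whose names are in `subs`.
  def removeSubFoldersPresentInList(folder: File, subs: Seq[String]): Boolean = {
    val wanted = subs.toSet
    // A single listing of the parent replaces one existence check per name.
    val present = Option(folder.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isDirectory && wanted.contains(f.getName))
    // Names that are not present are simply ignored; true if every removal succeeded.
    present.map(deleteRecursively).forall(identity)
  }
}
```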

  36. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  37. def toString(): String
    Definition Classes
    AnyRef → Any
  38. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
