Implicit class to convert a Hadoop RemoteIterator object into a Scala Iterator
Type of the elements in the iterator
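Such a wrapper is typically written as an implicit class that delegates to the underlying iterator. A minimal sketch, using a stand-in RemoteIterator trait so the example is self-contained (the real trait is org.apache.hadoop.fs.RemoteIterator, whose methods also declare IOException):

```scala
// Stand-in for org.apache.hadoop.fs.RemoteIterator, defined here only to
// keep the sketch self-contained (assumption, not the real Hadoop trait).
trait RemoteIterator[T] {
  def hasNext: Boolean
  def next(): T
}

// Implicit class exposing a RemoteIterator as a standard Scala Iterator,
// so that map, filter, toList, etc. become available on it.
implicit class ScalaRemoteIterator[T](underlying: RemoteIterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = underlying.next()
}
```

With such a conversion in scope, results of calls like `FileSystem.listFiles` can be consumed with the usual Scala collection operations.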
Checks whether objects in the toTest collection can be mapped to existing folders in HDFS and returns the objects that have not yet been mapped to an HDFS folder.
The implementation is quite efficient, as it uses an HDFS PathFilter and avoids globs or full listings, which could be quite large.
For example:
Inputs:
1) the HDFS folder inParentFolder contains partition folders, one per day, from 2017/01/01 to 2017/03/15
2) toTest is a suggested range of dates from 2017/03/10 to 2017/03/20
Output:
1) the list of dates from 2017/03/16 to 2017/03/20
- HDFS folder that contains folders that could be mapped to tested objects
- maps a tested object to its HDFS path
objects from toTest that could not be mapped into the folder inParentFolder via function getObjectPath
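The core of the check can be sketched in plain Scala, with a Set of existing folder names standing in for the live HDFS listing (the names notYetMapped and existingPaths are illustrative, not the real API):

```scala
// Returns the objects from toTest whose mapped path does not yet exist.
// existingPaths stands in for the result of an HDFS listing / PathFilter pass.
def notYetMapped[A](toTest: Seq[A], getObjectPath: A => String, existingPaths: Set[String]): Seq[A] =
  toTest.filterNot(obj => existingPaths.contains(getObjectPath(obj)))

// Mirroring the example above: partition folders exist up to 2017/03/15,
// and the suggested range of dates is 2017/03/10 to 2017/03/20.
val existing = (1 to 15).map(d => f"2017/03/$d%02d").toSet
val suggested = (10 to 20).map(d => f"2017/03/$d%02d")
val missing = notYetMapped[String](suggested, identity, existing) // 2017/03/16 to 2017/03/20
```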
Lists the Hive partition column name and its value by looking into the folder.
(PARTITION COLUMN NAME, VALUE)
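Hive names partition folders as COLUMN=VALUE, so the extraction itself is a simple split. A sketch (partitionOf is a hypothetical name; the real method reads the folder name via the FileSystem API):

```scala
// Extracts the (partition column name, value) pair from a Hive-style folder
// name such as "dt=2017-03-15"; returns None for non-partition folders.
def partitionOf(folderName: String): Option[(String, String)] =
  folderName.split("=", 2) match {
    case Array(column, value) => Some((column, value))
    case _                    => None
  }
```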
Takes a value of type A and a msg to log; returns a and logs the message at the desired level.
a
Takes a value of type A and a function message from A to String, and logs the result of invoking message(a) at the level described by the level parameter.
a
logAndReturn(1, (num: Int) => s"number: $num", Info) // In the log we would see a line corresponding to "number: 1"
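A minimal sketch of this overload, with println standing in for the real logger and a toy Level type (both are assumptions for illustration):

```scala
// Toy log-level type; the real code would use its logging framework's levels.
sealed trait Level
case object Info extends Level

// Logs message(a) at the given level and returns a unchanged, so the call
// can be dropped into the middle of an expression chain without altering it.
def logAndReturn[A](a: A, message: A => String, level: Level): A = {
  println(s"[$level] ${message(a)}") // stand-in for the real logger call
  a
}
```

The plain-message overload then reduces to `logAndReturn(a, (_: A) => msg, level)`.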
Moves all files from a source directory into a destination directory. The source folder must exist, and the destination folder is created if it does not exist. All files that match isFile and the given pathFilter are moved. An exception is thrown if a file is already present in the destination directory.
- FileSystem object for the given paths
- Folder to move files from
- Folder to move files to
- Only move files that match the given filter
Moves all sub-folders in fromPath into toPath. If a folder already exists in the destination, it is overwritten. It uses an efficient approach to minimise the number of calls to HDFS for checks and validations, which could otherwise add a significant amount of time to the end-to-end execution.
- current hadoop file system
- sub-folders to move; usually these are folders in the staging folder
- parent folder that contains the sub-folders
- folder into which to move the sub-folders; any that already exist there are overwritten
Moves toMove into toPath. The parent folder of toPath is created if it does not exist.
- FileSystem which can be HDFS or Local.
- full path to the folder to be moved.
- full path to be moved into, includes the folder name itself.
true if move was successful.
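The "create the parent, then rename" semantics can be illustrated against the local file system with java.nio (an analogy by assumption; the real method uses Hadoop's FileSystem mkdirs and rename):

```scala
import java.nio.file.{Files, Path}

// Moves the folder at toMove to toPath, creating toPath's parent first.
// On the same file system, Files.move is a rename, as with HDFS rename.
def moveToFolder(toMove: Path, toPath: Path): Boolean = {
  Option(toPath.getParent).foreach(p => Files.createDirectories(p)) // parent created if missing
  Files.move(toMove, toPath)
  true
}
```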
Deletes a folder with all of its content; if the folder does not exist, does nothing.
Checks if there are any existing folders with the same names in the path and removes them. The main benefit is that it performs the checks in a single round-trip to HDFS, which in day-zero scenarios could otherwise take a lot of time.
- parent folder in which to check for existing sub-folders
- names to check; if a name is not present it is ignored, if present the folder is removed
- true if everything was fine
Created by Alexei Perelighin on 23/10/17.