it.agilelab.bigdata.wasp.consumers.spark.strategies.gdpr.utils.hdfs
Create a new directory inside backupParentDir, called "backup_{randomUUID}".
Each file in filesToBackup is copied into this directory, preserving any HDFS partitioning it has. The new file path is built by removing the base directory (that is, dataPath) from the file path and replacing it with the path of the backup directory.
Example:
filesToBackup = ["/user/data/p1=a/p2=b/file.parquet"]
backupParentDir = "/user"
dataPath = "/user/data"
backupDir = "/user/backup_123"
The file is therefore copied to "/user/backup_123/p1=a/p2=b/file.parquet".

filesToBackup: Files that should be copied into the backup directory
backupParentDir: Base path under which the backup directory is created
dataPath: Path containing the data to back up
Returns: Path of the newly created backup directory
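
The path rewriting described above can be sketched with plain strings. This is only an illustration of the rule, not the library's implementation: the object BackupPathSketch and the helpers newBackupDir and backupPathFor are hypothetical names, and no HDFS copy is performed.

```scala
import java.util.UUID

// Hypothetical sketch of the path rewriting rule; not part of the WASP API.
object BackupPathSketch {

  // Builds the backup directory path "<backupParentDir>/backup_<id>".
  // The id defaults to a random UUID; passing it explicitly keeps examples reproducible.
  def newBackupDir(backupParentDir: String,
                   id: String = UUID.randomUUID().toString): String =
    s"${backupParentDir.stripSuffix("/")}/backup_$id"

  // Replaces the dataPath prefix of `file` with backupDir,
  // so partition directories such as p1=a/p2=b are preserved.
  def backupPathFor(file: String, dataPath: String, backupDir: String): String = {
    val base = dataPath.stripSuffix("/")
    require(file.startsWith(base + "/"), s"$file is not under $dataPath")
    backupDir.stripSuffix("/") + file.stripPrefix(base)
  }
}
```

With the values from the example, BackupPathSketch.newBackupDir("/user", "123") gives "/user/backup_123", and BackupPathSketch.backupPathFor("/user/data/p1=a/p2=b/file.parquet", "/user/data", "/user/backup_123") gives "/user/backup_123/p1=a/p2=b/file.parquet".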