it.agilelab.bigdata.wasp.consumers.spark.plugins.raw.tools
Given a basePath, finds all files associated to the specified partitions
the file system from which to read the files
the base path from which to begin the search
the partitions to search
the list of Paths of the files found
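Below is a minimal Scala sketch of this search. It makes three assumptions. First, it uses a local file system accessed through java.nio, standing in for the Hadoop FileSystem the plugin receives. Second, it assumes each requested partition combination arrives as a list of "column=value" directory names. Third, `scala.jdk.CollectionConverters` requires Scala 2.13 or later. `PartitionFileFinder` and `findFiles` are hypothetical names, not the plugin's API.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object PartitionFileFinder {
  // Hypothetical sketch: java.nio stands in for the Hadoop FileSystem used by the plugin.
  // Each combination is a list of partition directory names, e.g. Seq("a=1", "b=2").
  def findFiles(basePath: Path, combinations: Seq[Seq[String]]): List[Path] = {
    val stream = Files.walk(basePath)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter { file =>
          // Directory names between basePath and the file, e.g. List("a=1", "b=2")
          val folders = basePath.relativize(file.getParent).iterator().asScala.map(_.toString).toList
          // Keep the file if every partition of at least one combination is among its folders
          combinations.exists(c => c.forall(dir => folders.contains(dir)))
        }
        .toList // materialize before the stream is closed
    } finally stream.close()
  }
}
```

Walking the whole tree keeps the sketch short. An implementation that already knows the partition column order could instead build each candidate directory path directly and list only those.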
the list of directories contained in a file path, e.g. ["journey", "raw", "a=1", "b=2"]
the combination of partitions, e.g. ["a=1", "b=2"]
true if all the partitions of combinations are present in folders, false otherwise
the path of the file
the combination of partitions
true if all the partitions of combinations are present in path, false otherwise
the WhereCondition to filter
the file to check
true if whereCondition covers the path of file, false otherwise
the list of files to check
the WhereCondition to filter
true if at least one file is present for this WhereCondition, false otherwise
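One way to model the coverage check and the "at least one file" check is sketched below. The sketch assumes a WhereCondition can be represented as an OR of alternatives, each an AND of column = value equalities. This is a hypothetical representation, and the plugin's actual WhereCondition type may differ. Under this model, a file is covered when all the "column=value" names of one alternative appear among its parent directories.

```scala
object WhereConditionChecks {
  // Hypothetical model: an OR of alternatives, each an AND of (column, value) equalities.
  final case class WhereCondition(alternatives: Seq[Seq[(String, String)]])

  // Directory names of a "/"-separated path, without the trailing file name.
  private def folders(filePath: String): Set[String] =
    filePath.split("/").filter(_.nonEmpty).dropRight(1).toSet

  // True if at least one alternative has all of its "column=value" folders in the path.
  def covers(condition: WhereCondition, filePath: String): Boolean = {
    val dirs = folders(filePath)
    condition.alternatives.exists(_.forall { case (col, value) => dirs.contains(s"$col=$value") })
  }

  // True if at least one of `files` is covered by `condition`.
  def anyFilePresent(files: Seq[String], condition: WhereCondition): Boolean =
    files.exists(covers(condition, _))
}
```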
Generates all the possible combinations of columnName and columnValue.
Example:
  partitions = List(
    "a" -> List("1"),
    "b" -> List("2", "3"),
    "c" -> List("4", "5", "6")
  )
  output = List(
    ("a", "1") :: ("b", "2") :: ("c", "4") :: Nil,
    ("a", "1") :: ("b", "2") :: ("c", "5") :: Nil,
    ("a", "1") :: ("b", "2") :: ("c", "6") :: Nil,
    ("a", "1") :: ("b", "3") :: ("c", "4") :: Nil,
    ("a", "1") :: ("b", "3") :: ("c", "5") :: Nil,
    ("a", "1") :: ("b", "3") :: ("c", "6") :: Nil
  )
the list of partitions from which to generate the combinations
all the possible combinations obtained from the input partitions
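The combination generation described above is a Cartesian product over the partition values. The minimal sketch below uses `foldRight`, which yields the combinations in the same order as the example. `PartitionCombinations.combinations` is a hypothetical name.

```scala
object PartitionCombinations {
  // Cartesian product of the partition values, keeping the column order of `partitions`.
  // Folding from the right makes the last column vary fastest, as in the example.
  def combinations(partitions: List[(String, List[String])]): List[List[(String, String)]] =
    partitions.foldRight(List(List.empty[(String, String)])) { case ((column, values), acc) =>
      for {
        value <- values
        rest  <- acc
      } yield (column -> value) :: rest
    }
}
```

With the example input, the result has 1 × 2 × 3 = 6 combinations. An empty partition list yields a single empty combination, the neutral element of the product.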
Builds the list of WhereCondition used to filter the original DataFrame read from the input model. This list has one element for each combination of the output partitions, so that the correct number of files is written to the partitions specified by the output model. Each of these combinations is ANDed with the OR of the combinations of the remaining input partitions, so that only the files of the requested partitions are written.
Example:
  inputModel.partitions = [a, b, c, d]
  outputModel.partitions = [a, b]
  partitions = a -> [1, 2], b -> [3, 4], c -> [5, 6], d -> [7, 8]
  output = [
    (a=1 AND b=3) AND ( (c=5 AND d=7) OR (c=5 AND d=8) OR (c=6 AND d=7) OR (c=6 AND d=8) ),
    (a=1 AND b=4) AND ( (c=5 AND d=7) OR (c=5 AND d=8) OR (c=6 AND d=7) OR (c=6 AND d=8) ),
    (a=2 AND b=3) AND ( (c=5 AND d=7) OR (c=5 AND d=8) OR (c=6 AND d=7) OR (c=6 AND d=8) ),
    (a=2 AND b=4) AND ( (c=5 AND d=7) OR (c=5 AND d=8) OR (c=6 AND d=7) OR (c=6 AND d=8) )
  ]
the list of partitions from which to generate the conditions
the inputModel defining the input partitions
the outputModel defining the output partitions
the list of WhereCondition generated
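Below is a minimal sketch of how the conditions could be built. It assumes the partitions are given as an ordered list of column -> values pairs. It also renders each condition as an SQL-like string matching the example, whereas the real method returns WhereCondition values. `WhereConditionBuilder.build` and its parameters are hypothetical. The same Cartesian-product idea is applied both to the output partitions and to the remaining input partitions.

```scala
object WhereConditionBuilder {
  private type Combination = List[(String, String)]

  // Cartesian product of the values of the given partition columns, in column order.
  private def combinations(partitions: List[(String, List[String])]): List[Combination] =
    partitions.foldRight(List(List.empty[(String, String)])) { case ((column, values), acc) =>
      for (value <- values; rest <- acc) yield (column -> value) :: rest
    }

  // Renders one combination as "(a=1 AND b=3)".
  private def conjunction(c: Combination): String =
    c.map { case (col, v) => s"$col=$v" }.mkString("(", " AND ", ")")

  // One condition per output-partition combination, ANDed with the OR of the
  // combinations of the input partitions that the output model does not fix.
  def build(
      partitions: List[(String, List[String])],
      inputPartitions: List[String],
      outputPartitions: List[String]
  ): List[String] = {
    val (outParts, restParts) = partitions
      .filter { case (col, _) => inputPartitions.contains(col) }
      .partition { case (col, _) => outputPartitions.contains(col) }
    val restClause =
      if (restParts.isEmpty) None
      else Some(combinations(restParts).map(conjunction).mkString("(", " OR ", ")"))
    combinations(outParts).map { c =>
      restClause.fold(conjunction(c))(r => s"${conjunction(c)} AND $r")
    }
  }
}
```

If the output model already covers every input partition, nothing is left to OR, so each condition is just the output combination.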