A utility which transforms a JSON org.apache.spark.rdd.RDD into an org.apache.spark.sql.DataFrame of flattened JSON, using a provided array of JSONPaths.
NOTE: The resulting org.apache.spark.sql.DataFrame is suitable for loading to a target table that has
additional columns with defaults (including TIMESTAMP default CURRENT_TIME
and computed columns).
For instance, given JSON blobs of the form { "a" : value1, "b" : { "c" : value2, "d" : value3 } }, the paths
Array(JSONPath("a"), JSONPath("b","c"), JSONPath("b","d"))
will produce a DataFrame like
+--------+--------+--------+
|   a    |  b_c   |  b_d   |
+--------+--------+--------+
| value1 | value2 | value3 |
+--------+--------+--------+
For non-leaf paths, you will get the flattened JSON as a scala.Predef.String. For instance,
Array(JSONPath("b"))
will yield
+--------------------------+
|            b             |
+--------------------------+
| {"c":value2, "d":value3} |
+--------------------------+
Any nonexistent path will yield null. Malformed JSON will throw a runtime com.fasterxml.jackson.core.JsonParseException on the executors. This utility currently does not support flattening JSON arrays.
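The flattening rules above can be illustrated with a small, Spark-free sketch. It is not the utility's implementation: it models already-parsed JSON as nested Scala Maps instead of using Jackson, and the names FlattenSketch, Path, columnName, render, extract and flatten are hypothetical helpers introduced only for this illustration. It assumes, based on the b_c and b_d columns shown earlier, that column names are the path segments joined with an underscore.

```scala
// Illustrative model only; none of these names belong to the real utility.
object FlattenSketch {
  // A path is the sequence of object keys to follow, e.g. Seq("b", "c").
  type Path = Seq[String]

  // Assumption: column names join the path segments with "_" (as in b_c, b_d).
  def columnName(path: Path): String = path.mkString("_")

  // Serialize a parsed value back to compact JSON text.
  // Handles objects, strings, null and simple scalars; arrays are out of scope.
  def render(value: Any): String = value match {
    case m: Map[_, _] =>
      m.iterator
        .map { case (k, v) => "\"" + k + "\":" + render(v) }
        .mkString("{", ",", "}")
    case s: String => "\"" + s + "\""
    case null      => "null"
    case other     => other.toString
  }

  // Follow the path through nested objects.
  // A missing key, or a non-object met before the path ends, yields null.
  def extract(doc: Map[String, Any], path: Path): Any = {
    val found = path.foldLeft(Option[Any](doc)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Map[String, Any]].get(key)
      case _                         => None
    }
    found match {
      case Some(m: Map[_, _]) => render(m) // non-leaf path: sub-object returned as a JSON string
      case Some(leaf)         => leaf      // leaf path: the value itself
      case None               => null      // nonexistent path
    }
  }

  // Build one flattened "row" as (column name, value) pairs, in path order.
  def flatten(doc: Map[String, Any], paths: Seq[Path]): Seq[(String, Any)] =
    paths.map(p => columnName(p) -> extract(doc, p))

  def main(args: Array[String]): Unit = {
    val doc = Map[String, Any]("a" -> 1, "b" -> Map[String, Any]("c" -> 2, "d" -> 3))
    val paths = Seq(Seq("a"), Seq("b", "c"), Seq("b", "d"), Seq("b"), Seq("missing"))
    flatten(doc, paths).foreach { case (col, v) => println(col + " = " + v) }
  }
}
```

In this sketch a leaf path returns the value itself, a non-leaf path returns the sub-object re-serialized as a String, and a path that does not exist returns null, mirroring the three cases described above. The real utility parses each RDD element with Jackson on the executors, which is where malformed input would surface as a JsonParseException.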
Equivalent to JSONRDDToDataFrame.rdd, but does not create an org.apache.spark.sql.DataFrame.
A scala.Array of JSONPaths to extract.
The org.apache.spark.rdd.RDD to parse as JSON.
Utility for parsing a JSON-formatted org.apache.spark.rdd.RDD.