trait UDFUtils extends RestAPIUtils with Serializable
Utility trait with different UDFs to take care of miscellaneous tasks.
Inheritance: UDFUtils → Serializable → Serializable → RestAPIUtils → LazyLogging → AnyRef → Any
Type Members
- case class LookupCondition(lookupColumn: String, comparisonOp: String, inputVariableName: String) extends Product with Serializable
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def arrayColumn(value: String, values: String*): Column
  Function to take a variable number of values and create an array column out of them.
  - value: input value
  - values: variable number of input values
  - returns: an array column
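  A minimal usage sketch, assuming the UDFUtils members are in scope (e.g. the trait is mixed in) and that df is an existing DataFrame; the column names are hypothetical:

```scala
import org.apache.spark.sql.Column

// Sketch: build an array column out of three values and attach it to an existing dataframe df.
val letters: Column = arrayColumn("a", "b", "c")
val withLetters = df.withColumn("letters", letters)
```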
- val array_value: UserDefinedFunction
  UDF to find and return the element of the arr sequence at the passed index. If no element is found, null is returned.
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- val call_rest_api: UserDefinedFunction
  Spark UDF that makes a single blocking REST API call to a given URL. The result of this UDF is always produced, contains a proper error if it failed at any stage, and never interrupts the job execution (unless called with an invalid signature).
  The default timeout can be configured through the spark.network.timeout Spark configuration option.
  Parameters:
  - method - any supported HTTP/1.1 method type, e.g. POST, GET. Complete list: [httpMethods].
  - url - valid URL to which the request is going to be made
  - headers - an array of "key: value" headers that are passed with the request
  - content - any content (by default, the supported REST API content type is application/json)
  Response - a struct with the following fields:
  - isSuccess - boolean, whether a successful response has been received
  - status - nullable integer, status code (e.g. 404, 200, etc.)
  - headers - an array of "name: value" response headers (e.g. [Server: akka-http/10.1.10, Date: Tue, 07 Sep 2021 18:11:47 GMT])
  - content - nullable string, the response body
  - error - nullable string; if the parameters passed are invalid or the system failed to make a call, this field contains an error message
  - Definition Classes: RestAPIUtils
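  A hedged usage sketch, assuming call_rest_api is in scope via this trait and that df has a string column endpoint_url (both names are illustrative):

```scala
import org.apache.spark.sql.functions.{array, col, lit}

// Sketch: issue one GET per row and unpack the response struct described above.
// The content argument is left as an empty string since GET carries no body here.
val responses = df
  .withColumn(
    "response",
    call_rest_api(lit("GET"), col("endpoint_url"), array(lit("Accept: application/json")), lit(""))
  )
  .select(
    col("endpoint_url"),
    col("response.isSuccess"),
    col("response.status"),
    col("response.content"),
    col("response.error")
  )
```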
- def call_udf(udfName: String, cols: Column*): Column
  Taken from upstream Spark.
  - Annotations: @varargs()
- def castDataType(sparkSession: SparkSession, df: DataFrame, column: Column, dataType: String, replaceColumn: String): DataFrame
  Function to add a new typecasted column to the input dataframe. The newly added column is a typecasted version of the passed column. The typecast operation is supported for string, boolean, byte, short, int, long, float, double, decimal, date and timestamp.
  - sparkSession: spark session
  - df: input dataframe
  - column: input column to be typecasted
  - dataType: datatype to cast the column to
  - replaceColumn: name of the new column to be added to the dataframe
  - returns: new dataframe with the new typecasted column
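  A small sketch of how this might be called; the dataframe and column names are hypothetical:

```scala
import org.apache.spark.sql.functions.col

// Sketch: add "amount_double", a double-typed copy of the string column "amount".
val casted = castDataType(spark, df, col("amount"), "double", "amount_double")
```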
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- def createExtendedLookup(name: String, df: DataFrame, spark: SparkSession, conditions: List[LookupCondition], inputParams: List[String], valueColumns: String*): UserDefinedFunction
  Extended Lookup creates a special lookup to support the Informatica lookup node functionality.
  - conditions: conditions used to filter the rows
  - inputParams: input parameters
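  An illustrative sketch only; the exact semantics of the comparison operator and of the input variable names are not documented here, so the arguments below are assumptions:

```scala
// Sketch: build an extended lookup over a hypothetical customers dataframe.
val customerLookup = createExtendedLookup(
  "customer_lookup",
  customerDf,
  spark,
  List(LookupCondition("customer_id", "=", "in_customer_id")), // assumed operator and variable naming
  List("in_customer_id"),
  "customer_name", "customer_tier"
)
```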
- def createLookup(name: String, df: DataFrame, spark: SparkSession, keyCols: List[String], rowCols: String*): UserDefinedFunction
  Function that registers 4 different UDFs with the spark registry: lookup, lookup_count, lookup_match and lookup_row. This function stores the data of the input dataframe in a broadcast variable, then uses this broadcast variable in the different lookup functions.
  - lookup: returns the first matching row for the given input keys
  - lookup_count: returns the count of all matching rows for the given input keys
  - lookup_match: returns 0 if there is no matching row and 1 if there are matching rows for the given input keys
  - lookup_row: returns all the matching rows for the given input keys
  This function registers for up to 10 matching keys as input to these lookup functions.
  - name: UDF name
  - df: input dataframe
  - spark: spark session
  - keyCols: columns to be used as keys in the lookup functions
  - rowCols: schema of the entire row which will be stored for each matching key
  - returns: registered UDF definitions for the lookup functions. These UDFs return different results depending on the lookup function.
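  A hedged sketch of registering and using such a lookup; the dataframes and column names are hypothetical, and lookup / lookup_match are the helper members documented further down this page:

```scala
import org.apache.spark.sql.functions.col

// Sketch: broadcast a small dimension dataframe and register its lookup UDFs under the name "country".
createLookup("country", countryDf, spark, List("country_code"), "country_code", "country_name")

// Resolve each fact row against the registered lookup.
val resolved = factDf
  .withColumn("country_row", lookup("country", col("country_code")))
  .withColumn("has_country", lookup_match("country", col("country_code")))
```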
- def createRangeLookup(name: String, df: DataFrame, spark: SparkSession, minColumn: String, maxColumn: String, valueColumns: String*): UserDefinedFunction
  Method to create a UDF which looks for a passed input double in the input dataframe. This function first loads the dataframe's data into a broadcast variable and then defines a UDF which looks for the input double value in the data stored in the broadcast variable. If the input double lies between the values of minColumn and maxColumn, the corresponding row is added to the returned result; otherwise null is returned for the current row in the result.
  - name: name of the created UDF
  - df: input dataframe
  - spark: spark session
  - minColumn: column whose value is to be considered as the minimum in the comparison
  - maxColumn: column whose value is to be considered as the maximum in the comparison
  - valueColumns: remaining column names to be part of the result
  - returns: registered UDF which in turn returns rows corresponding to each row in the dataframe on which the range UDF is called
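  A hedged sketch using the range lookup together with lookup_range (documented below); the dataframes and columns are hypothetical:

```scala
import org.apache.spark.sql.functions.col

// Sketch: register a range lookup over income brackets, then resolve each income to its bracket row.
createRangeLookup("tax_bracket", bracketsDf, spark, "min_income", "max_income", "bracket_name", "rate")

val withBracket = incomeDf.withColumn("bracket", lookup_range("tax_bracket", col("income")))
```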
- def dropColumns(sparkSession: SparkSession, df: DataFrame, columns: Column*): DataFrame
  Function to drop the passed columns from the input dataframe.
  - sparkSession: spark session
  - df: input dataframe
  - columns: list of columns to be dropped from the dataframe
  - returns: new dataframe with the columns dropped
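  A one-line sketch with hypothetical column names:

```scala
import org.apache.spark.sql.functions.col

// Sketch: drop two intermediate working columns from df.
val trimmed = dropColumns(spark, df, col("tmp_key"), col("tmp_hash"))
```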
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def extended_lookup(lookupName: String, cols: Column*): Column
- def extended_lookup_any(lookupName: String, cols: Column*): Column
- def extended_lookup_first(lookupName: String, cols: Column*): Column
- def extended_lookup_last(lookupName: String, cols: Column*): Column
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- lazy val logger: Logger
  - Attributes: protected
  - Definition Classes: LazyLogging
  - Annotations: @transient()
- def lookup(lookupName: String, cols: Column*): Column
  By default, returns only the first matching record.
- def lookup_count(lookupName: String, cols: Column*): Column
- def lookup_last(lookupName: String, cols: Column*): Column
  Returns the last matching record.
- def lookup_match(lookupName: String, cols: Column*): Column
  - returns: Boolean Column
- def lookup_nth(lookupName: String, cols: Column*): Column
- def lookup_range(lookupName: String, input: Column): Column
- def lookup_row(lookupName: String, cols: Column*): Column
- def lookup_row_reverse(lookupName: String, cols: Column*): Column
- def measure[T](fn: ⇒ T)(caller: String = findCaller()): T
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def registerProphecyUdfs(spark: SparkSession): Unit
- def replaceString(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, replaceWith: String, value: String, values: String*): DataFrame
  Function to add a new column to the passed dataframe. The newly added column's value is decided by the presence of the inputCol value in the array comprised of value and values. If the inputCol value is found, the value of replaceWith is added to the new column; otherwise the inputCol value is added.
  - sparkSession: spark session
  - df: input dataframe
  - outputCol: name of the new column to be added
  - inputCol: name of the column whose value is searched for
  - replaceWith: value with which to replace the searched value if found
  - value: element to be combined into the array column
  - values: all values to be combined into the array column for searching purposes
  - returns: dataframe with a new column named outputCol
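  A hedged sketch; column names and placeholder values are hypothetical:

```scala
// Sketch: replace the placeholder values "NA" and "N/A" in "state" with "UNKNOWN", written to "state_clean".
val cleaned = replaceString(spark, df, "state_clean", "state", "UNKNOWN", "NA", "N/A")
```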
- def replaceStringNull(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, replaceWith: String, value: String, values: String*): DataFrame
  Function to add a new column to the passed dataframe. The newly added column's value is decided by the presence of the inputCol value in the array comprised of value, values and null. If the inputCol value is found, the value of replaceWith is added to the new column; otherwise the inputCol value is added.
  - sparkSession: spark session
  - df: input dataframe
  - outputCol: name of the new column to be added
  - inputCol: name of the column whose value is searched for
  - replaceWith: value with which to replace the searched value if found
  - value: element to be combined into the array column
  - values: all values to be combined into the array column for searching purposes
  - returns: dataframe with a new column named outputCol
- def replaceStringWithNull(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, value: String, values: String*): DataFrame
  Function to add a new column to the passed dataframe. The newly added column's value is decided by the presence of the inputCol value in the array comprised of value, values and null. If the inputCol value is found, null is added to the new column; otherwise the inputCol value is added.
  - sparkSession: spark session
  - df: input dataframe
  - outputCol: name of the new column to be added
  - inputCol: name of the column whose value is searched for
  - value: element to be combined into the array column
  - values: all values to be combined into the array column for searching purposes
  - returns: dataframe with a new column named outputCol
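  A hedged sketch of the null-producing variant, with hypothetical names:

```scala
// Sketch: null out the placeholder values "NA" and "N/A" in "state", writing the result to "state_clean".
val nulled = replaceStringWithNull(spark, df, "state_clean", "state", "NA", "N/A")
```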
- val replace_string: UserDefinedFunction
  UDF to find str in the input sequence toBeReplaced and return replace if found; otherwise str is returned.
- val replace_string_with_null: UserDefinedFunction
  UDF to find str in the input sequence toBeReplaced and return null if found; otherwise str is returned.
- def splitIntoMultipleColumns(sparkSession: SparkSession, df: DataFrame, colName: String, pattern: String, prefix: String = null): DataFrame
  Function to split the column colName in the input dataframe into multiple columns using the split pattern. If a prefix is provided, each newly generated column is named with the prefix followed by the column number; otherwise the original column name is used.
  - sparkSession: spark session
  - df: input dataframe
  - colName: column in the dataframe which needs to be split into multiple columns
  - pattern: regex with which the column in the input dataframe will be split into multiple columns
  - prefix: column prefix to be used for all newly generated columns
  - returns: new dataframe with new columns whose values are generated by splitting the original column colName
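  A hedged sketch; the exact naming of the generated columns is an assumption based on the description above:

```scala
// Sketch: split a pipe-delimited "full_name" column into prefixed columns such as name_1, name_2, ...
val splitDf = splitIntoMultipleColumns(spark, df, "full_name", "\\|", "name_")
```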
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- val take_last_nth: UserDefinedFunction
  UDF to return the nth element from the end of the passed array of elements. If the input sequence has fewer elements than n, the first element is returned.
- val take_nth: UserDefinedFunction
  UDF to take the nth element from the beginning. If the input sequence has fewer elements than n, an exception is thrown.
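  A hedged sketch; the (array, index) argument order and the index base are assumptions, since they are not spelled out above:

```scala
import org.apache.spark.sql.functions.{col, lit}

// Sketch: pick elements from an array column "tags" using the two UDFs (argument order assumed).
val picked = df
  .withColumn("nth_tag", take_nth(col("tags"), lit(2)))
  .withColumn("nth_tag_from_end", take_last_nth(col("tags"), lit(2)))
```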
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
Deprecated Value Members
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] ) @Deprecated
  - Deprecated