Packages

io.prophecy.libs

UDFUtils

trait UDFUtils extends RestAPIUtils with Serializable

Utility trait providing assorted UDFs for miscellaneous data-processing tasks.

Linear Supertypes
Serializable, Serializable, RestAPIUtils, LazyLogging, AnyRef, Any

Type Members

  1. case class LookupCondition(lookupColumn: String, comparisonOp: String, inputVariableName: String) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def arrayColumn(value: String, values: String*): Column

    Function that takes a variable number of values and creates an array column from them.

    value: the first input value.
    values: the remaining input values.
    returns: an array column.
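
    For illustration, a minimal sketch of calling this helper (df is a hypothetical DataFrame, and it is assumed the string arguments are treated as literal values):

      // Build a three-element array column from literal values (assumed semantics).
      val tagged = df.withColumn("tags", arrayColumn("red", "green", "blue"))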

  5. val array_value: UserDefinedFunction

    UDF that returns the element of the arr sequence at the passed index. If no element is found, null is returned.
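
    A sketch of applying this UDF (column names are illustrative; the argument order, array column first and index second, is an assumption):

      import org.apache.spark.sql.functions.{col, lit}

      // Element at index 1 of the "arr" array column; null when out of range.
      val picked = df.withColumn("second", array_value(col("arr"), lit(1)))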

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. val call_rest_api: UserDefinedFunction

    Spark UDF that makes a single blocking REST API call to a given url. The result of this UDF is always produced, contains a proper error if the call failed at any stage, and never interrupts job execution (unless called with an invalid signature).

    The default timeout can be configured through the spark.network.timeout Spark configuration option.

    Parameters:

    • method - any supported HTTP/1.1 method type, e.g. POST, GET. Complete list: [httpMethods].
    • url - a valid url to which the request is made
    • headers - an array of "key: value" headers that are passed with the request
    • content - any content (by default, the supported REST API content type is application/json)

    Response - a struct with the following fields:

    • isSuccess - boolean, whether a successful response has been received
    • status - nullable integer, status code (e.g. 404, 200)
    • headers - an array of name: value response headers (e.g. [Server: akka-http/10.1.10, Date: Tue, 07 Sep 2021 18:11:47 GMT])
    • content - nullable string, the response body
    • error - nullable string; if the parameters passed are invalid or the system failed to make the call, this field contains an error message
    Definition Classes
    RestAPIUtils
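
    A sketch of a call following the parameter list above (the endpoint_url column and header value are illustrative):

      import org.apache.spark.sql.functions.{array, col, lit}

      // One blocking GET request per row; errors land in response.error
      // instead of failing the job.
      val resp = df.withColumn(
        "response",
        call_rest_api(lit("GET"), col("endpoint_url"),
          array(lit("Accept: application/json")), lit(""))  // empty body for GET
      )
      resp.select(col("response.isSuccess"), col("response.status"),
        col("response.content"))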
  8. def call_udf(udfName: String, cols: Column*): Column

    Calls a registered UDF by name with the given columns. Taken from upstream Spark.

    Annotations
    @varargs()
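
    Assuming a UDF has already been registered with the Spark UDF registry, usage might look like this (the "to_upper" UDF is hypothetical):

      import org.apache.spark.sql.functions.col

      // Registered elsewhere: spark.udf.register("to_upper", (s: String) => ...)
      val out = df.withColumn("name_uc", call_udf("to_upper", col("name")))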
  9. def castDataType(sparkSession: SparkSession, df: DataFrame, column: Column, dataType: String, replaceColumn: String): DataFrame

    Function to add a new typecast column to the input dataframe. The newly added column is the typecast version of the passed column. The typecast operation is supported for string, boolean, byte, short, int, long, float, double, decimal, date and timestamp types.

    sparkSession: the spark session.
    df: the input dataframe.
    column: the input column to be typecast.
    dataType: the datatype to cast the column to.
    replaceColumn: the name of the column to be added to the dataframe.
    returns: a new dataframe with the new typecast column.
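
    A minimal sketch (the "amount" column is hypothetical):

      import org.apache.spark.sql.functions.col

      // Add "amount_d", a double-typed copy of the string column "amount".
      val casted = castDataType(spark, df, col("amount"), "double", "amount_d")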

  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  11. def createExtendedLookup(name: String, df: DataFrame, spark: SparkSession, conditions: List[LookupCondition], inputParams: List[String], valueColumns: String*): UserDefinedFunction

    Extended Lookup creates a special lookup to support the Informatica lookup-node functionality.

    conditions: conditions used to filter the rows.
    inputParams: input parameters.
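
    A hedged sketch using LookupCondition as declared above (the table, columns, operator string and the query through extended_lookup_first are all assumptions):

      import org.apache.spark.sql.functions.col

      // Filter lookup rows where "country" equals the input variable "in_country".
      val conditions = List(LookupCondition("country", "=", "in_country"))
      createExtendedLookup("customersExt", customersDf, spark, conditions,
        List("in_country"), "id", "name")
      val withCustomer = ordersDf.withColumn("customer",
        extended_lookup_first("customersExt", col("country")))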

  12. def createLookup(name: String, df: DataFrame, spark: SparkSession, keyCols: List[String], rowCols: String*): UserDefinedFunction

    Function registers 4 different UDFs with the spark registry: lookup_match, lookup_count, lookup_row and lookup. It stores the data of the input dataframe in a broadcast variable, then uses that broadcast variable in the different lookup functions.

    lookup: returns the first matching row for the given input keys.
    lookup_count: returns the count of all matching rows for the given input keys.
    lookup_match: returns 0 if there is no matching row and 1 if there are matching rows for the given input keys.
    lookup_row: returns all matching rows for the given input keys.

    These lookup functions are registered for up to 10 keys as input.

    name: UDF name.
    df: the input dataframe.
    spark: the spark session.
    keyCols: columns to be used as keys in the lookup functions.
    rowCols: schema of the entire row to be stored for each matching key.
    returns: the registered UDF definition; the registered lookup functions return different results depending on which one is called.
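
    A sketch of registering and querying a lookup (the dimension and fact DataFrames are hypothetical; the shape of the column returned by lookup is assumed to be a struct of the stored row columns):

      import org.apache.spark.sql.functions.col
      import spark.implicits._

      val dim = Seq(("US", "United States"), ("DE", "Germany")).toDF("code", "name")
      createLookup("countries", dim, spark, List("code"), "code", "name")

      // First matching row for each key:
      val withRow  = facts.withColumn("country", lookup("countries", col("country_code")))
      // 1 when a match exists, 0 otherwise:
      val withFlag = facts.withColumn("matched", lookup_match("countries", col("country_code")))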

  13. def createRangeLookup(name: String, df: DataFrame, spark: SparkSession, minColumn: String, maxColumn: String, valueColumns: String*): UserDefinedFunction

    Method to create a UDF that looks up a passed input double in the input dataframe. This function first loads the data of the dataframe into a broadcast variable and then defines a UDF that looks up the input double value in that broadcast data. If the input double lies between the minColumn and maxColumn values of a row, that row is added to the returned result; otherwise null is returned for the current row.

    name: the name of the created UDF.
    df: the input dataframe.
    spark: the spark session.
    minColumn: the column whose value is treated as the minimum of the range.
    maxColumn: the column whose value is treated as the maximum of the range.
    valueColumns: the remaining column names to include in the result.
    returns: a registered UDF which returns the rows matching each value it is called on.
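
    A sketch of a range lookup (the range table and score column are hypothetical):

      import org.apache.spark.sql.functions.col
      import spark.implicits._

      val ranges = Seq((0.0, 50.0, "low"), (50.0, 100.0, "high"))
        .toDF("min_val", "max_val", "label")
      createRangeLookup("scoreRanges", ranges, spark, "min_val", "max_val", "label")

      // Row whose [min_val, max_val] range contains the score, else null:
      val labelled = scores.withColumn("band", lookup_range("scoreRanges", col("score")))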

  14. def dropColumns(sparkSession: SparkSession, df: DataFrame, columns: Column*): DataFrame

    Function to drop the passed columns from the input dataframe.

    sparkSession: the spark session.
    df: the input dataframe.
    columns: the list of columns to be dropped from the dataframe.
    returns: a new dataframe without the dropped columns.
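
    A minimal sketch (column names are illustrative):

      import org.apache.spark.sql.functions.col

      // Drop two temporary columns in one call.
      val slim = dropColumns(spark, df, col("tmp_a"), col("tmp_b"))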

  15. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  17. def extended_lookup(lookupName: String, cols: Column*): Column
  18. def extended_lookup_any(lookupName: String, cols: Column*): Column
  19. def extended_lookup_first(lookupName: String, cols: Column*): Column
  20. def extended_lookup_last(lookupName: String, cols: Column*): Column
  21. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  22. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. lazy val logger: Logger
    Attributes
    protected
    Definition Classes
    LazyLogging
    Annotations
    @transient()
  25. def lookup(lookupName: String, cols: Column*): Column

    By default returns only the first matching record

  26. def lookup_count(lookupName: String, cols: Column*): Column
  27. def lookup_last(lookupName: String, cols: Column*): Column

    Returns the last matching record

  28. def lookup_match(lookupName: String, cols: Column*): Column

    returns: a boolean column.

  29. def lookup_nth(lookupName: String, cols: Column*): Column
  30. def lookup_range(lookupName: String, input: Column): Column
  31. def lookup_row(lookupName: String, cols: Column*): Column
  32. def lookup_row_reverse(lookupName: String, cols: Column*): Column
  33. def measure[T](fn: ⇒ T)(caller: String = findCaller()): T
  34. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  35. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  36. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  37. def registerProphecyUdfs(spark: SparkSession): Unit
  38. def replaceString(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, replaceWith: String, value: String, values: String*): DataFrame

    Function to add a new column to the passed dataframe. The new column's value depends on whether the value of inputCol is present in the array comprised of value and values: if found, replaceWith is placed in the new column, otherwise the inputCol value is carried over.

    sparkSession: the spark session.
    df: the input dataframe.
    outputCol: the name of the new column to be added.
    inputCol: the column whose value is searched for.
    replaceWith: the value that replaces the searched value if found.
    value: an element of the search array.
    values: the remaining elements of the search array.
    returns: a dataframe with a new column named outputCol.
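
    A sketch under the semantics described above (column names and values are illustrative):

      // Normalise spelling variants of a state name to "CA"; rows whose "state"
      // value is not in the search array keep their original value.
      val cleaned = replaceString(spark, df, "state_clean", "state",
        "CA", "Calif.", "California", "calif")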

  39. def replaceStringNull(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, replaceWith: String, value: String, values: String*): DataFrame

    Function to add a new column to the passed dataframe. The new column's value depends on whether the value of inputCol is present in the array comprised of value, values and null: if found, replaceWith is placed in the new column, otherwise the inputCol value is carried over.

    sparkSession: the spark session.
    df: the input dataframe.
    outputCol: the name of the new column to be added.
    inputCol: the column whose value is searched for.
    replaceWith: the value that replaces the searched value if found.
    value: an element of the search array.
    values: the remaining elements of the search array.
    returns: a dataframe with a new column named outputCol.

  40. def replaceStringWithNull(sparkSession: SparkSession, df: DataFrame, outputCol: String, inputCol: String, value: String, values: String*): DataFrame

    Function to add a new column to the passed dataframe. The new column's value depends on whether the value of inputCol is present in the array comprised of value, values and null: if found, null is placed in the new column, otherwise the inputCol value is carried over.

    sparkSession: the spark session.
    df: the input dataframe.
    outputCol: the name of the new column to be added.
    inputCol: the column whose value is searched for.
    value: an element of the search array.
    values: the remaining elements of the search array.
    returns: a dataframe with a new column named outputCol.

  41. val replace_string: UserDefinedFunction

    UDF that searches for str in the input sequence toBeReplaced and returns replace if found; otherwise str is returned.

  42. val replace_string_with_null: UserDefinedFunction

    UDF that searches for str in the input sequence toBeReplaced and returns null if found; otherwise str is returned.

  43. def splitIntoMultipleColumns(sparkSession: SparkSession, df: DataFrame, colName: String, pattern: String, prefix: String = null): DataFrame

    Function to split the column colName in the input dataframe into multiple columns using the split pattern. If a prefix is provided, each newly generated column is named with the prefix followed by a column number; otherwise the original column name is used.

    sparkSession: the spark session.
    df: the input dataframe.
    colName: the column in the dataframe to be split into multiple columns.
    pattern: the regex with which the column is split.
    prefix: the prefix to be used for all newly generated columns.
    returns: a new dataframe whose new columns hold the values produced by splitting the original column colName.
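
    A minimal sketch (the column name, pattern and prefix are illustrative; the exact numbering scheme of the generated columns is assumed):

      // Split "full_name" on whitespace into name_1, name_2, ... (assumed naming).
      val split = splitIntoMultipleColumns(spark, df, "full_name", "\\s+", "name_")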

  44. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  45. val take_last_nth: UserDefinedFunction

    UDF that returns the nth element from the end of the passed array of elements. If the input sequence has fewer than n elements, the first element is returned.

  46. val take_nth: UserDefinedFunction

    UDF that takes the Nth element from the beginning. If the input sequence has fewer than N elements, an exception is thrown.
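
    A sketch of applying this UDF (whether N is zero- or one-based is an assumption of this example):

      import org.apache.spark.sql.functions.{col, lit}

      // Take the element at position 2 of the "parts" array column.
      val second = df.withColumn("second_part", take_nth(col("parts"), lit(2)))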

  47. def toString(): String
    Definition Classes
    AnyRef → Any
  48. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  50. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated

Inherited from Serializable

Inherited from Serializable

Inherited from RestAPIUtils

Inherited from LazyLogging

Inherited from AnyRef

Inherited from Any
