implicit class ExtendedDataFrameGlobal extends ExtendedDataFrame
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def breakAndWriteDataFrameForOutputFile(outputColumns: Seq[String], fileColumnName: String, format: String, delimiter: Option[String] = None): Unit
  Breaks the input dataframe into multiple dataframes, one per unique value of the fileColumnName column, and persists each dataframe to its corresponding output file. A sketch of the pattern follows below.
  - Definition Classes: ExtendedDataFrame
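A minimal sketch of this break-and-write pattern; the output path and the "delimiter" option key are assumptions for illustration, since the page does not document the actual destinations:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Sketch only: write one output per distinct value of fileColumnName.
// The path "/tmp/output" and the "delimiter" option key are assumptions.
def breakAndWrite(df: DataFrame, outputColumns: Seq[String], fileColumnName: String,
                  format: String, delimiter: Option[String]): Unit =
  df.select(fileColumnName).distinct().collect().map(_.get(0)).foreach { name =>
    val writer = df.filter(col(fileColumnName) === name)
      .select(outputColumns.map(col): _*)
      .write.mode("overwrite")
    delimiter.fold(writer)(d => writer.option("delimiter", d))
      .format(format)
      .save(s"/tmp/output/$name")
  }
```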
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- def collectDataFrameColumnsToApplyFilter(columnList: List[String], filterSourceDataFrame: DataFrame): DataFrame
  Collects the values of the columnList columns from filterSourceDataFrame and uses them to filter the caller DataFrame, as sketched below.
  - Definition Classes: ExtendedDataFrame
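A hedged sketch of this collect-and-filter shape, assuming an isin-based filter per column (the real method's semantics across multiple columns may differ):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Sketch only: gather the distinct values of each listed column from the
// filter source, then keep only the caller's rows whose values appear there.
def collectAndFilter(caller: DataFrame, columnList: List[String],
                     filterSource: DataFrame): DataFrame =
  columnList.foldLeft(caller) { (df, c) =>
    val values = filterSource.select(c).distinct().collect().map(_.get(0))
    df.filter(col(c).isin(values: _*))
  }
```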
- def compareRecords(otherDataFrame: DataFrame, componentName: String, limit: Int, spark: SparkSession): DataFrame
  Implements the logic of the Compare Records Ab Initio component. Its functioning is as follows (see the sketch after this list):
  1. It adds an incremental sequence number to both input dataframes and joins them on that number.
  2. It compares all records of both input dataframes and counts the mismatching records.
  3. If the mismatch count exceeds limit, it throws an error to terminate workflow execution; otherwise it returns a dataframe with the mismatch-count report.
  - Definition Classes: ExtendedDataFrame
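A minimal sketch of steps 1 and 2, assuming row_number over a global window for the sequence and a null-safe column comparison; the "_seq" name and the report format are assumptions:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Sketch only: number the rows on both sides, join on that sequence
// number, and count rows where any column differs (null-safe).
def countMismatches(left: DataFrame, right: DataFrame): Long = {
  val w = Window.orderBy(monotonically_increasing_id())
  val l = left.withColumn("_seq", row_number().over(w))
  val r = right.withColumn("_seq", row_number().over(w))
  val joined = l.as("l").join(r.as("r"), "_seq")
  val anyColumnDiffers = left.columns
    .map(c => not(col(s"l.$c") <=> col(s"r.$c")))
    .reduce(_ || _)
  joined.filter(anyColumnDiffers).count()
}
```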
- val dataFrame: DataFrame
  - Definition Classes: ExtendedDataFrame
- def deduplicate(typeToKeep: String, groupByColumns: List[Column] = List(lit(1)), orderByColumns: List[Column] = List(lit(1))): DataFrame
  Deduplicate operation that keeps either the first, last, or unique-only rows within each group. It first groups by all passed groupByColumns and then branches on the typeToKeep value.
  For both first and last, it adds a temporary row_number column giving each row's number within its group. To find the first records it keeps the rows with row_number 1; to find the last records it also computes each group's row count and keeps the rows whose row_number equals that count.
  For unique-only, it adds a temporary count column holding the number of rows in each window partition and keeps the rows where that count is 1. A sketch of this pattern follows below.
  - typeToKeep: which rows to keep; possible values are first, last, and unique-only
  - groupByColumns: columns used to group input records
  - returns: DataFrame with the first, last, or unique-only records in each grouping of input records
  - Definition Classes: ExtendedDataFrame
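A minimal sketch of the window-function pattern described above; the temporary column names "_rn" and "_cnt" are assumptions:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Sketch only: row_number and a per-group count drive all three modes.
def dedup(df: DataFrame, typeToKeep: String,
          groupBy: List[Column], orderBy: List[Column]): DataFrame = {
  val w = Window.partitionBy(groupBy: _*).orderBy(orderBy: _*)
  val counted = df
    .withColumn("_rn", row_number().over(w))
    .withColumn("_cnt", count(lit(1)).over(Window.partitionBy(groupBy: _*)))
  val kept = typeToKeep match {
    case "first"       => counted.filter(col("_rn") === 1)
    case "last"        => counted.filter(col("_rn") === col("_cnt"))
    case "unique-only" => counted.filter(col("_cnt") === 1)
  }
  kept.drop("_rn", "_cnt")
}
```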
- def deduplicateFromColumnNames(typeToKeep: String, groupByColumns: ArrayList[String]): DataFrame
  - Definition Classes: ExtendedDataFrame
- def denormalizeSorted(groupByColumns: List[Column] = List(lit(1)), orderByColumns: List[Column] = List(lit(1)), denormalizeRecordExpression: Column, finalizeExpressionMap: Map[String, Column], inputFilter: Option[Column] = None, outputFilter: Option[Column] = None, denormColumnName: String, countColumnName: String = "count"): DataFrame
  - Definition Classes: ExtendedDataFrame
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def generateLogOutput(componentName: String, subComponentName: String = "", perRowEventTypes: Option[Column] = None, perRowEventTexts: Option[Column] = None, inputRowCount: Long = 0, outputRowCount: Option[Long] = Some(0), finalLogEventType: Option[Column] = None, finalLogEventText: Option[Column] = None, finalEventExtraColumnMap: Map[String, Column] = Map(), sparkSession: SparkSession): DataFrame
  Generates Ab Initio log output for any component. It takes the array of non-standard events emitted by a workflow component and serializes each event into a separate row. It also adds start and finish events, attaching count information to the finish event.
  - Definition Classes: ExtendedDataFrame
- def generateSurrogateKeys(keyDF: DataFrame, naturalKeys: List[String], surrogateKey: String, overrideSurrogateKeys: Option[String], computeOldPortOutput: Boolean = false, spark: SparkSession): (DataFrame, DataFrame, DataFrame)
  - Definition Classes: ExtendedDataFrame
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def grouped(windowSize: Int): DataFrame
  - Definition Classes: ExtendedDataFrame
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def interim(subgraph: String, component: String, port: String)(implicit interimOutput: InterimOutput): DataFrame
  - Definition Classes: ExtendedDataFrame
  - Annotations: @Py4JWhitelist()
- def interim(subgraph: String, component: String, port: String, subPath: String, numRows: Int, detailedStats: Boolean = false)(implicit interimOutput: InterimOutput): DataFrame
  - Definition Classes: ExtendedDataFrame
  - Annotations: @Py4JWhitelist()
- def interim(subgraph: String, component: String, port: String, subPath: String, numRows: Int, interimOutput: InterimOutput, detailedStats: Boolean): DataFrame
  - Definition Classes: ExtendedDataFrame
  - Annotations: @Py4JWhitelist()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def mergeMultipleFileContentInDataFrame(fileNameDF: DataFrame, spark: SparkSession, abinitioSchema: String, delimiter: String, readFormat: String, joinWithInputDataframe: Boolean): DataFrame
  Reads the content of every file whose name is listed in the fileNameDF dataframe. It also merges the fileName column and a unique sequence id into the generated file-content dataframe for all passed file names.
  Finally, it joins the file-content dataframe with the input dataframe and returns the joined dataframe. A sketch of this shape follows below.
  - Definition Classes: ExtendedDataFrame
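A hedged sketch of the read-and-join shape; the "fileName" column name is an assumption, and input_file_name() returns a fully qualified URI that may need normalization before it matches the listed paths:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Sketch only: load every file listed in fileNameDF, tag each row with its
// source file, and join the content back to the input dataframe.
def mergeFileContents(fileNameDF: DataFrame, spark: SparkSession,
                      readFormat: String, delimiter: String): DataFrame = {
  val paths = fileNameDF.select("fileName").distinct().collect().map(_.getString(0))
  val content = spark.read.format(readFormat).option("delimiter", delimiter)
    .load(paths: _*)
    .withColumn("fileName", input_file_name()) // may need path normalization
  fileNameDF.join(content, "fileName")
}
```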
- def mergeMultipleFileContentInDataFrame(fileNameDF: DataFrame, spark: SparkSession, outputSchema: StructType, delimiter: String, readFormat: String, joinWithInputDataframe: Boolean, ffSchema: Option[FFSchemaRecord]): DataFrame
  Reads the content of every file whose name is listed in the fileNameDF dataframe. It also merges the fileName column and a unique sequence id into the generated file-content dataframe for all passed file names.
  Finally, it joins the file-content dataframe with the input dataframe and returns the joined dataframe.
  - Definition Classes: ExtendedDataFrame
- def metaPivot(pivotColumns: Seq[String], nameField: String, valueField: String, sparkSession: SparkSession): DataFrame
  Pivots on the passed pivot columns. This method splits records by the pivot columns, converting each input record into a series of separate output records: one output record for each data field of the original input record that is not in the pivot list. Each output record contains the name and value of a single data field from the original input record, along with the pivot columns. A sketch follows below.
  - Definition Classes: ExtendedDataFrame
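A minimal sketch of this unpivot shape using explode over name/value structs; casting values to string and the "_nv" helper name are assumptions:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: each non-pivot column becomes its own output row carrying
// the column's name and (stringified) value alongside the pivot columns.
def metaPivotSketch(df: DataFrame, pivotColumns: Seq[String],
                    nameField: String, valueField: String): DataFrame = {
  val dataCols = df.columns.filterNot(pivotColumns.contains)
  val nameValuePairs = explode(array(dataCols.map { c =>
    struct(lit(c).as(nameField), col(c).cast("string").as(valueField))
  }: _*))
  df.select(pivotColumns.map(col) :+ nameValuePairs.as("_nv"): _*)
    .select(pivotColumns.map(col)
      :+ col(s"_nv.$nameField").as(nameField)
      :+ col(s"_nv.$valueField").as(valueField): _*)
}
```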
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def normalize(lengthExpression: Option[Column], finishedExpression: Option[Column], finishedCondition: Option[Column], alias: String, colsToSelect: List[Column], tempWindowExpr: Map[String, Column], lengthRelatedGlobalExpressions: Map[String, Column] = Map()): DataFrame
  Implements the Ab Initio normalize functionality. It first replicates each input row multiple times, depending on the passed lengthExpression or finishedExpression. lengthExpression evaluates to a number, and each input row is replicated that many times.
  finishedExpression and finishedCondition apply a filter condition on the input data and use the condition's result to duplicate each input row multiple times.
  tempWindowExpr evaluates temporary variables for the Normalize-with-Temp case using window functions. These expressions are then used to compute the final values of the normalize output. A sketch of the length-based replication follows below.
  - lengthExpression: expression which evaluates to an integer value, used to duplicate input records
  - finishedExpression: expression used in the filter condition during its evaluation for duplication of records
  - finishedCondition: condition used to duplicate input records until the condition result is false
  - alias: used to rename finishedExpressions
  - colsToSelect: columns to be selected after the normalize operations
  - tempWindowExpr: window expressions to compute the values of temporary variables
  - returns: final normalize output for both the with-Temp and without-Temp cases
  - Definition Classes: ExtendedDataFrame
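A minimal sketch of the length-based replication only, assuming Spark 2.4+ for sequence(); the "_idx" column name is an assumption:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Sketch only: repeat each input row lengthExpression times, with an index
// column identifying each copy.
def replicateByLength(df: DataFrame, lengthExpression: Column): DataFrame =
  df.filter(lengthExpression > 0) // rows that normalize to zero records are dropped
    .withColumn("_idx", explode(sequence(lit(0), lengthExpression.cast("int") - 1)))
```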
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def readSeparatedValues(inputColumn: Column, outputSchemaColumns: List[String], recordSeparator: String, fieldSeparator: String): DataFrame
  Reads textual data from inputColumn, splits it into multiple records via recordSeparator, further splits each record into multiple columns via fieldSeparator, and finally maps the resultant data to the passed output columns. A sketch follows below.
  - Definition Classes: ExtendedDataFrame
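A minimal sketch of this split-and-map shape; note that Spark's split() treats the separators as regexes, so the real method may quote them (e.g. with Pattern.quote):

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Sketch only: text -> records -> fields -> named output columns.
def readSeparated(df: DataFrame, inputColumn: Column, outputSchemaColumns: List[String],
                  recordSeparator: String, fieldSeparator: String): DataFrame = {
  val records = df.select(explode(split(inputColumn, recordSeparator)).as("_record"))
  val fields = records.select(split(col("_record"), fieldSeparator).as("_fields"))
  fields.select(outputSchemaColumns.zipWithIndex.map {
    case (name, i) => col("_fields").getItem(i).as(name)
  }: _*)
}
```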
- def syncDataFrameColumnsWithSchema(columnNames: Seq[String]): DataFrame
  Syncs the dataframe's column names with the column names passed as input.
  - Definition Classes: ExtendedDataFrame
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- def unionWithSchema(otherDataFrame: DataFrame): DataFrame
  Takes the union of the current dataframe with the passed otherDataFrame. This method also rearranges the columns of otherDataFrame into the same order as the current dataframe's columns, as sketched below.
  - Definition Classes: ExtendedDataFrame
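A minimal sketch of the column-aligned union, assuming both dataframes share the same column names:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Sketch only: reorder the other dataframe's columns to match the caller's
// order before the positional union.
def unionAligned(df: DataFrame, other: DataFrame): DataFrame =
  df.union(other.select(df.columns.map(col): _*))
```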
- lazy val vectorUDF: UserDefinedFunction
  - Definition Classes: ExtendedDataFrame
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- def withColumnOptional(name: String, value: Column): DataFrame
  Adds a column with the defined value if it doesn't exist already, as sketched below.
  - name: the column's name
  - value: the new column's value
  - returns: DataFrame with the new column if it didn't exist already
  - Definition Classes: ExtendedDataFrame
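A one-step sketch of the optional add:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Sketch only: attach the column only when it is absent.
def withColumnOptional(df: DataFrame, name: String, value: Column): DataFrame =
  if (df.columns.contains(name)) df else df.withColumn(name, value)
```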
- def zipWithIndex(startValue: Long = 0L, incrementBy: Long = 1L, indexColName: String, sparkSession: SparkSession): DataFrame
  Adds a new unique sequence column to the dataframe, where the sequence starts with startValue and each row's value is incremented by incrementBy. See the sketch below.
  - Definition Classes: ExtendedDataFrame
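A hedged sketch of one common way to build such a gap-free sequence, via RDD zipWithIndex; whether the real method takes this route is an assumption:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

// Sketch only: RDD zipWithIndex yields consecutive indices, which are then
// scaled into the requested sequence and appended as a new column.
def zipWithIndexSketch(df: DataFrame, startValue: Long, incrementBy: Long,
                       indexColName: String, spark: SparkSession): DataFrame = {
  val rows = df.rdd.zipWithIndex.map { case (row, i) =>
    Row.fromSeq(row.toSeq :+ (startValue + i * incrementBy))
  }
  val schema = df.schema.add(StructField(indexColName, LongType, nullable = false))
  spark.createDataFrame(rows, schema)
}
```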
Deprecated Value Members
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] ) @Deprecated
  - Deprecated