org.apache.spark.sql.execution.python
ApplyInPandasWithStateWriter
Companion object ApplyInPandasWithStateWriter
class ApplyInPandasWithStateWriter extends AnyRef
This class abstracts the complexity of constructing Arrow RecordBatches for data and state with
bin-packing and chunking. The caller only needs to call the proper public methods of this class
(startNewGroup, writeRow, finalizeGroup, finalizeData), and this class writes the data
and state into Arrow RecordBatches, performing bin-packing and chunking internally.
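The caller's side of this contract can be sketched as a loop over input rows that are clustered by grouping key. This is a minimal, self-contained sketch, not Spark code: `GroupedWriter`, `RecordingWriter` and `writeGrouped` are hypothetical names, plain `String` values stand in for `UnsafeRow`, `GroupStateImpl[Row]` and `InternalRow`, and the input is assumed to be already sorted (clustered) by key.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the four public calls; Strings replace UnsafeRow,
// GroupStateImpl[Row] and InternalRow so the sketch runs without Spark.
trait GroupedWriter {
  def startNewGroup(key: String): Unit
  def writeRow(row: String): Unit
  def finalizeGroup(): Unit
  def finalizeData(): Unit
}

// Test double that records each call in order.
final class RecordingWriter extends GroupedWriter {
  val calls: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def startNewGroup(key: String): Unit = calls += s"start($key)"
  def writeRow(row: String): Unit = calls += s"row($row)"
  def finalizeGroup(): Unit = calls += "endGroup"
  def finalizeData(): Unit = calls += "endData"
}

// Caller-side loop. Assumes all rows of a key are adjacent in the input.
def writeGrouped(input: Iterator[(String, String)], writer: GroupedWriter): Unit = {
  var currentKey: Option[String] = None
  input.foreach { case (key, row) =>
    if (!currentKey.contains(key)) {
      // Key changed: close the previous group (if any) before opening the next one.
      currentKey.foreach(_ => writer.finalizeGroup())
      writer.startNewGroup(key)
      currentKey = Some(key)
    }
    writer.writeRow(row)
  }
  currentKey.foreach(_ => writer.finalizeGroup()) // close the last open group
  writer.finalizeData() // assumption of this sketch: called even for empty input
}
```

In this sketch the writer only sees call boundaries, so detecting a key change and closing the previous group is the caller's job.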
This class requires that the parameter root has been initialized with an Arrow schema laid out as
follows:
- data fields
- state field
  - nested schema (refer to ApplyInPandasWithStateWriter.STATE_METADATA_SCHEMA)
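As a hedged illustration of this layout requirement, the sketch below models the root schema as an ordered list of fields and splits it into the data fields (everything before the last field) and the trailing state field, rejecting a schema whose last field is not a nested struct. `Field`, `FieldType`, `Scalar`, `Struct` and `splitDataAndState` are hypothetical names; the real class works on an Arrow `VectorSchemaRoot`, and the actual nested fields are whatever `STATE_METADATA_SCHEMA` defines.

```scala
// Toy schema model (hypothetical); the real `root` is an Arrow VectorSchemaRoot.
sealed trait FieldType
case object Scalar extends FieldType
final case class Struct(children: List[String]) extends FieldType
final case class Field(name: String, tpe: FieldType)

// Splits the root schema into (data fields, state field): data fields come first and
// the single state field, a nested struct, comes last.
def splitDataAndState(fields: List[Field]): (List[Field], Field) = {
  require(fields.nonEmpty, "schema must end with a state field")
  val state = fields.last
  state.tpe match {
    case Struct(_) => (fields.init, state)
    case Scalar =>
      throw new IllegalArgumentException(s"last field '${state.name}' must be a nested struct")
  }
}
```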
Refer to the code comments in the implementation for details on how data and state are written to Arrow RecordBatches with bin-packing and chunking.
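To make bin-packing and chunking concrete, here is a simplified, self-contained model of the two ideas; it is not Spark's implementation. Rows from several groups share one batch until it holds `maxRecordsPerBatch` rows (bin-packing), and a group that overflows a batch continues in the next one (chunking). For each chunk the model records the group key, the chunk's start offset and row count within the batch, and whether it is the group's last chunk. `PackingModel`, `ChunkMeta`, `Batch` and these field names are illustrative assumptions, and checking fullness lazily on the next `writeRow` is a design choice of this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

// Per-chunk metadata (illustrative names): which group, where its rows start in the
// batch, how many rows it has there, and whether this is the group's final chunk.
final case class ChunkMeta(key: String, startOffset: Int, numRows: Int, isLastChunk: Boolean)
final case class Batch(rows: Vector[String], chunks: Vector[ChunkMeta])

// Simplified model, not Spark's implementation: groups are bin-packed into batches of at
// most maxRecordsPerBatch rows, and a group that overflows a batch is chunked into the next.
final class PackingModel(maxRecordsPerBatch: Int) {
  require(maxRecordsPerBatch > 0, "maxRecordsPerBatch must be positive")

  val batches: ArrayBuffer[Batch] = ArrayBuffer.empty[Batch]
  private val rows = ArrayBuffer.empty[String]
  private val chunks = ArrayBuffer.empty[ChunkMeta]
  private var key = ""
  private var startOffset = 0 // where the current group's rows begin in the open batch

  def startNewGroup(k: String): Unit = { key = k; startOffset = rows.size }

  def writeRow(row: String): Unit = {
    // Fullness is checked right before a write, so a batch is flushed only when
    // another row actually needs room.
    if (rows.size >= maxRecordsPerBatch) {
      // Emit a non-final chunk only if the current group already has rows in this batch.
      if (rows.size > startOffset) closeChunk(isLast = false)
      flushBatch()
    }
    rows += row
  }

  def finalizeGroup(): Unit = closeChunk(isLast = true)

  def finalizeData(): Unit = if (rows.nonEmpty || chunks.nonEmpty) flushBatch()

  private def closeChunk(isLast: Boolean): Unit = {
    chunks += ChunkMeta(key, startOffset, rows.size - startOffset, isLast)
    startOffset = rows.size
  }

  private def flushBatch(): Unit = {
    batches += Batch(rows.toVector, chunks.toVector)
    rows.clear()
    chunks.clear()
    startOffset = 0
  }
}
```

With `maxRecordsPerBatch = 4` and groups of 3, 2 and 6 rows, this model emits batches of 4, 4 and 3 rows: the second group is split across the first two batches and the third group across the last two.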
Instance Constructors
- new ApplyInPandasWithStateWriter(root: VectorSchemaRoot, writer: ArrowStreamWriter, arrowMaxRecordsPerBatch: Int)
Value Members
- final def !=(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def ##(): Int
  Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  Definition Classes: Any
- def clone(): AnyRef
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- def finalize(): Unit
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( classOf[java.lang.Throwable] )
- def finalizeData(): Unit
  Indicates to the writer that all groups have been processed.
- def finalizeGroup(): Unit
  Indicates to the writer that the current group is finalized and that no further rows will be bound to it.
- final def getClass(): Class[_]
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def hashCode(): Int
  Definition Classes: AnyRef → Any
  Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- final def notify(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- final def notifyAll(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
def
startNewGroup(keyRow: UnsafeRow, groupState: GroupStateImpl[Row]): Unit
Indicates writer to start with new grouping key.
Indicates writer to start with new grouping key.
- keyRow
The grouping key row for current group.
- groupState
The instance of GroupStateImpl for current group.
- final def synchronized[T0](arg0: ⇒ T0): T0
  Definition Classes: AnyRef
- def toString(): String
  Definition Classes: AnyRef → Any
- final def wait(): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
def
writeRow(dataRow: InternalRow): Unit
Indicates writer to write a row in the current group.
Indicates writer to write a row in the current group.
- dataRow
The row to write in the current group.
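The method descriptions above imply an ordering: startNewGroup, then writeRow for the group's rows, then finalizeGroup, repeated per group, and finally finalizeData. A caller could enforce that ordering with a small state machine such as the hypothetical `CallOrderGuard` below. It tracks call order only (no arguments), and rejecting `finalizeData` while a group is still open is a policy choice of this sketch rather than documented behavior.

```scala
// Hypothetical guard (not part of Spark) enforcing the call order:
// startNewGroup -> writeRow* -> finalizeGroup, repeated, then finalizeData once.
final class CallOrderGuard {
  private sealed trait Phase
  private case object Idle extends Phase
  private case object InGroup extends Phase
  private case object Done extends Phase
  private var phase: Phase = Idle

  def startNewGroup(): Unit = { expect(Idle, "startNewGroup"); phase = InGroup }
  def writeRow(): Unit = expect(InGroup, "writeRow")
  def finalizeGroup(): Unit = { expect(InGroup, "finalizeGroup"); phase = Idle }
  // Policy of this sketch: all groups must be closed before finalizeData.
  def finalizeData(): Unit = { expect(Idle, "finalizeData"); phase = Done }

  private def expect(required: Phase, call: String): Unit =
    if (phase != required) throw new IllegalStateException(s"$call called in phase $phase")
}
```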