A physical plan that evaluates a PythonUDF, one partition of tuples at a time.
A serialized version of a Python lambda function.
A user-defined Python function.
A user-defined Python function. This is used by the Python API.
A physical plan that evaluates a PythonUDF, one partition of tuples at a time.
Python evaluation works by sending the necessary (projected) input data via a socket to an external Python process, and combine the result from the Python process with the original row.
For each row we send to Python, we also put it in a queue. For each output row from Python, we drain the queue to find the original input row. Note that if the Python process is way too slow, this could lead to the queue growing unbounded and eventually run out of memory.