public class StageOperationsValidator
extends Object
Validates the input and output fields of the operations recorded by a stage.
For each operation, every input field must either come from the input schema that the stage
receives or be an output of an operation recorded by that stage before the current operation.
For example, suppose the stage has the input schema [a, b, c] and records the following
operations, in this order:
OP1: [a] -> [x]
OP2: [x] -> [y]
OP3: [z] -> [d]
OP4: [c] -> [z]
In this case, OP1 has the valid input [a], which comes from the input schema, and OP2 has the valid
input [x], which is generated by OP1, recorded before OP2. However, OP3 has the invalid input [z]:
it is not part of the input schema and it is not produced by any operation recorded before OP3,
even though [z] is created by OP4, which occurs after OP3.
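The input rule can be illustrated with a minimal sketch. The class InputCheckSketch, its nested Op type, and the "OP:field" result format are hypothetical names introduced for illustration only; they are not the validator's actual API or implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InputCheckSketch {
  // Hypothetical stand-in for a recorded operation (assumption, not the real type).
  static final class Op {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    Op(String name, List<String> inputs, List<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // Returns "OP:field" for every input that is neither in the stage input schema
  // nor produced by an operation recorded earlier.
  static List<String> invalidInputs(List<String> inputSchema, List<Op> ops) {
    Set<String> available = new HashSet<>(inputSchema);
    List<String> invalid = new ArrayList<>();
    for (Op op : ops) {
      for (String field : op.inputs) {
        if (!available.contains(field)) {
          invalid.add(op.name + ":" + field);
        }
      }
      // Outputs become visible only to operations recorded after this one.
      available.addAll(op.outputs);
    }
    return invalid;
  }

  // The four operations from the example above, in recorded order.
  static List<Op> sampleOps() {
    return Arrays.asList(
        new Op("OP1", Arrays.asList("a"), Arrays.asList("x")),
        new Op("OP2", Arrays.asList("x"), Arrays.asList("y")),
        new Op("OP3", Arrays.asList("z"), Arrays.asList("d")),
        new Op("OP4", Arrays.asList("c"), Arrays.asList("z")));
  }

  public static void main(String[] args) {
    // prints [OP3:z]
    System.out.println(invalidInputs(Arrays.asList("a", "b", "c"), sampleOps()));
  }
}
```

Because the set of available fields grows only after an operation has been checked, the order in which operations are recorded determines the result: [z] is unavailable to OP3 even though OP4 creates it later.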
For each operation, every output field must either be part of the output schema or be used as an
input by a subsequent operation recorded by that stage.
For example, if the stage output schema in the case above is [x, y, z], then the output [d] created
by OP3 is not part of the schema and is not used as an input by any operation after OP3, so it is
treated as invalid.
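The output rule can be sketched in the same spirit. OutputCheckSketch, its nested Op type, and the "OP:field" result format are hypothetical names chosen for illustration; the nested loop is written for readability and is not meant to reflect how the validator is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OutputCheckSketch {
  // Hypothetical stand-in for a recorded operation (assumption, not the real type).
  static final class Op {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    Op(String name, List<String> inputs, List<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // Returns "OP:field" for every output that is neither in the stage output schema
  // nor used as an input by an operation recorded later.
  static List<String> invalidOutputs(List<String> outputSchema, List<Op> ops) {
    Set<String> schema = new HashSet<>(outputSchema);
    List<String> invalid = new ArrayList<>();
    for (int i = 0; i < ops.size(); i++) {
      Op op = ops.get(i);
      for (String field : op.outputs) {
        boolean usedLater = false;
        for (int j = i + 1; j < ops.size() && !usedLater; j++) {
          usedLater = ops.get(j).inputs.contains(field);
        }
        if (!schema.contains(field) && !usedLater) {
          invalid.add(op.name + ":" + field);
        }
      }
    }
    return invalid;
  }

  // The four operations from the example above, in recorded order.
  static List<Op> sampleOps() {
    return Arrays.asList(
        new Op("OP1", Arrays.asList("a"), Arrays.asList("x")),
        new Op("OP2", Arrays.asList("x"), Arrays.asList("y")),
        new Op("OP3", Arrays.asList("z"), Arrays.asList("d")),
        new Op("OP4", Arrays.asList("c"), Arrays.asList("z")));
  }

  public static void main(String[] args) {
    // prints [OP3:d]
    System.out.println(invalidOutputs(Arrays.asList("x", "y", "z"), sampleOps()));
  }
}
```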
Operations can also generate redundant outputs.
For example, suppose we add two more operations to the list above:
OP1: [a] -> [x]
OP2: [x] -> [y]
OP3: [z] -> [d]
OP4: [c] -> [z]
OP5: [c] -> [x]
OP6: [a, c] -> [z]
In this case the output field [z] created by OP4 is redundant (and therefore invalid), since the
field [z] of the output schema will always come from OP6 and no operation recorded between OP4 and
OP6 uses [z] as input. Note, however, that even though OP1 and OP5 both output [x], the OP1 output
is not considered invalid, since it is used as input by OP2, which happens before OP5.
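One way to detect such redundant outputs is to track, per field, the most recent output that no operation has consumed yet. The sketch below assumes this approach; RedundantOutputSketch, its nested Op type, and the "OP:field" result format are hypothetical, and the sketch covers only the redundancy rule, not the input and output schema checks described earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RedundantOutputSketch {
  // Hypothetical stand-in for a recorded operation (assumption, not the real type).
  static final class Op {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    Op(String name, List<String> inputs, List<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // Returns "OP:field" for every output that a later operation overwrites
  // before any operation has used it as an input.
  static List<String> redundantOutputs(List<Op> ops) {
    // field -> name of the latest operation whose output of that field is still unconsumed
    Map<String, String> pending = new HashMap<>();
    List<String> redundant = new ArrayList<>();
    for (Op op : ops) {
      // Reading a field consumes the pending output of that field.
      for (String field : op.inputs) {
        pending.remove(field);
      }
      for (String field : op.outputs) {
        // put() returns the previous producer, if its output was never consumed.
        String previous = pending.put(field, op.name);
        if (previous != null) {
          redundant.add(previous + ":" + field);
        }
      }
    }
    return redundant;
  }

  // The six operations from the example above, in recorded order.
  static List<Op> sampleOps() {
    return Arrays.asList(
        new Op("OP1", Arrays.asList("a"), Arrays.asList("x")),
        new Op("OP2", Arrays.asList("x"), Arrays.asList("y")),
        new Op("OP3", Arrays.asList("z"), Arrays.asList("d")),
        new Op("OP4", Arrays.asList("c"), Arrays.asList("z")),
        new Op("OP5", Arrays.asList("c"), Arrays.asList("x")),
        new Op("OP6", Arrays.asList("a", "c"), Arrays.asList("z")));
  }

  public static void main(String[] args) {
    // prints [OP4:z]
    System.out.println(redundantOutputs(sampleOps()));
  }
}
```

The OP1 output of [x] is not reported because OP2 consumes it before OP5 writes [x] again, which matches the reasoning in the example.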