Class Sets
java.lang.Object
    org.apache.beam.sdk.transforms.Sets

public class Sets extends java.lang.Object
The PTransforms that compute different set functions across PCollections.

They come in two variants:
1. Between two PCollections.
2. Between two or more PCollections in a PCollectionList.

The following PTransforms follow SET DISTINCT semantics: intersectDistinct, exceptDistinct, unionDistinct.

The following PTransforms follow SET ALL semantics: intersectAll, exceptAll, unionAll.

For example, the following demonstrates intersectDistinct between two PCollections:

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
    PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));

    PCollection<String> results = left.apply(Sets.intersectDistinct(right));
    // results will be a PCollection<String> containing: "1","3","4"

For example, the following demonstrates intersectDistinct between three PCollections in a PCollectionList:

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
    PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6"));
    PCollection<String> third = p.apply(Create.of("3", "4", "4"));

    // The following example will perform (first intersect second) intersect third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.intersectDistinct());
    // results will be a PCollection<String> containing: "3","4"
-
-
Constructor Summary
Constructors
Constructor    Description
Sets()
-
Method Summary
Modifier and Type, Method, and Description:

static <T> PTransform<PCollectionList<T>,PCollection<T>> exceptAll()
    Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the difference all (exceptAll) of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> exceptAll(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET ALL semantics to compute the difference all (exceptAll) with the provided PCollection<T>.

static <T> PTransform<PCollectionList<T>,PCollection<T>> exceptDistinct()
    Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the difference (except) of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> exceptDistinct(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET DISTINCT semantics to compute the difference (except) with the provided PCollection<T>.

static <T> PTransform<PCollectionList<T>,PCollection<T>> intersectAll()
    Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the intersection all of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> intersectAll(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET ALL semantics to compute the intersection with the provided PCollection<T>.

static <T> PTransform<PCollectionList<T>,PCollection<T>> intersectDistinct()
    Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the intersection of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> intersectDistinct(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET DISTINCT semantics to compute the intersection with the provided PCollection<T>.

static <T> Flatten.PCollections<T> unionAll()
    Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the unionAll of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> unionAll(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET ALL semantics to compute the unionAll with the provided PCollection<T>.

static <T> PTransform<PCollectionList<T>,PCollection<T>> unionDistinct()
    Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the union of the collections, computed in order across all collections in the PCollectionList<T>.

static <T> PTransform<PCollection<T>,PCollection<T>> unionDistinct(PCollection<T> rightCollection)
    Returns a new PTransform that follows SET DISTINCT semantics to compute the union with the provided PCollection<T>.
-
-
-
Method Detail
-
intersectDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> intersectDistinct(PCollection<T> rightCollection)
Returns a new PTransform that follows SET DISTINCT semantics to compute the intersection with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection contains all distinct elements that are present in both the input PCollection, to which this transform is applied, and the provided PCollection.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
    PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));

    PCollection<String> results = left.apply(Sets.intersectDistinct(right));
    // results will be a PCollection<String> containing: "1","3","4"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
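The SET DISTINCT intersection described above can be modeled outside of Beam with ordinary Java collections. The following is a minimal sketch of the semantics only, not Beam code and not how the transform is implemented; the class name IntersectDistinctSketch and its helper method are hypothetical, and the result is sorted purely to make the printed output readable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java model of SET DISTINCT intersection; this is not Beam code.
class IntersectDistinctSketch {

  // Returns the distinct elements present in both inputs, sorted for readable output.
  static List<String> intersectDistinct(List<String> left, List<String> right) {
    TreeSet<String> result = new TreeSet<>(left); // de-duplicates left
    result.retainAll(right); // keeps only elements that also occur in right
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("1", "2", "3", "3", "4", "5");
    List<String> right = Arrays.asList("1", "3", "4", "4", "6");
    System.out.println(intersectDistinct(left, right)); // prints [1, 3, 4]
  }
}
```

The inputs mirror the example above, so the printed list matches the documented result of "1","3","4".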
-
intersectDistinct
public static <T> PTransform<PCollectionList<T>,PCollection<T>> intersectDistinct()
Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the intersection of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection contains all distinct elements that are present in both the current result (initially the first PCollection in the list) and the next PCollection in the list, applied to all collections in order.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
    PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6"));
    PCollection<String> third = p.apply(Create.of("3", "4", "4"));

    // The following example will perform (first intersect second) intersect third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.intersectDistinct());
    // results will be a PCollection<String> containing: "3","4"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
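The "done in order" behavior of the list variant can be pictured as a left fold over the collections. The sketch below models that fold with plain Java collections; it illustrates the semantics under that assumption and is not Beam code. The class name IntersectDistinctListSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java model of the list variant; this is not Beam code.
class IntersectDistinctListSketch {

  // Folds left to right: ((first intersect second) intersect third) and so on.
  static List<String> intersectDistinct(List<List<String>> inputs) {
    TreeSet<String> acc = new TreeSet<>(inputs.get(0)); // distinct elements of the first input
    for (List<String> next : inputs.subList(1, inputs.size())) {
      acc.retainAll(next); // keep only elements that also occur in the next input
    }
    return new ArrayList<>(acc);
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("1", "2", "3", "3", "4", "5");
    List<String> second = Arrays.asList("1", "3", "4", "4", "6");
    List<String> third = Arrays.asList("3", "4", "4");
    System.out.println(intersectDistinct(Arrays.asList(first, second, third))); // prints [3, 4]
  }
}
```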
-
intersectAll
public static <T> PTransform<PCollection<T>,PCollection<T>> intersectAll(PCollection<T> rightCollection)
Returns a new PTransform that follows SET ALL semantics to compute the intersection with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection follows INTERSECT_ALL semantics. Given an element that occurs m times in the input PCollection (left), to which this transform is applied, and n times in the provided PCollection (right):
- the output contains MIN(m, n) occurrences of that element, for all elements that are present in both left and right.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5"));
    PCollection<String> right = p.apply(Create.of("1", "1", "3", "4", "4", "6"));

    PCollection<String> results = left.apply(Sets.intersectAll(right));
    // results will be a PCollection<String> containing: "1","1","3","4"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
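The multiplicity rule for INTERSECT_ALL can be checked by counting occurrences on each side and keeping the smaller count, which is what the documented example result implies ("1" occurs 3 times on the left and 2 on the right, and appears twice in the output). The following plain-Java sketch models that rule; it is not Beam code, and the class name IntersectAllSketch and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of SET ALL intersection; this is not Beam code.
class IntersectAllSketch {

  // Counts how many times each element occurs.
  static Map<String, Integer> counts(List<String> xs) {
    Map<String, Integer> c = new TreeMap<>();
    for (String x : xs) {
      c.merge(x, 1, Integer::sum);
    }
    return c;
  }

  // Emits MIN(m, n) copies of each element, where m and n are its counts in left and right.
  static List<String> intersectAll(List<String> left, List<String> right) {
    Map<String, Integer> rightCounts = counts(right);
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts(left).entrySet()) {
      int m = e.getValue();
      int n = rightCounts.getOrDefault(e.getKey(), 0);
      for (int i = 0; i < Math.min(m, n); i++) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("1", "1", "1", "2", "3", "3", "4", "5");
    List<String> right = Arrays.asList("1", "1", "3", "4", "4", "6");
    System.out.println(intersectAll(left, right)); // prints [1, 1, 3, 4]
  }
}
```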
-
intersectAll
public static <T> PTransform<PCollectionList<T>,PCollection<T>> intersectAll()
Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the intersection all of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection follows INTERSECT_ALL semantics, applied to the collections in order. At each step, given an element that occurs m times in the left PCollection (the result so far, initially the first in the list) and n times in the right PCollection (the next in the list):
- the output contains MIN(m, n) occurrences of that element, for all elements that are present in both left and right.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5"));
    PCollection<String> second = p.apply(Create.of("1", "1", "3", "4", "4", "6"));
    PCollection<String> third = p.apply(Create.of("1", "5"));

    // The following example will perform (first intersect second) intersect third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.intersectAll());
    // results will be a PCollection<String> containing: "1"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
-
exceptDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> exceptDistinct(PCollection<T> rightCollection)
Returns a new PTransform that follows SET DISTINCT semantics to compute the difference (except) with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection contains all distinct elements that are present in the input PCollection, to which this transform is applied, but not present in the provided PCollection.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5"));
    PCollection<String> right = p.apply(Create.of("1", "1", "3", "4", "4", "6"));

    PCollection<String> results = left.apply(Sets.exceptDistinct(right));
    // results will be a PCollection<String> containing: "2","5"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
-
exceptDistinct
public static <T> PTransform<PCollectionList<T>,PCollection<T>> exceptDistinct()
Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the difference (except) of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection contains all distinct elements that are present in the current result (initially the first PCollection in the list) but not present in the next PCollection in the list, applied to all collections in order.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5"));
    PCollection<String> second = p.apply(Create.of("1", "1", "3", "4", "4", "6"));
    PCollection<String> third = p.apply(Create.of("1", "2", "2"));

    // The following example will perform (first except second) except third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.exceptDistinct());
    // results will be a PCollection<String> containing: "5"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
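Because difference is not commutative, the order of the collections in the list matters for exceptDistinct. The plain-Java sketch below models the in-order fold and shows that reversing the list changes the result; it illustrates the semantics only and is not Beam code. The class name ExceptDistinctListSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java model of the list variant; this is not Beam code.
class ExceptDistinctListSketch {

  // Folds left to right: ((first except second) except third) and so on.
  static List<String> exceptDistinct(List<List<String>> inputs) {
    TreeSet<String> acc = new TreeSet<>(inputs.get(0)); // distinct elements of the first input
    for (List<String> next : inputs.subList(1, inputs.size())) {
      acc.removeAll(next); // drop every element that occurs in the next input
    }
    return new ArrayList<>(acc);
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("1", "1", "1", "2", "3", "3", "4", "5");
    List<String> second = Arrays.asList("1", "1", "3", "4", "4", "6");
    List<String> third = Arrays.asList("1", "2", "2");
    System.out.println(exceptDistinct(Arrays.asList(first, second, third))); // prints [5]
    System.out.println(exceptDistinct(Arrays.asList(third, second, first))); // prints []
  }
}
```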
-
exceptAll
public static <T> PTransform<PCollection<T>,PCollection<T>> exceptAll(PCollection<T> rightCollection)
Returns a new PTransform that follows SET ALL semantics to compute the difference all (exceptAll) with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection follows EXCEPT_ALL semantics. Given an element that occurs m times in the input PCollection (left), to which this transform is applied, and n times in the provided PCollection (right):
- the output contains m occurrences of that element, for all elements that are present in left but not in right.
- the output contains MAX(m - n, 0) occurrences of that element, for all elements that are present in both left and right.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3", "3", "4", "5"));
    PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));

    PCollection<String> results = left.apply(Sets.exceptAll(right));
    // results will be a PCollection<String> containing: "1","1","2","3","3","5"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
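One way to see why EXCEPT_ALL yields MAX(m - n, 0) occurrences is to cancel elements one for one: each element on the right removes at most one matching element on the left. The plain-Java sketch below models that; it is an illustration of the semantics, not Beam code, and the class name ExceptAllSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java model of SET ALL difference; this is not Beam code.
class ExceptAllSketch {

  // Each right element cancels at most one equal left element,
  // leaving MAX(m - n, 0) copies of every element.
  static List<String> exceptAll(List<String> left, List<String> right) {
    List<String> out = new ArrayList<>(left);
    for (String r : right) {
      out.remove(r); // List.remove(Object) removes the first occurrence, if any
    }
    Collections.sort(out); // sorted only to make the output readable
    return out;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("1", "1", "1", "2", "3", "3", "3", "4", "5");
    List<String> right = Arrays.asList("1", "3", "4", "4", "6");
    System.out.println(exceptAll(left, right)); // prints [1, 1, 2, 3, 3, 5]
  }
}
```

In the sample data, "4" occurs once on the left and twice on the right, so it disappears entirely rather than going negative.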
-
exceptAll
public static <T> PTransform<PCollectionList<T>,PCollection<T>> exceptAll()
Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the difference all (exceptAll) of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection follows EXCEPT_ALL semantics, applied to the collections in order. At each step, given an element that occurs m times in the left PCollection (the result so far, initially the first in the list) and n times in the right PCollection (the next in the list):
- the output contains m occurrences of that element, for all elements that are present in left but not in right.
- the output contains MAX(m - n, 0) occurrences of that element, for all elements that are present in both left and right.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "3", "4", "5"));
    PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6"));
    PCollection<String> third = p.apply(Create.of("1", "5"));

    // The following example will perform (first except second) except third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.exceptAll());
    // results will be a PCollection<String> containing: "1","2","3","3"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
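For the list variant, the same EXCEPT_ALL rule can be applied step by step while tracking a running count per element. The sketch below is a plain-Java model under that reading of "done in order"; it is not Beam code, and the class name ExceptAllListSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of the list variant; this is not Beam code.
class ExceptAllListSketch {

  static List<String> exceptAll(List<List<String>> inputs) {
    // Start from the occurrence counts of the first collection.
    Map<String, Integer> acc = new TreeMap<>();
    for (String x : inputs.get(0)) {
      acc.merge(x, 1, Integer::sum);
    }
    // Every element of each later collection cancels one remaining occurrence, if any.
    for (List<String> next : inputs.subList(1, inputs.size())) {
      for (String x : next) {
        // Returning null removes the entry once its count would reach zero.
        acc.computeIfPresent(x, (key, m) -> m > 1 ? m - 1 : null);
      }
    }
    // Expand the remaining counts back into a list of elements.
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : acc.entrySet()) {
      for (int i = 0; i < e.getValue(); i++) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("1", "1", "1", "2", "3", "3", "3", "4", "5");
    List<String> second = Arrays.asList("1", "3", "4", "4", "6");
    List<String> third = Arrays.asList("1", "5");
    System.out.println(exceptAll(Arrays.asList(first, second, third))); // prints [1, 2, 3, 3]
  }
}
```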
-
unionDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> unionDistinct(PCollection<T> rightCollection)
Returns a new PTransform that follows SET DISTINCT semantics to compute the union with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection contains all distinct elements that are present in the input PCollection, to which this transform is applied, or in the provided PCollection.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "1", "2"));
    PCollection<String> right = p.apply(Create.of("1", "3", "4", "4"));

    PCollection<String> results = left.apply(Sets.unionDistinct(right));
    // results will be a PCollection<String> containing: "1","2","3","4"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
-
unionDistinct
public static <T> PTransform<PCollectionList<T>,PCollection<T>> unionDistinct()
Returns a new PTransform that follows SET DISTINCT semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the union of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection contains all distinct elements that are present in the current result (initially the first PCollection in the list) or in the next PCollection in the list, applied to all collections in order.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "1", "2"));
    PCollection<String> second = p.apply(Create.of("1", "3", "4", "4"));
    PCollection<String> third = p.apply(Create.of("1", "5"));

    // The following example will perform (first union second) union third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.unionDistinct());
    // results will be a PCollection<String> containing: "1","2","3","4","5"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
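Unlike difference, a distinct union gives the same result regardless of the order of the collections, so it can be modeled by pooling every element and de-duplicating. The sketch below does this with Java streams; it is a plain-Java illustration of the semantics, not Beam code, and the class name UnionDistinctListSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of SET DISTINCT union over a list; this is not Beam code.
class UnionDistinctListSketch {

  // Pools all elements, drops duplicates, and sorts for readable output.
  static List<String> unionDistinct(List<List<String>> inputs) {
    return inputs.stream()
        .flatMap(List::stream)
        .distinct()
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("1", "1", "2");
    List<String> second = Arrays.asList("1", "3", "4", "4");
    List<String> third = Arrays.asList("1", "5");
    System.out.println(unionDistinct(Arrays.asList(first, second, third))); // prints [1, 2, 3, 4, 5]
  }
}
```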
-
unionAll
public static <T> PTransform<PCollection<T>,PCollection<T>> unionAll(PCollection<T> rightCollection)
Returns a new PTransform that follows SET ALL semantics to compute the unionAll with the provided PCollection<T>.

The argument should not be modified after this is called.

The output PCollection follows UNION_ALL semantics. Given that there are m elements in the input PCollection (left), to which this transform is applied, and n elements in the provided PCollection (right):
- the output contains the m elements of left and the n elements of right.

Note that this transform requires the Coder of all PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the input PCollection<T>.

    Pipeline p = ...;

    PCollection<String> left = p.apply(Create.of("1", "1", "2"));
    PCollection<String> right = p.apply(Create.of("1", "3", "4", "4"));

    PCollection<String> results = left.apply(Sets.unionAll(right));
    // results will be a PCollection<String> containing: "1","1","1","2","3","4","4"

Type Parameters:
    T - the type of the elements in the input and output PCollection<T>s.
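UNION_ALL keeps every element from both sides, duplicates included, so the output size is always m + n. The plain-Java sketch below models this as a simple concatenation; it is not Beam code, the class name UnionAllSketch is hypothetical, and the sort is there only to make the printed output deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java model of SET ALL union; this is not Beam code.
class UnionAllSketch {

  // Concatenates both inputs without removing duplicates.
  static List<String> unionAll(List<String> left, List<String> right) {
    return Stream.concat(left.stream(), right.stream())
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("1", "1", "2");
    List<String> right = Arrays.asList("1", "3", "4", "4");
    List<String> result = unionAll(left, right);
    System.out.println(result); // prints [1, 1, 1, 2, 3, 4, 4]
    System.out.println(result.size() == left.size() + right.size()); // prints true
  }
}
```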
-
unionAll
public static <T> Flatten.PCollections<T> unionAll()
Returns a new PTransform that follows SET ALL semantics: it takes a PCollectionList<T> and returns a PCollection<T> containing the unionAll of the collections, computed in order across all collections in the PCollectionList<T>.

The output PCollection follows UNION_ALL semantics, applied to the collections in order. At each step, given that there are m elements in the left PCollection (the result so far, initially the first in the list) and n elements in the right PCollection (the next in the list):
- the output contains the m elements of left and the n elements of right.

Note that this transform requires the Coder of all input PCollection<T>s to be deterministic (see Coder.verifyDeterministic()). If the collection Coder is not deterministic, an exception is thrown at pipeline construction time.

All inputs must have equal WindowFns and compatible triggers (see Trigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results, since this PTransform is only computed over each individual firing.

By default, the output PCollection<T> encodes its elements using the same Coder as that of the first PCollection<T> in the PCollectionList<T>.

    Pipeline p = ...;

    PCollection<String> first = p.apply(Create.of("1", "1", "2"));
    PCollection<String> second = p.apply(Create.of("1", "3", "4", "4"));
    PCollection<String> third = p.apply(Create.of("1", "5"));

    // The following example will perform (first unionAll second) unionAll third.
    PCollection<String> results =
        PCollectionList.of(first).and(second).and(third)
            .apply(Sets.unionAll());
    // results will be a PCollection<String> containing: "1","1","1","1","2","3","4","4","5"

Type Parameters:
    T - the type of the elements in the input PCollectionList<T> and output PCollection<T>s.
-
-