Class Watch


  • public class Watch
    extends java.lang.Object
    Given a "poll function" that produces a potentially growing set of outputs for an input, this transform simultaneously continuously watches the growth of output sets of all inputs, until a per-input termination condition is reached.

    The output is returned as an unbounded PCollection of KV<InputT, OutputT>, where each OutputT is associated with the InputT that produced it, and is assigned with the timestamp that the poll function returned when this output was detected for the first time.

    Hypothetical usage example for watching new files in a collection of directories, where for each directory we assume that new files will not appear if the directory contains a file named ".complete":

    
     PCollection<String> directories = ...;  // E.g. Create.of(single directory)
     PCollection<KV<String, String>> matches = filepatterns.apply(Watch.<String, String>growthOf(
       new PollFn<String, String>() {
         public PollResult<String> apply(TimestampedValue<String> input) {
           String directory = input.getValue();
           List<TimestampedValue<String>> outputs = new ArrayList<>();
           ... List the directory and get creation times of all files ...
           boolean isComplete = ... does a file ".complete" exist in the directory ...
           return isComplete ? PollResult.complete(outputs) : PollResult.incomplete(outputs);
         }
       })
       // Poll each directory every 5 seconds
       .withPollInterval(Duration.standardSeconds(5))
       // Stop watching each directory 12 hours after it's seen even if it's incomplete
       .withTerminationPerInput(afterTotalOf(Duration.standardHours(12)));
     

    By default, the watermark for a particular input is computed from a poll result as "earliest timestamp of new elements in this poll result". It can also be set explicitly via Watch.Growth.PollResult.withWatermark(org.joda.time.Instant) if the Watch.Growth.PollFn can provide a more optimistic estimate.

    Note: This transform works only in runners supporting Splittable DoFn: see capability matrix.

    • Constructor Detail

      • Watch

        public Watch()
    • Method Detail

      • growthOf

        public static <InputT,​OutputT> Watch.Growth<InputT,​OutputT,​OutputT> growthOf​(Watch.Growth.PollFn<InputT,​OutputT> pollFn,
                                                                                                       Requirements requirements)
        Watches the growth of the given poll function. See class documentation for more details.
      • growthOf

        public static <InputT,​OutputT> Watch.Growth<InputT,​OutputT,​OutputT> growthOf​(Watch.Growth.PollFn<InputT,​OutputT> pollFn)
        Watches the growth of the given poll function. See class documentation for more details.
      • growthOf

        public static <InputT,​OutputT,​KeyT> Watch.Growth<InputT,​OutputT,​KeyT> growthOf​(Contextful<Watch.Growth.PollFn<InputT,​OutputT>> pollFn,
                                                                                                               SerializableFunction<OutputT,​KeyT> outputKeyFn)
        Watches the growth of the given poll function, using the given "key function" to deduplicate outputs. For example, if OutputT is a filename + file size, this can be a function that returns just the filename, so that if the same file is observed multiple times with different sizes, only the first observation is emitted.

        By default, this is the identity function, i.e. the output is used as its own key.