Class ShardNameTemplate


  • public class ShardNameTemplate
    extends java.lang.Object
    Standard shard naming templates.

    Shard naming templates are strings that may contain placeholders for the shard number and shard count. When constructing a filename for a particular shard number, each run of the upper-case letters 'S' and 'N' is replaced with the zero-padded shard number and shard count, respectively.

    Left-padding the numbers lets the resulting filenames sort lexicographically. If the shard number or count is too large for the space provided in the template, the result may no longer sort lexicographically. For example, the shard template "S-of-N" with 200 shards produces outputs named "0-of-200", ..., "10-of-200", ..., "100-of-200", and so on.
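
    The sorting problem can be reproduced without any Beam dependency. The following is a minimal sketch that uses plain String.format in place of a shard template; the class ShardSortSketch and the helper sortedNames are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShardSortSketch {
    /** Hypothetical helper: names every shard with the given format, then sorts the names as strings. */
    static List<String> sortedNames(String format, int shardCount) {
        List<String> names = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            names.add(String.format(format, shard, shardCount));
        }
        Collections.sort(names); // plain lexicographic (String) ordering
        return names;
    }

    public static void main(String[] args) {
        // Unpadded, comparable to the template "S-of-N": shard 10 sorts before shard 2.
        System.out.println(sortedNames("%d-of-%d", 200).subList(0, 4));
        // prints [0-of-200, 1-of-200, 10-of-200, 100-of-200]

        // Padded to three digits, comparable to "SSS-of-NNN": numeric order is preserved.
        System.out.println(sortedNames("%03d-of-%03d", 200).subList(0, 4));
        // prints [000-of-200, 001-of-200, 002-of-200, 003-of-200]
    }
}
```

    Choosing a placeholder width at least as large as the number of digits in the shard count avoids the problem.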

    Shard numbers start with 0, so the last shard number is the shard count minus one. For example, the template "-SSSSS-of-NNNNN" will be instantiated as "-00000-of-01000" for the first shard (shard 0) of a 1000-way sharded output.
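
    The substitution described above can be sketched in a few lines. This is an illustration only, not Beam's implementation: the class ShardTemplateSketch and the helper expand are hypothetical, and the sketch assumes that each run of 'S' or 'N' is a placeholder whose length gives the padding width.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShardTemplateSketch {
    // Assumption for this sketch: any run of 'S' or of 'N' is a placeholder.
    private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

    /** Hypothetical helper: expands a shard name template for one shard. */
    static String expand(String template, int shardNumber, int shardCount) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between placeholders unchanged.
            out.append(template, last, m.start());
            int value = m.group().charAt(0) == 'S' ? shardNumber : shardCount;
            // The run length sets the zero-padded width, e.g. "SSSSS" -> "%05d".
            out.append(String.format("%0" + m.group().length() + "d", value));
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("-SSSSS-of-NNNNN", 0, 1000));   // prints -00000-of-01000
        System.out.println(expand("-SSSSS-of-NNNNN", 999, 1000)); // prints -00999-of-01000
    }
}
```

    The last shard of a 1000-way output is shard 999, which matches the rule that the last shard number is the shard count minus one.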

    A shard name template is typically provided along with a name prefix and suffix, which allows constructing complex paths that have embedded shard information. For example, outputs in the form "gs://bucket/path-01-of-99.txt" could be constructed by providing the individual components:

    
     pipeline.apply(
         TextIO.write().to("gs://bucket/path")
             .withShardNameTemplate("-SS-of-NN")
             .withSuffix(".txt"));
     

    In the example above, individual parts of the output name, such as the prefix, can be made configurable without requiring users to specify every component of the name.

    If a shard name template does not contain any 'S' placeholder, then the output shard count must be 1, as otherwise the same filename would be generated for multiple shards.
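
    A caller could check this constraint before writing. The following is a minimal sketch under the assumption that the presence of an upper-case 'S' is what marks a shard-number placeholder; the class ShardTemplateCheckSketch and the helper isUsable are hypothetical and not part of this class.

```java
public class ShardTemplateCheckSketch {
    /**
     * Hypothetical check: a template with no 'S' placeholder yields the same
     * filename for every shard, so it is only usable with a single shard.
     */
    static boolean isUsable(String template, int shardCount) {
        boolean hasShardPlaceholder = template.indexOf('S') >= 0;
        return hasShardPlaceholder || shardCount == 1;
    }

    public static void main(String[] args) {
        System.out.println(isUsable("-SS-of-NN", 10)); // prints true
        System.out.println(isUsable("-of-NN", 10));    // prints false
        System.out.println(isUsable("", 1));           // prints true
    }
}
```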

    • Field Summary

      Fields
      Modifier and Type          Field                  Description
      static java.lang.String    DIRECTORY_CONTAINER    Shard is a file within a directory.
      static java.lang.String    INDEX_OF_MAX           Shard name containing the index and max.
    • Method Summary

      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • INDEX_OF_MAX

        public static final java.lang.String INDEX_OF_MAX
        Shard name containing the index and max.

        E.g.: [prefix]-00000-of-00100[suffix] and [prefix]-00001-of-00100[suffix]

        See Also:
        Constant Field Values
      • DIRECTORY_CONTAINER

        public static final java.lang.String DIRECTORY_CONTAINER
        Shard is a file within a directory.

        E.g.: [prefix]/part-00000[suffix] and [prefix]/part-00001[suffix]

        See Also:
        Constant Field Values
    • Constructor Detail

      • ShardNameTemplate

        public ShardNameTemplate()