Class AbstractStatusUpdaterBolt

  • All Implemented Interfaces:
    Serializable, org.apache.storm.task.IBolt, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichBolt
    Direct Known Subclasses:
    MemoryStatusUpdater, StdOutStatusUpdater

    public abstract class AbstractStatusUpdaterBolt
    extends org.apache.storm.topology.base.BaseRichBolt
    Abstract bolt used to store the status of URLs. Uses the DefaultScheduler and MetadataTransfer.
    See Also:
    Serialized Form
    • Field Detail

      • useCacheParamName

        public static String useCacheParamName
        Parameter name to indicate whether the internal cache should be used for discovered URLs. The value of the parameter is a boolean - true by default.
      • maxFetchErrorsParamName

        public static String maxFetchErrorsParamName
        Parameter name for the number of successive FETCH_ERROR statuses before the status changes to ERROR.
      • cacheConfigParamName

        public static String cacheConfigParamName
        Parameter name to configure the cache. The value is a Guava CacheBuilderSpec string (see http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilderSpec.html). The default value is "maximumSize=10000,expireAfterAccess=1h".
      • roundDateParamName

        public static String roundDateParamName
        Parameter name used for rounding nextFetchDates. Accepted values are hour, minute or second; second is the default.
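        The effect of the three rounding units can be sketched with java.time alone. This is only an illustration: the class NextFetchDateRounding and its round method are hypothetical, and it assumes "rounding" means rounding to the nearest unit, which this page does not confirm.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of the documented class.
// Assumption: "rounding" means rounding to the nearest unit.
class NextFetchDateRounding {

    static Instant round(Instant date, String unitName) {
        ChronoUnit unit;
        switch (unitName) {
            case "hour":
                unit = ChronoUnit.HOURS;
                break;
            case "minute":
                unit = ChronoUnit.MINUTES;
                break;
            case "second":
                unit = ChronoUnit.SECONDS;
                break;
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unitName);
        }
        // Adding half a unit and then truncating rounds to the nearest unit
        // (exact ties are rounded up).
        Duration half = unit.getDuration().dividedBy(2);
        return date.plus(half).truncatedTo(unit);
    }

    public static void main(String[] args) {
        Instant date = Instant.parse("2024-01-15T10:29:45.600Z");
        System.out.println(round(date, "second")); // 2024-01-15T10:29:46Z
        System.out.println(round(date, "minute")); // 2024-01-15T10:30:00Z
        System.out.println(round(date, "hour"));   // 2024-01-15T10:00:00Z
    }
}
```

        A coarser unit makes more URLs share the same nextFetchDate value, at the cost of precision.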
      • AS_IS_NEXTFETCHDATE_METADATA

        public static final String AS_IS_NEXTFETCHDATE_METADATA
        Key used to pass a preset Date to use as nextFetchDate. The value must represent a valid instant in UTC and be parsable using DateTimeFormatter.ISO_INSTANT. This also indicates that the storage can be done directly on the metadata as-is.
        See Also:
        Constant Field Values
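        A value suitable for this key can be checked before it is placed in the metadata. The sketch below uses only the JDK; the class PresetNextFetchDate and its methods are hypothetical names introduced for illustration, and only DateTimeFormatter.ISO_INSTANT comes from the description above.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of the documented class.
class PresetNextFetchDate {

    // Parses a UTC instant such as "2024-01-15T10:30:00Z".
    // Throws DateTimeParseException if the text is not a valid ISO instant.
    static Instant parse(String value) {
        return DateTimeFormatter.ISO_INSTANT.parse(value, Instant::from);
    }

    // Returns true if the value can be parsed with ISO_INSTANT.
    static boolean isValid(String value) {
        try {
            parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Produces a string that parse(...) accepts.
    static String format(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        System.out.println(isValid("2024-01-15T10:30:00Z")); // true
        System.out.println(isValid("2024-01-15 10:30:00"));  // false
        System.out.println(format(Instant.parse("2024-01-15T10:30:00Z")));
    }
}
```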
      • _collector

        protected org.apache.storm.task.OutputCollector _collector
    • Constructor Detail

      • AbstractStatusUpdaterBolt

        public AbstractStatusUpdaterBolt()
    • Method Detail

      • prepare

        public void prepare(Map<String,Object> stormConf,
                            org.apache.storm.task.TopologyContext context,
                            org.apache.storm.task.OutputCollector collector)
      • execute

        public void execute(org.apache.storm.tuple.Tuple tuple)
      • getDocumentID

        protected String getDocumentID(Metadata metadata,
                                       String url)
        Get the document id.
        Parameters:
        metadata - The Metadata.
        url - The normalised url.
        Returns:
        The SHA-256 digest of the normalised URL, as a String.
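        A digest of this kind can be computed with the JDK's MessageDigest. The sketch below is not the bolt's code: the class DocumentIdSketch and the method sha256Hex are hypothetical, and the UTF-8 encoding of the URL and the lowercase hexadecimal output are assumptions, since this page does not state how the digest is encoded.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the documented class.
class DocumentIdSketch {

    // Assumption: the URL is encoded as UTF-8 and the digest is rendered
    // as lowercase hexadecimal.
    static String sha256Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // %02x prints each byte as two hex digits.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // The URL below is an arbitrary example.
        System.out.println(sha256Hex("http://example.com/"));
    }
}
```

        Because the id is derived from the normalised URL only, the same URL always maps to the same document id.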
      • ack

        protected final void ack(org.apache.storm.tuple.Tuple t,
                                 String url)
        Must be called by extending classes to store and collect in one go.
      • declareOutputFields

        public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)