Class AbstractQueryingSpout

  • All Implemented Interfaces:
    Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout

    public abstract class AbstractQueryingSpout
    extends org.apache.storm.topology.base.BaseRichSpout
    Common features of spouts which query a backend to generate tuples. Tracks the URLs being processed, with an optional delay before they are removed from the cache. Throttles the rate at which queries are emitted and provides a buffer to store the URLs waiting to be sent.
    Since:
    1.11
    See Also:
    Serialized Form
    • Field Detail

      • StatusTTLPurgatory

        protected static final String StatusTTLPurgatory
        Time in seconds for which acked or failed URLs are held in the cache before they can be considered for fetching again. Default: 30 seconds.
        See Also:
        Constant Field Values
      • StatusMinDelayParamName

        protected static final String StatusMinDelayParamName
        Minimum time to allow between two successive queries to the backend, in milliseconds. Default: 2000.
        See Also:
        Constant Field Values
      • minDelayBetweenQueries

        protected long minDelayBetweenQueries
      • StatusMaxDelayParamName

        protected static final String StatusMaxDelayParamName
        Maximum time to allow between two successive queries to the backend, in milliseconds. Default: 20000.
        See Also:
        Constant Field Values
      • maxDelayBetweenQueries

        protected long maxDelayBetweenQueries
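The interplay of the minimum and maximum delays can be sketched as a small gate on elapsed time. This is a minimal illustration, not the class's actual code: QueryThrottle and canQuery are hypothetical names, and the rule that the maximum delay applies while a previous query is still pending is an assumption.

```java
/** Hypothetical helper (not part of AbstractQueryingSpout): gates backend queries by elapsed time. */
final class QueryThrottle {
    private final long minDelayMs; // e.g. 2000, the documented default
    private final long maxDelayMs; // e.g. 20000, the documented default

    QueryThrottle(long minDelayMs, long maxDelayMs) {
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Returns true if a new query may be sent at nowMs. */
    boolean canQuery(long nowMs, long lastQuerySentMs, boolean inQuery) {
        long elapsed = nowMs - lastQuerySentMs;
        // Assumption: while a previous query is still pending, wait until
        // maxDelayMs has elapsed before sending another one.
        if (inQuery) {
            return elapsed >= maxDelayMs;
        }
        // Otherwise only the minimum spacing between queries applies.
        return elapsed >= minDelayMs;
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a spout would typically supply System.currentTimeMillis().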
      • resetFetchDateParamName

        protected static final String resetFetchDateParamName
        Delay in seconds after which the nextFetchDate filter is set to the current time. Default: 120. Used to prevent the search from being limited to a handful of sources.
        See Also:
        Constant Field Values
      • resetFetchDateAfterNSecs

        protected int resetFetchDateAfterNSecs
      • lastTimeResetToNOW

        protected Instant lastTimeResetToNOW
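One way to picture the periodic reset of the nextFetchDate filter is the sketch below. It is an assumed model only: FetchDateFilter and filterAt are hypothetical names, and the exact reset condition used by concrete spouts may differ.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the nextFetchDate reset; only the two field names come from the class. */
final class FetchDateFilter {
    private final int resetFetchDateAfterNSecs; // e.g. 120, the documented default
    private Instant lastTimeResetToNOW;         // null until the first call
    private Instant nextFetchDate;

    FetchDateFilter(int resetFetchDateAfterNSecs) {
        this.resetFetchDateAfterNSecs = resetFetchDateAfterNSecs;
    }

    /** Returns the date to filter on, moving it to 'now' once the reset delay has elapsed. */
    Instant filterAt(Instant now) {
        boolean expired = lastTimeResetToNOW == null
                || Duration.between(lastTimeResetToNOW, now).getSeconds() >= resetFetchDateAfterNSecs;
        if (expired) {
            nextFetchDate = now;
            lastTimeResetToNOW = now;
        }
        return nextFetchDate;
    }
}
```

Between resets the filter stays fixed, so successive queries page through the same candidate set; the reset lets URLs that became due in the meantime enter the search.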
      • eventCounter

        protected org.apache.storm.metric.api.MultiCountMetric eventCounter
      • _collector

        protected org.apache.storm.spout.SpoutOutputCollector _collector
      • isInQuery

        protected AtomicBoolean isInQuery
        Required for implementations doing asynchronous calls.
      • beingProcessed

        protected AbstractQueryingSpout.InProcessMap<String,Object> beingProcessed
        Map of in-process URLs, with the URL as key and an optional value that depends on the spout implementation. Entries are kept in a cache for a configurable amount of time so that items are not fetched a second time when new items are queried shortly after they have been acked.
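The caching behaviour described for beingProcessed can be illustrated with a small time-to-live map. This is a sketch under stated assumptions, not the InProcessMap implementation: PurgatoryCache and its methods are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an in-process map with a post-ack holding period. */
final class PurgatoryCache<V> {
    private static final long IN_PROCESS = Long.MAX_VALUE; // no expiry while being processed
    private final long ttlMs;
    private final Map<String, V> values = new HashMap<>();
    private final Map<String, Long> expiry = new HashMap<>();

    PurgatoryCache(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Registers a URL as being processed. */
    void put(String url, V value) {
        values.put(url, value);
        expiry.put(url, IN_PROCESS);
    }

    /** Called on ack or fail: the URL stays cached for ttlMs more milliseconds. */
    void release(String url, long nowMs) {
        if (expiry.containsKey(url)) {
            expiry.put(url, nowMs + ttlMs);
        }
    }

    /** True while the URL is in process or still within its holding period. */
    boolean contains(String url, long nowMs) {
        Long expiresAt = expiry.get(url);
        if (expiresAt == null) {
            return false;
        }
        if (expiresAt <= nowMs) {
            // Holding period is over: evict lazily so the URL can be fetched again.
            expiry.remove(url);
            values.remove(url);
            return false;
        }
        return true;
    }
}
```

A spout following this pattern would skip any URL returned by the backend for which contains(url, now) is true.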
    • Constructor Detail

      • AbstractQueryingSpout

        public AbstractQueryingSpout()
    • Method Detail

      • open

        public void open(Map<String,Object> stormConf,
                         org.apache.storm.task.TopologyContext context,
                         org.apache.storm.spout.SpoutOutputCollector collector)
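The parameters named in Field Detail are presumably read from stormConf when the spout is opened. The sketch below shows one way to read such settings with the documented defaults. The class SpoutSettings and the key strings are hypothetical placeholders; the real key values are those listed under Constant Field Values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical configuration reader; key names are placeholders, not the real constants. */
final class SpoutSettings {
    static final String MIN_DELAY_KEY = "example.min.delay.queries"; // assumed key
    static final String MAX_DELAY_KEY = "example.max.delay.queries"; // assumed key

    /** Reads a numeric setting, accepting Number or String values, else the default. */
    static long getLong(Map<String, Object> conf, String key, long defaultValue) {
        Object value = conf.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return Long.parseLong(value.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(MIN_DELAY_KEY, 500);
        long minDelay = getLong(conf, MIN_DELAY_KEY, 2000L);  // configured value
        long maxDelay = getLong(conf, MAX_DELAY_KEY, 20000L); // falls back to the default
        System.out.println(minDelay + " " + maxDelay);
    }
}
```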
      • populateBuffer

        protected abstract void populateBuffer()
        Method where specific implementations query the storage. Implementations should call markQueryReceivedNow when the documents have been received.
      • nextTuple

        public void nextTuple()
      • getTimeLastQuerySent

        protected long getTimeLastQuerySent()
      • markQueryReceivedNow

        protected void markQueryReceivedNow()
        Sets the marker indicating that a query is in progress to false, and sets timeLastQueryReceived to the current time.
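The contract between populateBuffer and markQueryReceivedNow can be modelled with an AtomicBoolean guard, as in the sketch below. QueryState, markQuerySent and markQueryReceived are hypothetical names used for illustration; only the isInQuery flag and the idea of recording the receive time come from this page.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the in-query marker used by asynchronous implementations. */
final class QueryState {
    final AtomicBoolean isInQuery = new AtomicBoolean(false);
    private volatile long timeLastQuerySent;
    private volatile long timeLastQueryReceived;

    /** Tries to start a query; returns false if one is already in flight. */
    boolean markQuerySent(long nowMs) {
        if (!isInQuery.compareAndSet(false, true)) {
            return false;
        }
        timeLastQuerySent = nowMs;
        return true;
    }

    /** Called from the response callback once the documents have been received. */
    void markQueryReceived(long nowMs) {
        timeLastQueryReceived = nowMs;
        isInQuery.set(false);
    }

    long getTimeLastQuerySent() {
        return timeLastQuerySent;
    }

    long getTimeLastQueryReceived() {
        return timeLastQueryReceived;
    }
}
```

Using compareAndSet means a second query cannot start while the first is pending, even if the callback runs on a different thread from the one calling nextTuple.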
      • activate

        public void activate()
        Specified by:
        activate in interface org.apache.storm.spout.ISpout
        Overrides:
        activate in class org.apache.storm.topology.base.BaseRichSpout
      • deactivate

        public void deactivate()
        Specified by:
        deactivate in interface org.apache.storm.spout.ISpout
        Overrides:
        deactivate in class org.apache.storm.topology.base.BaseRichSpout
      • ack

        public void ack(Object msgId)
        Specified by:
        ack in interface org.apache.storm.spout.ISpout
        Overrides:
        ack in class org.apache.storm.topology.base.BaseRichSpout
      • fail

        public void fail(Object msgId)
        Specified by:
        fail in interface org.apache.storm.spout.ISpout
        Overrides:
        fail in class org.apache.storm.topology.base.BaseRichSpout
      • declareOutputFields

        public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)