Class AbstractStatusUpdaterBolt
- java.lang.Object
  - org.apache.storm.topology.base.BaseComponent
    - org.apache.storm.topology.base.BaseRichBolt
      - com.digitalpebble.stormcrawler.persistence.AbstractStatusUpdaterBolt
- All Implemented Interfaces: Serializable, org.apache.storm.task.IBolt, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichBolt
- Direct Known Subclasses: MemoryStatusUpdater, StdOutStatusUpdater
public abstract class AbstractStatusUpdaterBolt extends org.apache.storm.topology.base.BaseRichBolt
Abstract bolt used to store the status of URLs. Uses the DefaultScheduler and MetadataTransfer.
- See Also: Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
- protected org.apache.storm.task.OutputCollector _collector
- static String AS_IS_NEXTFETCHDATE_METADATA: Key used to pass a preset Date to use as nextFetchDate.
- static String cacheConfigParamName: Parameter name to configure the cache (see http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilderSpec.html). Default value is "maximumSize=10000,expireAfterAccess=1h".
- static String maxFetchErrorsParamName: Number of successive FETCH_ERROR before status changes to ERROR.
- static String roundDateParamName: Used for rounding nextFetchDates.
- static String useCacheParamName: Parameter name to indicate whether the internal cache should be used for discovered URLs.
-
Constructor Summary
Constructors:
- AbstractStatusUpdaterBolt()
-
Method Summary
Methods (modifier and type, method, description):
- protected void ack(org.apache.storm.tuple.Tuple t, String url): Must be called by extending classes to store and collect in one go.
- void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
- void execute(org.apache.storm.tuple.Tuple tuple)
- protected String getDocumentID(Metadata metadata, String url): Get the document id.
- void prepare(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)
- protected abstract void store(String url, Status status, Metadata metadata, Optional<Date> nextFetch, org.apache.storm.tuple.Tuple t)
-
-
-
Field Detail
-
useCacheParamName
public static String useCacheParamName
Parameter name to indicate whether the internal cache should be used for discovered URLs. The value of the parameter is a boolean - true by default.
-
maxFetchErrorsParamName
public static String maxFetchErrorsParamName
Number of successive FETCH_ERROR statuses before the status changes to ERROR.
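The threshold described above can be pictured with a small stand-alone sketch. The `Status` enum, the `resolve` method and the inclusive `>=` comparison are all assumptions made for illustration; this page does not show how the bolt actually counts errors or compares against the limit.

```java
class FetchErrorSketch {

    // Hypothetical two-value stand-in for the crawler's status type.
    enum Status { FETCH_ERROR, ERROR }

    /**
     * Returns ERROR once the number of successive fetch errors reaches the
     * configured maximum, otherwise keeps FETCH_ERROR.
     * Assumption: the limit is inclusive (count >= max).
     */
    static Status resolve(int successiveFetchErrors, int maxFetchErrors) {
        return successiveFetchErrors >= maxFetchErrors
                ? Status.ERROR
                : Status.FETCH_ERROR;
    }

    public static void main(String[] args) {
        int max = 3; // illustrative limit, not a documented default
        for (int errors = 1; errors <= 4; errors++) {
            System.out.println(errors + " -> " + resolve(errors, max));
        }
    }
}
```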
-
cacheConfigParamName
public static String cacheConfigParamName
Parameter name to configure the cache. See http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilderSpec.html for the spec syntax. Default value is "maximumSize=10000,expireAfterAccess=1h".
-
roundDateParamName
public static String roundDateParamName
Used for rounding nextFetchDates. Values are hour, minute or second; the last is the default.
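To make the effect of the three units concrete, here is a minimal sketch using only java.time. It truncates (rounds down) to the chosen unit, which is an assumption: this page does not say whether the bolt rounds to the nearest unit or truncates. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class RoundDateSketch {

    /**
     * Drops everything below the given unit from a next fetch date.
     * Accepts the three values listed above: hour, minute, second.
     * Assumption: truncation stands in for whatever rounding the bolt applies.
     */
    static Instant roundDown(Instant nextFetch, String unit) {
        switch (unit) {
            case "hour":
                return nextFetch.truncatedTo(ChronoUnit.HOURS);
            case "minute":
                return nextFetch.truncatedTo(ChronoUnit.MINUTES);
            case "second":
                return nextFetch.truncatedTo(ChronoUnit.SECONDS);
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        Instant nextFetch = Instant.parse("2030-01-15T08:47:31.250Z");
        System.out.println(roundDown(nextFetch, "hour"));
        System.out.println(roundDown(nextFetch, "minute"));
        System.out.println(roundDown(nextFetch, "second"));
    }
}
```

Coarser units make many URLs share the same nextFetchDate value, which is presumably the point of the setting.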
-
AS_IS_NEXTFETCHDATE_METADATA
public static final String AS_IS_NEXTFETCHDATE_METADATA
Key used to pass a preset Date to use as nextFetchDate. The value must represent a valid instant in UTC and be parsable using DateTimeFormatter.ISO_INSTANT. This also indicates that the storage can be done directly on the metadata as-is.
- See Also: Constant Field Values
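A value suitable for this key can be checked with DateTimeFormatter.ISO_INSTANT before it is placed in the metadata. The sketch below shows only the parsing step; the class and method names are hypothetical, and how the bolt itself reads the value is not shown on this page.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class AsIsNextFetchDateSketch {

    /**
     * Parses a metadata value as a UTC instant.
     * Returns an empty Optional when the text is null or not valid ISO_INSTANT.
     */
    static Optional<Instant> parseNextFetchDate(String value) {
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(DateTimeFormatter.ISO_INSTANT.parse(value, Instant::from));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Valid: ISO-8601 instant with the trailing 'Z' for UTC.
        System.out.println(parseNextFetchDate("2030-01-15T08:30:00Z"));
        // Invalid: no 'T' separator and no zone designator.
        System.out.println(parseNextFetchDate("2030-01-15 08:30"));
    }
}
```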
-
_collector
protected org.apache.storm.task.OutputCollector _collector
-
-
Method Detail
-
prepare
public void prepare(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)
-
execute
public void execute(org.apache.storm.tuple.Tuple tuple)
-
getDocumentID
protected String getDocumentID(Metadata metadata, String url)
Get the document id.
- Parameters:
  metadata - The Metadata.
  url - The normalised url.
- Returns: The SHA-256 digest of the normalised url, as a String.
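For reference, a SHA-256 digest of a URL can be computed with the JDK alone, as sketched below. The lowercase hex encoding and the UTF-8 charset are assumptions; this page only states that the digest is returned as a String. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DocumentIdSketch {

    /**
     * Returns the SHA-256 digest of the given url as lowercase hex.
     * Assumptions: UTF-8 bytes in, hex text out.
     */
    static String sha256Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("http://example.com/"));
    }
}
```

A fixed-length digest gives every URL an id of the same size regardless of how long the URL is.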
-
ack
protected final void ack(org.apache.storm.tuple.Tuple t, String url)
Must be called by extending classes to store and collect in one go.
-
store
protected abstract void store(String url, Status status, Metadata metadata, Optional<Date> nextFetch, org.apache.storm.tuple.Tuple t) throws Exception
- Throws: Exception
-
declareOutputFields
public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
-
-