Class AbstractSpout
java.lang.Object
  org.apache.storm.topology.base.BaseComponent
    org.apache.storm.topology.base.BaseRichSpout
      com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
        com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- All Implemented Interfaces:
Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout
- Direct Known Subclasses:
AggregationSpout, CollapsingSpout, ScrollSpout
public abstract class AbstractSpout extends AbstractQueryingSpout
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
AbstractQueryingSpout.InProcessMap<K extends Object,V extends Object>
-
-
Field Summary
Fields
Modifier and Type / Field / Description
protected List<String> bucketSortField
protected static org.elasticsearch.client.RestHighLevelClient client
protected static String ESBoltType
protected static String ESStatusBucketFieldParamName
    Field name to use for aggregating.
protected static String ESStatusBucketSortFieldParamName
    Field name to use for sorting the URLs within a bucket, not used if empty or null.
protected static String ESStatusFilterParamName
protected static String ESStatusGlobalSortFieldParamName
    Field name to use for sorting the buckets, not used if empty or null.
protected static String ESStatusIndexNameParamName
protected static String ESStatusMaxBucketParamName
protected static String ESStatusMaxURLsParamName
protected static String ESStatusQueryTimeoutParamName
protected List<String> filterQueries
    Query to use as a positive filter, set by es.status.filterQuery.
protected String indexName
protected String logIdprefix
    Used to distinguish between instances in the logs.
protected int maxBucketNum
protected int maxURLsPerBucket
protected String partitionField
    Field name used for field collapsing, e.g. key.
protected Date queryDate
protected int queryTimeout
protected int shardID
    When using multiple instances, each one is in charge of a specific shard. Useful when sharding based on host or domain to guarantee a good mix of URLs.
protected String totalSortField
-
Fields inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
_collector, beingProcessed, buffer, eventCounter, isInQuery, lastTimeResetToNOW, maxDelayBetweenQueries, minDelayBetweenQueries, queryTimes, resetFetchDateAfterNSecs, resetFetchDateParamName, StatusMaxDelayParamName, StatusMinDelayParamName, StatusTTLPurgatory
-
-
Constructor Summary
Constructors
Constructor / Description
AbstractSpout()
-
Method Summary
Modifier and Type / Method / Description
void ack(Object msgId)
protected boolean addHitToBuffer(org.elasticsearch.search.SearchHit hit)
void close()
void fail(Object msgId)
protected Metadata fromKeyValues(Map<String,Object> keyValues)
void open(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
protected abstract void populateBuffer()
    Builds a query and uses it to retrieve the results from ES.
-
Methods inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
activate, deactivate, declareOutputFields, getTimeLastQuerySent, markQueryReceivedNow, nextTuple
-
-
-
-
Field Detail
-
ESBoltType
protected static final String ESBoltType
- See Also:
- Constant Field Values
-
ESStatusIndexNameParamName
protected static final String ESStatusIndexNameParamName
- See Also:
- Constant Field Values
-
ESStatusBucketFieldParamName
protected static final String ESStatusBucketFieldParamName
Field name to use for aggregating.
- See Also:
- Constant Field Values
-
ESStatusMaxBucketParamName
protected static final String ESStatusMaxBucketParamName
- See Also:
- Constant Field Values
-
ESStatusMaxURLsParamName
protected static final String ESStatusMaxURLsParamName
- See Also:
- Constant Field Values
-
ESStatusBucketSortFieldParamName
protected static final String ESStatusBucketSortFieldParamName
Field name to use for sorting the URLs within a bucket, not used if empty or null.
- See Also:
- Constant Field Values
-
ESStatusGlobalSortFieldParamName
protected static final String ESStatusGlobalSortFieldParamName
Field name to use for sorting the buckets, not used if empty or null.
- See Also:
- Constant Field Values
-
ESStatusFilterParamName
protected static final String ESStatusFilterParamName
- See Also:
- Constant Field Values
-
ESStatusQueryTimeoutParamName
protected static final String ESStatusQueryTimeoutParamName
- See Also:
- Constant Field Values
-
filterQueries
protected List<String> filterQueries
Query to use as a positive filter, set by es.status.filterQuery
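As an illustration of how a value such as es.status.filterQuery could end up in a List<String>, here is a minimal sketch. It assumes the setting may hold either a single string or a list of strings. The helper loadList and the class name are hypothetical and not part of StormCrawler, and the query string is a made-up value.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterQueryConfSketch {

    /**
     * Hypothetical helper (not part of StormCrawler): returns the values
     * stored under key as a list, accepting a single value or a collection.
     */
    public static List<String> loadList(Map<String, Object> conf, String key) {
        List<String> values = new ArrayList<>();
        Object raw = conf.get(key);
        if (raw == null) {
            return values; // key absent: no filter queries
        }
        if (raw instanceof Collection) {
            for (Object o : (Collection<?>) raw) {
                values.add(String.valueOf(o));
            }
        } else {
            values.add(String.valueOf(raw));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> stormConf = new HashMap<>();
        // Made-up example value; only the key name comes from this page.
        stormConf.put("es.status.filterQuery", "metadata.lang:en");
        List<String> filterQueries = loadList(stormConf, "es.status.filterQuery");
        System.out.println(filterQueries);
    }
}
```

Returning an empty list when the key is absent lets the calling code iterate over the filters without a null check.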
-
indexName
protected String indexName
-
client
protected static org.elasticsearch.client.RestHighLevelClient client
-
shardID
protected int shardID
When using multiple instances, each one is in charge of a specific shard. Useful when sharding based on host or domain to guarantee a good mix of URLs.
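The one-instance-per-shard idea can be sketched as follows. This page does not say how shardID is assigned, so the sketch rests on assumptions: the task index is used as the shard number, -1 means "no specific shard", and the query is restricted with Elasticsearch's `_shards:` search preference. The class and method names are hypothetical.

```java
public class ShardAssignmentSketch {

    /** Assumed sentinel meaning "do not target a specific shard". */
    public static final int NO_SHARD = -1;

    /** With several spout tasks, task i takes shard i; a single task queries all shards. */
    public static int shardFor(int taskIndex, int totalTasks) {
        return totalTasks > 1 ? taskIndex : NO_SHARD;
    }

    /** Builds the search preference string, or null when no shard is targeted. */
    public static String searchPreference(int shardID) {
        return shardID == NO_SHARD ? null : "_shards:" + shardID;
    }

    public static void main(String[] args) {
        int totalTasks = 3;
        for (int task = 0; task < totalTasks; task++) {
            int shardID = shardFor(task, totalTasks);
            System.out.println("task " + task + " -> " + searchPreference(shardID));
        }
    }
}
```

For every shard to be read under this scheme, the number of spout instances would need to match the number of shards in the status index.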
-
logIdprefix
protected String logIdprefix
Used to distinguish between instances in the logs.
-
partitionField
protected String partitionField
Field name used for field collapsing, e.g. key.
-
maxURLsPerBucket
protected int maxURLsPerBucket
-
maxBucketNum
protected int maxBucketNum
-
totalSortField
protected String totalSortField
-
queryDate
protected Date queryDate
-
queryTimeout
protected int queryTimeout
-
-
Method Detail
-
open
public void open(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
- Specified by:
open in interface org.apache.storm.spout.ISpout
- Overrides:
open in class AbstractQueryingSpout
-
populateBuffer
protected abstract void populateBuffer()
Builds a query and uses it to retrieve the results from ES.
- Specified by:
populateBuffer in class AbstractQueryingSpout
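The contract of an abstract populateBuffer() can be illustrated with a self-contained sketch. It uses stand-in classes rather than the real Storm, Elasticsearch or StormCrawler types, and it assumes the base class refills its buffer only when the buffer is empty. All names below are hypothetical, and the "query" is a fixed list instead of an Elasticsearch request.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class PopulateBufferSketch {

    /** Stand-in for a querying spout: holds a URL buffer and refills it on demand. */
    public abstract static class QueryingSpoutStandIn {
        protected final Queue<String> buffer = new ArrayDeque<>();

        /** Subclasses run their query here and add the results to the buffer. */
        protected abstract void populateBuffer();

        /** Returns the next buffered URL, querying first if the buffer is empty. */
        public String next() {
            if (buffer.isEmpty()) {
                populateBuffer();
            }
            return buffer.poll(); // null if the query returned nothing
        }
    }

    /** Stand-in subclass whose "query" returns a fixed list of URLs. */
    public static class FixedResultsSpout extends QueryingSpoutStandIn {
        private final List<String> results;
        public int queries = 0; // number of times the backend was "queried"

        public FixedResultsSpout(List<String> results) {
            this.results = results;
        }

        @Override
        protected void populateBuffer() {
            queries++;
            buffer.addAll(results);
        }
    }

    public static void main(String[] args) {
        FixedResultsSpout spout =
                new FixedResultsSpout(Arrays.asList("http://a.example/", "http://b.example/"));
        System.out.println(spout.next());
        System.out.println(spout.next());
        System.out.println("queries sent: " + spout.queries);
    }
}
```

Keeping the query logic behind one abstract method is what would let subclasses such as AggregationSpout, CollapsingSpout and ScrollSpout differ only in how they query.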
-
addHitToBuffer
protected final boolean addHitToBuffer(org.elasticsearch.search.SearchHit hit)
-
ack
public void ack(Object msgId)
- Specified by:
ack in interface org.apache.storm.spout.ISpout
- Overrides:
ack in class AbstractQueryingSpout
-
fail
public void fail(Object msgId)
- Specified by:
fail in interface org.apache.storm.spout.ISpout
- Overrides:
fail in class AbstractQueryingSpout
-
close
public void close()
- Specified by:
close in interface org.apache.storm.spout.ISpout
- Overrides:
close in class org.apache.storm.topology.base.BaseRichSpout
-
-