A B C D E F G H I J L M N O P Q R S T W 
All Classes All Packages

A

AbstractSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
 
AbstractSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ack(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
addHitInfoToMetadata(Metadata, SearchHit) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
addHitToBuffer(SearchHit) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
addToProcessor(IndexRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
 
afterBulk(long, BulkRequest, Throwable) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
afterBulk(long, BulkRequest, Throwable) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
afterBulk(long, BulkRequest, BulkResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
afterBulk(long, BulkRequest, BulkResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
AggregationSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
Spout which pulls URLs from an ES index.
AggregationSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 

B

beforeBulk(long, BulkRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
beforeBulk(long, BulkRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
bucketSortField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
BulkItemResponseToFailedFlag - Class in com.digitalpebble.stormcrawler.elasticsearch
 
BulkItemResponseToFailedFlag(BulkItemResponse, boolean) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 

C

cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
 
cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
 
cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 
cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
client - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
close() - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
 
close() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
CollapsingSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
Spout which pulls URLs from an ES index.
CollapsingSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
 
com.digitalpebble.stormcrawler.elasticsearch - package com.digitalpebble.stormcrawler.elasticsearch
 
com.digitalpebble.stormcrawler.elasticsearch.bolt - package com.digitalpebble.stormcrawler.elasticsearch.bolt
 
com.digitalpebble.stormcrawler.elasticsearch.filtering - package com.digitalpebble.stormcrawler.elasticsearch.filtering
 
com.digitalpebble.stormcrawler.elasticsearch.metrics - package com.digitalpebble.stormcrawler.elasticsearch.metrics
 
com.digitalpebble.stormcrawler.elasticsearch.parse.filter - package com.digitalpebble.stormcrawler.elasticsearch.parse.filter
 
com.digitalpebble.stormcrawler.elasticsearch.persistence - package com.digitalpebble.stormcrawler.elasticsearch.persistence
 
configure(Map<String, Object>, JsonNode) - Method in class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
 
configure(Map<String, Object>, JsonNode) - Method in class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
 
currentBuckets - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 

D

declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
 
declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 
declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
DeletionBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.bolt
Deletes documents from Elasticsearch.
DeletionBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
 
DeletionBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
Sets the index name instead of taking it from the configuration.

E

ElasticSearchConnection - Class in com.digitalpebble.stormcrawler.elasticsearch
Utility class to instantiate an ES client and bulkprocessor based on the configuration.
emptyQueue(String) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 
equals(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
ESBoltType - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ESStatusBucketFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Field name to use for aggregating.
ESStatusBucketSortFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Field name to use for sorting the URLs within a bucket, not used if empty or null.
ESStatusFilterParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ESStatusGlobalSortFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Field name to use for sorting the buckets, not used if empty or null.
ESStatusIndexNameParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ESStatusMaxBucketParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ESStatusMaxURLsParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
ESStatusQueryTimeoutParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
 
execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 

F

fail(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
fail(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
failed - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
filter(URL, Metadata, String) - Method in class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
 
filter(String, byte[], DocumentFragment, ParseResult) - Method in class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
 
filterQueries - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Query to use as a positive filter, set by es.status.filterQuery
fromKeyValues(Map<String, Object>) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 

G

getClient() - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
 
getClient(Map<String, Object>, String) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
 
getComponentConfiguration() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 
getConnection(Map<String, Object>, String) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
Creates a connection with a default listener.
getConnection(Map<String, Object>, String, BulkProcessor.Listener) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
 
getFailure() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getFailureMessage() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getIndex() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
getItemId() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getOpType() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getResponse() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getType() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
getVersion() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 

H

handleDataPoints(IMetricsConsumer.TaskInfo, Collection<IMetricsConsumer.DataPoint>) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
 
hashCode() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
HybridSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
Uses collapsing spouts to get an initial set of URLs and keys to query for, and gets emptyQueue notifications from the URLBuffer to query ES for a specific key.
HybridSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 

I

id - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
IndexerBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.bolt
Sends documents to ElasticSearch.
IndexerBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
IndexerBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
Sets the index name instead of taking it from the configuration.
indexName - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
isFailed() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
isFragment() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 

J

JSONResourceWrapper - Class in com.digitalpebble.stormcrawler.elasticsearch.parse.filter
Wraps a ParseFilter whose resources are in a JSON file that can be stored in ES.
JSONResourceWrapper() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
 
JSONURLFilterWrapper - Class in com.digitalpebble.stormcrawler.elasticsearch.filtering
Wraps a URLFilter whose resources are in a JSON file that can be stored in ES.
JSONURLFilterWrapper() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
 

L

logIdprefix - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Used to distinguish between instances in the logs.

M

maxBucketNum - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
maxURLsPerBucket - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
MetricsConsumer - Class in com.digitalpebble.stormcrawler.elasticsearch.metrics
Sends metrics to an Elasticsearch index.
MetricsConsumer() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
 

N

nextTuple() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 

O

onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 
onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
 
onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
onRemoval(String, List<Tuple>, RemovalCause) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
onRemoval(String, List<Tuple>, RemovalCause) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 
onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
 
onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 
onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 
open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
 
open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 

P

partitionField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Field name used for field collapsing e.g.
populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
Builds a query and uses it to retrieve the results from ES.
populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 
populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
 
populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
 
prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
 
prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 
prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
prepare(Map, Object, TopologyContext, IErrorReporter) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
 

Q

queryDate - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
queryTimeout - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 

R

RELOADPARAMNAME - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 
response - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 

S

ScrollSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
Reads all the documents from a shard and emits them on the status stream.
ScrollSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
 
shardID - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
When using multiple instances, each one is in charge of a specific shard; useful when sharding based on host or domain to guarantee a good mix of URLs.
sortValuesForKey(String, Object[]) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
 
sortValuesForKey(String, Object[]) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
 
status() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
StatusMetricsBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.metrics
Queries the status index periodically to get the count of URLs per status.
StatusMetricsBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
 
StatusUpdaterBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
Simple bolt which stores the status of URLs into ElasticSearch.
StatusUpdaterBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 
StatusUpdaterBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
Loads the configuration using a substring different from the default value 'status' in order to distinguish it from the spout configurations.
store(String, Status, Metadata, Optional<Date>, Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
 

T

toString() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
totalSortField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
 
toXContent(XContentBuilder, ToXContent.Params) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 

W

writeThin(StreamOutput) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 
writeTo(StreamOutput) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
 