A B C D E F G H I J L M N O P Q R S T W
All Classes All Packages
A
- AbstractSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
- AbstractSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ack(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- addHitInfoToMetadata(Metadata, SearchHit) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- addHitToBuffer(SearchHit) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- addToProcessor(IndexRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
- afterBulk(long, BulkRequest, Throwable) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- afterBulk(long, BulkRequest, Throwable) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- afterBulk(long, BulkRequest, BulkResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- afterBulk(long, BulkRequest, BulkResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- AggregationSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
-
Spout which pulls URLs from an ES index.
- AggregationSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
B
- beforeBulk(long, BulkRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- beforeBulk(long, BulkRequest) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- bucketSortField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- BulkItemResponseToFailedFlag - Class in com.digitalpebble.stormcrawler.elasticsearch
- BulkItemResponseToFailedFlag(BulkItemResponse, boolean) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
C
- cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
- cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
- cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
- cleanup() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- client - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- close() - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
- close() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- CollapsingSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
-
Spout which pulls URLs from an ES index.
- CollapsingSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
- com.digitalpebble.stormcrawler.elasticsearch - package com.digitalpebble.stormcrawler.elasticsearch
- com.digitalpebble.stormcrawler.elasticsearch.bolt - package com.digitalpebble.stormcrawler.elasticsearch.bolt
- com.digitalpebble.stormcrawler.elasticsearch.filtering - package com.digitalpebble.stormcrawler.elasticsearch.filtering
- com.digitalpebble.stormcrawler.elasticsearch.metrics - package com.digitalpebble.stormcrawler.elasticsearch.metrics
- com.digitalpebble.stormcrawler.elasticsearch.parse.filter - package com.digitalpebble.stormcrawler.elasticsearch.parse.filter
- com.digitalpebble.stormcrawler.elasticsearch.persistence - package com.digitalpebble.stormcrawler.elasticsearch.persistence
- configure(Map<String, Object>, JsonNode) - Method in class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
- configure(Map<String, Object>, JsonNode) - Method in class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
- currentBuckets - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
D
- declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
- declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
- declareOutputFields(OutputFieldsDeclarer) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- DeletionBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.bolt
-
Deletes documents from ElasticSearch.
- DeletionBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
- DeletionBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
-
Sets the index name instead of taking it from the configuration.
E
- ElasticSearchConnection - Class in com.digitalpebble.stormcrawler.elasticsearch
-
Utility class to instantiate an ES client and bulk processor based on the configuration.
- emptyQueue(String) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
- equals(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- ESBoltType - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ESStatusBucketFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Field name to use for aggregating.
- ESStatusBucketSortFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Field name to use for sorting the URLs within a bucket, not used if empty or null.
- ESStatusFilterParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ESStatusGlobalSortFieldParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Field name to use for sorting the buckets, not used if empty or null.
- ESStatusIndexNameParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ESStatusMaxBucketParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ESStatusMaxURLsParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- ESStatusQueryTimeoutParamName - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
- execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- execute(Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
F
- fail(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- fail(Object) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- failed - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- filter(URL, Metadata, String) - Method in class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
- filter(String, byte[], DocumentFragment, ParseResult) - Method in class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
- filterQueries - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Query to use as a positive filter, set by es.status.filterQuery.
- fromKeyValues(Map<String, Object>) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
G
- getClient() - Method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
- getClient(Map<String, Object>, String) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
- getComponentConfiguration() - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
- getConnection(Map<String, Object>, String) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
-
Creates a connection with a default listener.
- getConnection(Map<String, Object>, String, BulkProcessor.Listener) - Static method in class com.digitalpebble.stormcrawler.elasticsearch.ElasticSearchConnection
- getFailure() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getFailureMessage() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getIndex() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
-
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
- getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
-
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
- getIndexName(Metadata) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
-
Must be overridden to implement custom index names based on metadata information. By default, the indexName coming from the configuration is used.
- getItemId() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getOpType() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getResponse() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getType() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- getVersion() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
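The getIndexName(Metadata) entries above describe an override hook: by default the index name comes from the configuration, and a subclass may derive it from metadata instead. The sketch below illustrates that pattern only. It does not use the StormCrawler classes: SimpleMetadata, BaseIndexer, PerHostIndexer, and the "hostname" key are hypothetical stand-ins introduced for this example, so it should be read as an illustration of the idea rather than the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for StormCrawler's Metadata; not the real class.
class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    void setValue(String key, String value) {
        values.put(key, value);
    }

    // Returns the value stored for the key, or null if the key is absent.
    String getFirstValue(String key) {
        return values.get(key);
    }
}

// Hypothetical stand-in for a bolt that takes its index name from the configuration.
class BaseIndexer {
    protected final String indexName;

    BaseIndexer(String indexNameFromConfig) {
        this.indexName = indexNameFromConfig;
    }

    // Default behaviour: ignore the metadata and use the configured name.
    protected String getIndexName(SimpleMetadata metadata) {
        return indexName;
    }
}

// Overrides the hook to route documents to a per-host index.
class PerHostIndexer extends BaseIndexer {
    PerHostIndexer(String indexNameFromConfig) {
        super(indexNameFromConfig);
    }

    @Override
    protected String getIndexName(SimpleMetadata metadata) {
        String host = metadata.getFirstValue("hostname"); // assumed metadata key
        // Fall back to the configured name when the key is missing.
        return host == null ? indexName : indexName + "-" + host;
    }
}
```

With this sketch, a PerHostIndexer configured with "content" would resolve to the configured name for documents without a "hostname" value and to a host-suffixed name otherwise.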
H
- handleDataPoints(IMetricsConsumer.TaskInfo, Collection<IMetricsConsumer.DataPoint>) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
- hashCode() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- HybridSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
-
Uses collapsing spouts to get an initial set of URLs and keys to query for and gets emptyQueue notifications from the URLBuffer to query ES for a specific key.
- HybridSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
I
- id - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- IndexerBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.bolt
-
Sends documents to ElasticSearch.
- IndexerBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- IndexerBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
-
Sets the index name instead of taking it from the configuration.
- indexName - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- isFailed() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- isFragment() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
J
- JSONResourceWrapper - Class in com.digitalpebble.stormcrawler.elasticsearch.parse.filter
-
Wraps a ParseFilter whose resources are in a JSON file that can be stored in ES.
- JSONResourceWrapper() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.parse.filter.JSONResourceWrapper
- JSONURLFilterWrapper - Class in com.digitalpebble.stormcrawler.elasticsearch.filtering
-
Wraps a URLFilter whose resources are in a JSON file that can be stored in ES.
- JSONURLFilterWrapper() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper
L
- logIdprefix - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Used to distinguish between instances in the logs.
M
- maxBucketNum - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- maxURLsPerBucket - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- MetricsConsumer - Class in com.digitalpebble.stormcrawler.elasticsearch.metrics
-
Sends metrics to an Elasticsearch index.
- MetricsConsumer() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
N
- nextTuple() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
O
- onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
- onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
- onFailure(Exception) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- onRemoval(String, List<Tuple>, RemovalCause) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- onRemoval(String, List<Tuple>, RemovalCause) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
- onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
- onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
- onResponse(SearchResponse) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
- open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
- open(Map<String, Object>, TopologyContext, SpoutOutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
P
- partitionField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Field name used for field collapsing e.g.
- populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
Builds a query and uses it to retrieve the results from ES.
- populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
- populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.CollapsingSpout
- populateBuffer() - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt
- prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt
- prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
- prepare(Map<String, Object>, TopologyContext, OutputCollector) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- prepare(Map, Object, TopologyContext, IErrorReporter) - Method in class com.digitalpebble.stormcrawler.elasticsearch.metrics.MetricsConsumer
Q
- queryDate - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- queryTimeout - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
R
- RELOADPARAMNAME - Static variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
- response - Variable in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
S
- ScrollSpout - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
-
Reads all the documents from a shard and emits them on the status stream.
- ScrollSpout() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.ScrollSpout
- shardID - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
-
When using multiple instances, each one is in charge of a specific shard. Useful when sharding based on host or domain to guarantee a good mix of URLs.
- sortValuesForKey(String, Object[]) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout
- sortValuesForKey(String, Object[]) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.HybridSpout
- status() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- StatusMetricsBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.metrics
-
Queries the status index periodically to get the count of URLs per status.
- StatusMetricsBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.metrics.StatusMetricsBolt
- StatusUpdaterBolt - Class in com.digitalpebble.stormcrawler.elasticsearch.persistence
-
Simple bolt which stores the status of URLs into ElasticSearch.
- StatusUpdaterBolt() - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
- StatusUpdaterBolt(String) - Constructor for class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
-
Loads the configuration using a substring different from the default value 'status' in order to distinguish it from the spout configurations.
- store(String, Status, Metadata, Optional<Date>, Tuple) - Method in class com.digitalpebble.stormcrawler.elasticsearch.persistence.StatusUpdaterBolt
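The shardID entry above says that, with multiple spout instances, each instance is in charge of one shard. A minimal sketch of how such an assignment could work is shown below. It assumes one instance per shard, numbered from 0, and uses -1 to mean "no restriction"; the ShardAssignment class and its method names are hypothetical and are not part of StormCrawler. The "_shards:N" string is the Elasticsearch search preference syntax for limiting a query to a given shard.

```java
// Hypothetical helper; the class and method names are not part of StormCrawler.
class ShardAssignment {

    // Returns the shard a spout instance should read from,
    // or -1 when a single instance should query all shards.
    // Assumes one spout instance per shard, indexed from 0.
    static int shardFor(int taskIndex, int totalTasks) {
        if (totalTasks <= 1) {
            return -1;
        }
        return taskIndex;
    }

    // Builds a search preference string restricting a query to one shard,
    // or returns null when no restriction applies.
    static String preferenceFor(int shardId) {
        return shardId < 0 ? null : "_shards:" + shardId;
    }
}
```

Under these assumptions, the third of four instances (task index 2) would query only shard 2, while a lone instance would query the whole index.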
T
- toString() - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- totalSortField - Variable in class com.digitalpebble.stormcrawler.elasticsearch.persistence.AbstractSpout
- toXContent(XContentBuilder, ToXContent.Params) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
W
- writeThin(StreamOutput) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag
- writeTo(StreamOutput) - Method in class com.digitalpebble.stormcrawler.elasticsearch.BulkItemResponseToFailedFlag