| Class | Description |
|---|---|
| FeedParserBolt | Extracts URLs from feeds. |
| FetcherBolt | A multithreaded, queue-based fetcher adapted from Apache Nutch. |
| JSoupParserBolt | Parser for HTML documents only, which uses ICU4J to detect the charset encoding. |
| SimpleFetcherBolt | A simple fetcher with no internal queues. |
| SiteMapParserBolt | Extracts URLs from a sitemap file. |
| StatusEmitterBolt | Provides common functionality for bolts that emit tuples to the status stream. |
| URLFilterBolt | |
| URLPartitionerBolt | Generates a partition key for a given URL based on the hostname, domain or IP address. |
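FeedParserBolt is listed as extracting URLs from feeds. To make that concrete, here is a minimal Python sketch, assuming the feed is either RSS 2.0 (`<item><link>`) or Atom (`<entry><link href="…">`). The function name `extract_feed_urls` is hypothetical; this is an illustration of the idea, not the bolt's actual implementation.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_feed_urls(xml_text: str) -> list:
    """Hypothetical helper: return entry links from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    urls = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            urls.append(link.strip())
    # Atom: <feed><entry><link href="URL"/></entry></feed>
    for entry in root.iter(f"{ATOM_NS}entry"):
        for link in entry.findall(f"{ATOM_NS}link"):
            # Per RFC 4287, a link without a rel attribute means rel="alternate".
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
    return urls
```

Only the entry links are collected; channel-level links and Atom `rel="self"` links point at the feed itself rather than at new content to crawl.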
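FetcherBolt is described as a multithreaded, queue-based fetcher adapted from Apache Nutch. The general idea behind such designs is to bucket URLs into per-host queues so that several threads can fetch in parallel while each host receives at most one request at a time, with a pause between requests. The Python sketch below illustrates only that idea; `HostQueues`, `fetcher_thread` and the one-second default delay are hypothetical and are not the actual classes or settings of StormCrawler or Nutch.

```python
import threading
import time
from collections import deque
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


class HostQueues:
    """Hypothetical per-host FIFO queues with a politeness delay between fetches."""

    def __init__(self, delay: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.delay = delay            # seconds to wait after a fetch to a host completes
        self.clock = clock            # injectable clock, handy for testing
        self.queues = {}              # host -> deque of pending URLs
        self.next_allowed = {}        # host -> earliest time of the next fetch
        self.in_flight = set()        # hosts currently being fetched
        self.lock = threading.Lock()  # shared by all fetcher threads

    def add(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        with self.lock:
            self.queues.setdefault(host, deque()).append(url)

    def next_url(self) -> Optional[Tuple[str, str]]:
        """Return (host, url) from a host that is idle and past its delay, else None."""
        now = self.clock()
        with self.lock:
            for host, pending in self.queues.items():
                if pending and host not in self.in_flight and self.next_allowed.get(host, 0.0) <= now:
                    self.in_flight.add(host)
                    return host, pending.popleft()
        return None

    def finished(self, host: str) -> None:
        """Mark a fetch as done and start that host's politeness delay."""
        with self.lock:
            self.in_flight.discard(host)
            self.next_allowed[host] = self.clock() + self.delay


def fetcher_thread(queues: HostQueues, fetch: Callable[[str], None], stop: threading.Event) -> None:
    """Worker loop: several of these threads can share one HostQueues instance."""
    while not stop.is_set():
        item = queues.next_url()
        if item is None:
            time.sleep(0.01)  # nothing eligible yet; back off briefly
            continue
        host, url = item
        try:
            fetch(url)
        finally:
            queues.finished(host)
```

Starting N threads with `threading.Thread(target=fetcher_thread, args=(queues, fetch, stop))` gives parallelism across hosts, while the `in_flight` set and `next_allowed` timestamps keep each individual host on a polite schedule.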
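SiteMapParserBolt is listed as extracting URLs from a sitemap file. A minimal Python sketch of that step, assuming an uncompressed XML sitemap in the sitemaps.org 0.9 namespace, is shown below. `parse_sitemap` is a hypothetical name; gzipped and plain-text sitemaps are not handled, and `xml.etree` is not hardened against maliciously crafted XML, so untrusted input would warrant a hardened parser.

```python
import xml.etree.ElementTree as ET

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple:
    """Hypothetical helper: return (page_urls, child_sitemap_urls).

    A <urlset> lists pages to crawl; a <sitemapindex> lists further sitemaps.
    """
    root = ET.fromstring(xml_text)
    # Only <loc> elements in the sitemap namespace; extension tags such as
    # image sitemaps use their own namespaces and are skipped.
    locs = [loc.text.strip() for loc in root.iter(f"{SM_NS}loc") if loc.text and loc.text.strip()]
    if root.tag == f"{SM_NS}sitemapindex":
        return [], locs
    return locs, []
```

Keeping page URLs and child sitemaps apart matters because the latter need to be fetched and parsed as sitemaps again rather than as ordinary pages.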
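URLPartitionerBolt generates a partition key from a URL's hostname, domain or IP address. The Python sketch below shows what each of those three options could compute. The function names and the mode strings `"host"`, `"domain"` and `"ip"` are hypothetical, and the domain mode uses a deliberately naive "last two labels" rule instead of the Public Suffix List.

```python
import socket
import zlib
from typing import Optional
from urllib.parse import urlsplit


def partition_key(url: str, mode: str = "host") -> Optional[str]:
    """Hypothetical helper: derive a partition key from a URL by host, domain or IP."""
    host = urlsplit(url).hostname  # already lower-cased by urllib
    if not host:
        return None
    if mode == "host":
        return host
    if mode == "domain":
        # Naive assumption: keep the last two labels. A real implementation should use
        # the Public Suffix List, otherwise "shop.example.co.uk" would map to "co.uk".
        return ".".join(host.split(".")[-2:])
    if mode == "ip":
        try:
            return socket.gethostbyname(host)  # blocking IPv4 DNS lookup
        except OSError:  # socket.gaierror is a subclass of OSError
            return None
    raise ValueError(f"unknown partition mode: {mode!r}")


def partition_index(key: str, num_partitions: int) -> int:
    """Map a key to a partition; zlib.crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because the key is mapped through a stable hash, every URL that shares a host (or domain, or IP) lands in the same partition. One common reason to partition this way is that per-host politeness can then be enforced locally by whichever component owns that partition.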