Class SiteMapParserBolt
- java.lang.Object
  - org.apache.storm.topology.base.BaseComponent
    - org.apache.storm.topology.base.BaseRichBolt
      - com.digitalpebble.stormcrawler.bolt.StatusEmitterBolt
        - com.digitalpebble.stormcrawler.bolt.SiteMapParserBolt
- All Implemented Interfaces:
Serializable, org.apache.storm.task.IBolt, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichBolt
public class SiteMapParserBolt extends StatusEmitterBolt
Extracts URLs from a sitemap file. Parsing is triggered by sniffing the content and can also be forced by setting 'isSitemap=true' in the metadata. Tuples that are not sitemaps are passed on to the default stream, whereas any URLs extracted from a sitemap are sent to the 'status' stream with a 'DISCOVERED' status.
- See Also:
- Serialized Form
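
The routing rule described above can be sketched in plain Java. This is an illustration only, not the bolt's actual code: the class name SitemapRoutingSketch, the 1024-byte sniff window, the root-element check and the single-valued metadata map are all assumptions made for the example. Only the 'isSitemap' key and the 'status' and default stream names come from the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical, simplified model of the routing decision; not StormCrawler code. */
public class SitemapRoutingSketch {

    /** Metadata key named in the class description; the value "true" forces parsing. */
    static final String IS_SITEMAP_KEY = "isSitemap";

    /** Assumed sniff window: only the start of the content is inspected. */
    static final int SNIFF_BYTES = 1024;

    /** Assumed heuristic: look for a sitemap root element near the start of the content. */
    static boolean looksLikeSitemap(byte[] content) {
        int len = Math.min(content.length, SNIFF_BYTES);
        String head = new String(content, 0, len, StandardCharsets.UTF_8);
        return head.contains("<urlset") || head.contains("<sitemapindex");
    }

    /**
     * Returns "status" when the document would be parsed as a sitemap
     * (its extracted URLs go to the status stream), otherwise "default".
     */
    static String route(Map<String, String> metadata, byte[] content) {
        boolean forced = "true".equalsIgnoreCase(metadata.get(IS_SITEMAP_KEY));
        return (forced || looksLikeSitemap(content)) ? "status" : "default";
    }

    public static void main(String[] args) {
        byte[] sitemap = "<?xml version=\"1.0\"?><urlset></urlset>".getBytes(StandardCharsets.UTF_8);
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);

        System.out.println(route(Map.of(), sitemap));                    // status (sniffed)
        System.out.println(route(Map.of(IS_SITEMAP_KEY, "true"), html)); // status (forced)
        System.out.println(route(Map.of(), html));                       // default
    }
}
```

The sketch keeps the two triggers separate so that the metadata flag can force parsing even when the content sniff fails, which matches the order of precedence implied by the description.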
-
-
Field Summary
Fields
Modifier and Type    Field              Description
static String        foundSitemapKey
static String        isSitemapKey
-
Fields inherited from class com.digitalpebble.stormcrawler.bolt.StatusEmitterBolt
collector
-
-
Constructor Summary
Constructors
Constructor            Description
SiteMapParserBolt()
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type    Method
void                 declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
void                 execute(org.apache.storm.tuple.Tuple tuple)
void                 parseExtensionAttributes(crawlercommons.sitemaps.SiteMapURL url, Metadata metadata)
void                 prepare(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)
-
Methods inherited from class com.digitalpebble.stormcrawler.bolt.StatusEmitterBolt
allowRedirs, emitOutlink, filterOutlink
-
-
-
-
Field Detail
-
isSitemapKey
public static final String isSitemapKey
- See Also:
- Constant Field Values
-
foundSitemapKey
public static final String foundSitemapKey
- See Also:
- Constant Field Values
-
-
Method Detail
-
execute
public void execute(org.apache.storm.tuple.Tuple tuple)
-
parseExtensionAttributes
public void parseExtensionAttributes(crawlercommons.sitemaps.SiteMapURL url, Metadata metadata)
-
prepare
public void prepare(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)
- Specified by:
prepare
in interface org.apache.storm.task.IBolt
- Overrides:
prepare
in class StatusEmitterBolt
-
declareOutputFields
public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
- Specified by:
declareOutputFields
in interface org.apache.storm.topology.IComponent
- Overrides:
declareOutputFields
in class StatusEmitterBolt
-
-