Class FileSpout
- java.lang.Object
  - org.apache.storm.topology.base.BaseComponent
    - org.apache.storm.topology.base.BaseRichSpout
      - com.digitalpebble.stormcrawler.spout.FileSpout
- All Implemented Interfaces:
Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout
public class FileSpout extends org.apache.storm.topology.base.BaseRichSpout
Reads the lines from a UTF-8 file and uses them as the input of a spout. Loads the entire content into memory. Uses StringTabScheme to parse the lines into URLs and Metadata, and generates tuples on the default stream unless withDiscoveredStatus is set to true.
- See Also:
- Serialized Form
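To make the seed line format concrete, here is a minimal sketch in plain Java of the kind of parsing the description attributes to StringTabScheme. It assumes each line holds a URL followed by optional tab-separated `key=value` metadata pairs; that layout, and the names `SeedLineSketch`, `Seed` and `parse`, are illustrative assumptions and not part of the StormCrawler API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeedLineSketch {

    /** Hypothetical holder for one parsed seed line; not a StormCrawler class. */
    public static final class Seed {
        public final String url;
        public final Map<String, List<String>> metadata;

        public Seed(String url, Map<String, List<String>> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /**
     * Splits a line on tabs. The first token is taken as the URL; every
     * following token is assumed to be a key=value metadata pair.
     */
    public static Seed parse(String line) {
        String[] tokens = line.split("\t");
        String url = tokens[0].trim();
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            // Split on the first '=' only, so values may themselves contain '='.
            String[] keyValue = tokens[i].split("=", 2);
            if (keyValue.length != 2) {
                continue; // skip tokens that are not key=value pairs
            }
            metadata.computeIfAbsent(keyValue[0].trim(), k -> new ArrayList<>())
                    .add(keyValue[1].trim());
        }
        return new Seed(url, metadata);
    }

    public static void main(String[] args) {
        Seed seed = parse("https://example.com/\tdepth=0\tsource=seeds");
        System.out.println(seed.url + " " + seed.metadata);
    }
}
```

A line without any tab yields a seed with empty metadata. Check the StringTabScheme source for the exact rules before relying on this layout.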
-
-
Field Summary
Fields (modifier and type, then field name):
- protected org.apache.storm.spout.SpoutOutputCollector _collector
- protected org.apache.storm.spout.Scheme _scheme
- protected boolean active
- static int BATCH_SIZE
- protected LinkedList<byte[]> buffer
- static org.slf4j.Logger LOG
-
Method Summary
Methods (modifier and type, then method and description):
- void ack(Object msgId)
- void activate()
- void close()
- void deactivate()
- void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
- void fail(Object msgId)
- void nextTuple()
- void open(Map<String,Object> conf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
- protected void populateBuffer()
- void setScheme(org.apache.storm.spout.Scheme scheme): Specify a Scheme for parsing the lines into URLs and Metadata.
-
-
-
Field Detail
-
BATCH_SIZE
public static final int BATCH_SIZE
- See Also:
- Constant Field Values
-
LOG
public static final org.slf4j.Logger LOG
-
_collector
protected org.apache.storm.spout.SpoutOutputCollector _collector
-
_scheme
protected org.apache.storm.spout.Scheme _scheme
-
buffer
protected LinkedList<byte[]> buffer
-
active
protected boolean active
-
-
Constructor Detail
-
FileSpout
public FileSpout(String dir, String filter)
- Parameters:
dir - directory containing the seed files
filter - filter to apply to the file names
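The page does not say what syntax `filter` uses. The sketch below shows one plausible reading in plain Java, assuming the filter is a glob pattern handed to `java.nio.file.Files.newDirectoryStream`; that assumption, and the names `SeedFileListingSketch`, `listSeedFiles` and `demo`, are hypothetical and should be checked against the FileSpout source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SeedFileListingSketch {

    /** Returns the sorted absolute paths of the files in dir that match the glob. */
    public static List<String> listSeedFiles(String dir, String filter) {
        List<String> files = new ArrayList<>();
        // ASSUMPTION: the filter is interpreted as a glob such as "*.txt".
        try (DirectoryStream<Path> stream =
                Files.newDirectoryStream(Paths.get(dir), filter)) {
            for (Path entry : stream) {
                files.add(entry.toAbsolutePath().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Directory iteration order is unspecified, so sort for stable output.
        Collections.sort(files);
        return files;
    }

    /** Builds a temporary seed directory and lists the names of its *.txt files. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("seeds");
            byte[] line = "https://example.com/\n".getBytes(StandardCharsets.UTF_8);
            Files.write(dir.resolve("a.txt"), line);
            Files.write(dir.resolve("b.txt"), line);
            Files.write(dir.resolve("notes.md"), line);
            List<String> names = new ArrayList<>();
            for (String file : listSeedFiles(dir.toString(), "*.txt")) {
                names.add(Paths.get(file).getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under the same assumption, a call such as `new FileSpout("seeds", "*.txt")` would read every `.txt` file in the `seeds` directory.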
-
FileSpout
public FileSpout(String... files)
- Parameters:
files - files containing the URLs
-
FileSpout
public FileSpout(String dir, String filter, boolean withDiscoveredStatus)
- Parameters:
withDiscoveredStatus - whether the generated tuples should contain a Status field with DISCOVERED as its value and be emitted on the status stream
dir - directory containing the seed files
filter - filter to apply to the file names
- Since:
- 1.13
-
FileSpout
public FileSpout(boolean withDiscoveredStatus, String... files)
- Parameters:
withDiscoveredStatus - whether the generated tuples should contain a Status field with DISCOVERED as its value and be emitted on the status stream
files - files containing the URLs
- Since:
- 1.13
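The effect of `withDiscoveredStatus` described above can be pictured with a small model in plain Java. It does not use Storm or StormCrawler: the class `DiscoveredStatusSketch`, the `Emission` holder, and the stream names `"default"` and `"status"` are assumptions made for illustration, and the real field layout is whatever `declareOutputFields` declares.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DiscoveredStatusSketch {

    // ASSUMPTION: illustrative stream names, not read from the library.
    public static final String DEFAULT_STREAM = "default";
    public static final String STATUS_STREAM = "status";

    /** Hypothetical record of one emitted tuple and the stream it went to. */
    public static final class Emission {
        public final String streamId;
        public final List<Object> values;

        public Emission(String streamId, List<Object> values) {
            this.streamId = streamId;
            this.values = Collections.unmodifiableList(values);
        }
    }

    /**
     * Models the routing rule: without the flag, (url, metadata) goes to the
     * default stream; with it, a DISCOVERED status value is appended and the
     * tuple goes to the status stream.
     */
    public static Emission route(String url, Map<String, String> metadata,
                                 boolean withDiscoveredStatus) {
        List<Object> values = new ArrayList<>();
        values.add(url);
        values.add(metadata);
        if (withDiscoveredStatus) {
            values.add("DISCOVERED");
            return new Emission(STATUS_STREAM, values);
        }
        return new Emission(DEFAULT_STREAM, values);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Collections.singletonMap("depth", "0");
        Emission plain = route("https://example.com/", metadata, false);
        Emission discovered = route("https://example.com/", metadata, true);
        System.out.println(plain.streamId + " " + plain.values);
        System.out.println(discovered.streamId + " " + discovered.values);
    }
}
```

In this model the flag changes both the destination stream and the number of fields, so downstream bolts would need to subscribe to the matching stream.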
-
-
Method Detail
-
setScheme
public void setScheme(org.apache.storm.spout.Scheme scheme)
Specify a Scheme for parsing the lines into URLs and Metadata. StringTabScheme is used by default. The Scheme must generate a String for the URL and a Metadata object.
- Since:
- 1.13
-
populateBuffer
protected void populateBuffer() throws IOException
- Throws:
IOException
-
open
public void open(Map<String,Object> conf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
-
nextTuple
public void nextTuple()
-
declareOutputFields
public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
-
close
public void close()
- Specified by:
close in interface org.apache.storm.spout.ISpout
- Overrides:
close in class org.apache.storm.topology.base.BaseRichSpout
-
activate
public void activate()
- Specified by:
activate in interface org.apache.storm.spout.ISpout
- Overrides:
activate in class org.apache.storm.topology.base.BaseRichSpout
-
deactivate
public void deactivate()
- Specified by:
deactivate in interface org.apache.storm.spout.ISpout
- Overrides:
deactivate in class org.apache.storm.topology.base.BaseRichSpout
-
ack
public void ack(Object msgId)
- Specified by:
ack in interface org.apache.storm.spout.ISpout
- Overrides:
ack in class org.apache.storm.topology.base.BaseRichSpout
-
fail
public void fail(Object msgId)
- Specified by:
fail in interface org.apache.storm.spout.ISpout
- Overrides:
fail in class org.apache.storm.topology.base.BaseRichSpout
-
-