java.lang.Object
- org.apache.storm.topology.base.BaseComponent
  - org.apache.storm.topology.base.BaseRichBolt
    - com.digitalpebble.stormcrawler.indexing.AbstractIndexerBolt

All Implemented Interfaces:

Serializable, org.apache.storm.task.IBolt, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichBolt

Direct Known Subclasses:

DummyIndexer, StdOutIndexer
```
public abstract class AbstractIndexerBolt
extends org.apache.storm.topology.base.BaseRichBolt
```
Abstract class to simplify writing IndexerBolts.

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field	Description
`static String`	`canonicalMetadataParamName`	Field name to use for reading the canonical property of the metadata.
`static String`	`ignoreEmptyFieldValueParamName`	Indicates that empty field values should not be emitted at all.
`static String`	`metadata2fieldParamName`	Mapping between metadata keys and field names for indexing. Can be a list of values, each in the form key=field, or a single string.
`static String`	`metadataFilterParamName`	List of metadata key/value pairs to be used as a filter.
`static String`	`textFieldParamName`	Field name to use for storing the text of a document.
`static String`	`textLengthParamName`	Trim length of text to index.
`static String`	`urlFieldParamName`	Field name to use for storing the URL of a document.

Constructor Summary

Constructors
Constructor	Description
`AbstractIndexerBolt()`	

Method Summary

Modifier and Type	Method	Description
`void`	`declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)`
`protected String`	`fieldNameForText()`	Returns the field name to use for the text, or null if the text must not be indexed.
`protected String`	`fieldNameForURL()`	Returns the field name to use for the URL, or null if the URL must not be indexed.
`protected boolean`	`filterDocument(Metadata meta)`	Determines whether a document should be indexed, based on the presence of a given key/value or the RobotsTags.ROBOTS_NO_INDEX directive.
`protected Map<String,String[]>`	`filterMetadata(Metadata meta)`	Returns a mapping of field names to values for the metadata to index.
`protected String`	`getDocumentID(Metadata metadata, String normalisedUrl)`	Get the document id.
`protected boolean`	`ignoreEmptyFields()`
`void`	`prepare(Map<String,Object> conf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)`
`protected String`	`trimText(String text)`	Returns a trimmed string or the original one if it is below the threshold set in the configuration.
`protected String`	`valueForURL(org.apache.storm.tuple.Tuple tuple)`	Returns the value to be used as the URL for indexing purposes. If a canonical value is present, it is used instead.

Methods inherited from class org.apache.storm.topology.base.BaseRichBolt
cleanup

Methods inherited from class org.apache.storm.topology.base.BaseComponent
getComponentConfiguration

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.storm.task.IBolt
execute

Methods inherited from interface org.apache.storm.topology.IComponent
getComponentConfiguration

- Field Detail
  - metadata2fieldParamName
```
public static final String metadata2fieldParamName
```
    Mapping between metadata keys and field names for indexing. Can be a list of values, each in the form key=field, or a single string.
    
    See Also:
    
    Constant Field Values
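    
    The following is a minimal sketch of how such `key=field` entries could be turned into a lookup table. `MetadataMappingSketch` and its `parse` method are hypothetical and not part of StormCrawler. Treating an entry without `=` as a key that maps to a field of the same name is an assumption, not behaviour confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of StormCrawler.
final class MetadataMappingSketch {

    // Parses entries such as "parse.title=title" into metadata key -> field name.
    // Assumption: an entry without '=' maps the key to a field of the same name.
    static Map<String, String> parse(List<String> entries) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : entries) {
            int pos = entry.indexOf('=');
            if (pos < 0) {
                String key = entry.trim();
                mapping.put(key, key);
            } else {
                String key = entry.substring(0, pos).trim();
                String field = entry.substring(pos + 1).trim();
                mapping.put(key, field);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Example entries are made up for the demonstration.
        System.out.println(parse(List.of("parse.title=title", "keywords")));
    }
}
```

    A `LinkedHashMap` keeps the entries in configuration order, which makes the resulting field order predictable.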
  - metadataFilterParamName
```
public static final String metadataFilterParamName
```
    List of metadata key/value pairs to be used as a filter. A document is indexed only if its metadata contains such a pair. Can be null, in which case no filtering is applied.
    
    See Also:
    
    Constant Field Values
  - textFieldParamName
```
public static final String textFieldParamName
```
    Field name to use for storing the text of a document.
    
    See Also:
    
    Constant Field Values
  - textLengthParamName
```
public static final String textLengthParamName
```
    Trim length of text to index. Defaults to -1, which keeps the text intact.
    
    See Also:
    
    Constant Field Values
  - urlFieldParamName
```
public static final String urlFieldParamName
```
    Field name to use for storing the URL of a document.
    
    See Also:
    
    Constant Field Values
  - canonicalMetadataParamName
```
public static final String canonicalMetadataParamName
```
    Field name to use for reading the canonical property of the metadata.
    
    See Also:
    
    Constant Field Values
  - ignoreEmptyFieldValueParamName
```
public static final String ignoreEmptyFieldValueParamName
```
    Indicates that empty field values should not be emitted at all.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - AbstractIndexerBolt
```
public AbstractIndexerBolt()
```
- Method Detail
  - prepare
```
public void prepare(Map<String,Object> conf,
                    org.apache.storm.task.TopologyContext context,
                    org.apache.storm.task.OutputCollector collector)
```
  - filterDocument
```
protected boolean filterDocument(Metadata meta)
```
    Determines whether a document should be indexed, based on the presence of a given key/value or the RobotsTags.ROBOTS_NO_INDEX directive.
    
    Returns:
    
    true if the document should be kept.
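    
    The decision described above can be sketched with plain collections. `FilterDocumentSketch`, the `robots.noindex` key and the `"true"` marker value are assumptions made for illustration; the real method works on a `Metadata` object and may differ in detail.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of StormCrawler.
final class FilterDocumentSketch {

    // Assumed metadata key carrying the noindex directive.
    static final String NO_INDEX_KEY = "robots.noindex";

    // Returns true if the document should be kept.
    static boolean keep(Map<String, String[]> metadata, String filterKey, String filterValue) {
        // A noindex directive always excludes the document.
        String[] noIndex = metadata.get(NO_INDEX_KEY);
        if (noIndex != null && Arrays.asList(noIndex).contains("true")) {
            return false;
        }
        // No filter configured: keep everything else.
        if (filterKey == null) {
            return true;
        }
        // Otherwise the metadata must contain the configured key/value pair.
        String[] values = metadata.get(filterKey);
        return values != null && Arrays.asList(values).contains(filterValue);
    }

    public static void main(String[] args) {
        Map<String, String[]> metadata = new HashMap<>();
        metadata.put("lang", new String[] {"en"});
        System.out.println(keep(metadata, "lang", "en"));
        System.out.println(keep(metadata, "lang", "fr"));
    }
}
```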
  - filterMetadata
```
protected Map<String,String[]> filterMetadata(Metadata meta)
```
    Returns a mapping of field names to values for the metadata to index.
  - getDocumentID
```
protected String getDocumentID(Metadata metadata,
                               String normalisedUrl)
```
    Get the document id.
    
    Parameters:
    
    metadata - The Metadata.
    
    normalisedUrl - The normalised url.
    
    Returns:
    
    The SHA-256 digest of the normalised URL, as a String.
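    
    For readers unfamiliar with digest-based identifiers, here is a sketch of computing a SHA-256 digest of a URL using only the JDK. `DocumentIdSketch` is a hypothetical class. The UTF-8 input encoding and lowercase hexadecimal output are assumptions; this page does not state which string encoding the method uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of StormCrawler.
final class DocumentIdSketch {

    // Returns the SHA-256 digest of the URL as a lowercase hex string (assumed encoding).
    static String sha256Hex(String normalisedUrl) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalisedUrl.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The URL is a made-up example.
        System.out.println(sha256Hex("https://example.com/page"));
    }
}
```

    A digest gives a fixed-length identifier (64 hex characters here) regardless of URL length, which suits backends that limit document id size.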
  - valueForURL
```
protected String valueForURL(org.apache.storm.tuple.Tuple tuple)
```
    Returns the value to be used as the URL for indexing purposes. If a canonical value is present, it is used instead.
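    
    The preference for a canonical value can be sketched as a simple fallback. `UrlValueSketch` is hypothetical and takes plain strings rather than a `Tuple`. The real method may apply further checks on the canonical value that this page does not describe.

```java
// Hypothetical helper for illustration only; not part of StormCrawler.
final class UrlValueSketch {

    // Prefers the canonical URL when one is present, otherwise falls back to the fetched URL.
    static String valueForUrl(String url, String canonical) {
        if (canonical == null || canonical.trim().isEmpty()) {
            return url;
        }
        return canonical.trim();
    }

    public static void main(String[] args) {
        // Both URLs are made-up examples.
        System.out.println(valueForUrl("https://example.com/a?ref=1", "https://example.com/a"));
        System.out.println(valueForUrl("https://example.com/b", null));
    }
}
```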
  - fieldNameForText
```
protected String fieldNameForText()
```
    Returns the field name to use for the text, or null if the text must not be indexed.
  - trimText
```
protected String trimText(String text)
```
    Returns a trimmed string or the original one if it is below the threshold set in the configuration.
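    
    A minimal sketch of this threshold behaviour follows. `TrimTextSketch` is hypothetical, and it takes the maximum length as an argument rather than reading it from the configuration. Treating a negative threshold as "no trimming" follows the documented default of -1 for textLengthParamName; cutting at a plain character count is an assumption.

```java
// Hypothetical helper for illustration only; not part of StormCrawler.
final class TrimTextSketch {

    // Returns the text cut to maxLength characters, or unchanged when it is
    // short enough or when maxLength is negative (no limit).
    static String trimText(String text, int maxLength) {
        if (text == null) {
            return null;
        }
        if (maxLength < 0 || text.length() <= maxLength) {
            return text;
        }
        return text.substring(0, maxLength);
    }

    public static void main(String[] args) {
        System.out.println(trimText("hello world", 5));
        System.out.println(trimText("hello world", -1));
    }
}
```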
  - fieldNameForURL
```
protected String fieldNameForURL()
```
    Returns the field name to use for the URL, or null if the URL must not be indexed.
  - ignoreEmptyFields
```
protected boolean ignoreEmptyFields()
```
  - declareOutputFields
```
public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
```
