Class MD5SignatureParseFilter
- java.lang.Object
  - com.digitalpebble.stormcrawler.util.AbstractConfigurable
    - com.digitalpebble.stormcrawler.parse.ParseFilter
      - com.digitalpebble.stormcrawler.parse.filter.MD5SignatureParseFilter
All Implemented Interfaces: Configurable
public class MD5SignatureParseFilter extends ParseFilter
Computes a signature for a page, based on either its binary content or its plain text. If the content is empty, the URL is used instead.
Configuration properties:
- useText: compute the signature on the plain text instead of the binary content
- keyName: name of the metadata field that holds the signature (default: "signature")
- keyNameCopy: name of the metadata field that holds a temporary copy of the signature, used to decide by signature comparison whether the document has changed. If not defined or empty, the signature is not copied.
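The behaviour described above can be illustrated with a minimal sketch that uses only the Java standard library. It is not the filter's actual source: the class name `SignatureSketch`, its method names, the UTF-8 encoding choice, and the rule that a missing previous signature counts as "changed" are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not part of StormCrawler.
class SignatureSketch {

    /** Returns the MD5 digest of the input as a lowercase hex string. */
    static String md5Hex(byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value before formatting as two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 support is mandatory for every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    /**
     * Picks the text (useText = true) or the binary content (useText = false),
     * and falls back to the URL when the chosen input is null or empty.
     */
    static String signature(String url, byte[] content, String text, boolean useText) {
        byte[] data = null;
        if (useText) {
            if (text != null) {
                data = text.getBytes(StandardCharsets.UTF_8); // encoding is an assumption
            }
        } else {
            data = content;
        }
        if (data == null || data.length == 0) {
            data = url.getBytes(StandardCharsets.UTF_8);
        }
        return md5Hex(data);
    }

    /**
     * Compares the copy of the earlier signature (the role of keyNameCopy)
     * with the new one. Treating a missing copy as "changed" is an assumption.
     */
    static boolean hasChanged(String previousSignature, String currentSignature) {
        return previousSignature == null || !previousSignature.equals(currentSignature);
    }
}
```

In this sketch, a caller would store the result of `signature(...)` under the field named by keyName and, on a later fetch, pass the value kept under keyNameCopy to `hasChanged(...)`.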
Constructor Summary
- MD5SignatureParseFilter()
Method Summary
- void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filterParams): called when this filter is being initialized
- void filter(String URL, byte[] content, DocumentFragment doc, ParseResult parse): called when parsing a specific page
Methods inherited from class com.digitalpebble.stormcrawler.parse.ParseFilter: needsDOM
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable: configure, getName
Method Detail
filter
public void filter(String URL, byte[] content, DocumentFragment doc, ParseResult parse)
Description copied from class: ParseFilter
Called when parsing a specific page.
Specified by: filter in class ParseFilter
Parameters:
- URL: the URL of the page being parsed
- content: the content being parsed
- doc: the DOM tree resulting from parsing the content, or null if ParseFilter.needsDOM() returns false
- parse: the metadata to be updated with the result of the parsing
configure
public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
Description copied from interface: Configurable
Called when this filter is being initialized.
Parameters:
- stormConf: the Storm configuration used for the configurable
- filterParams: the filter-specific configuration. Never null