Class ParseFilters
- java.lang.Object
  - com.digitalpebble.stormcrawler.util.AbstractConfigurable
    - com.digitalpebble.stormcrawler.parse.ParseFilter
      - com.digitalpebble.stormcrawler.parse.ParseFilters
- All Implemented Interfaces:
JSONResource, Configurable
public class ParseFilters extends ParseFilter implements JSONResource
Wrapper for the ParseFilters defined in a JSON configuration.
- See Also:
for more information.
-
-
Field Summary
Fields
- static ParseFilters emptyParseFilter
-
Constructor Summary
Constructors
- ParseFilters(Map<String,Object> stormConf, String configFile)
  Loads the filters from a JSON configuration file.
-
Method Summary
Methods
- void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
  Called when this filter is being initialized.
- void filter(String URL, byte[] content, DocumentFragment doc, ParseResult parse)
  Called when parsing a specific page.
- static ParseFilters fromConf(Map<String,Object> stormConf)
  Loads and configures the ParseFilters from the Storm configuration if one is specified there; otherwise returns the emptyParseFilter.
- String getResourceFile()
- void loadJSONResources(InputStream inputStream)
  Load the resources from an input stream.
- static void main(String[] args)
  Used for quick testing and debugging.
- boolean needsDOM()
  Specifies whether this filter requires a DOM representation of the document.
-
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable
configure, getName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.digitalpebble.stormcrawler.JSONResource
loadJSONResources
-
Field Detail
-
emptyParseFilter
public static final ParseFilters emptyParseFilter
-
-
Constructor Detail
-
ParseFilters
public ParseFilters(Map<String,Object> stormConf, String configFile) throws IOException
Loads the filters from a JSON configuration file.
- Throws:
IOException
-
-
Method Detail
-
fromConf
public static ParseFilters fromConf(Map<String,Object> stormConf)
Loads and configures the ParseFilters from the Storm configuration if one is specified there; otherwise returns the emptyParseFilter.
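The fallback described above can be sketched without the StormCrawler dependency. In this minimal sketch, `FilterChainSketch`, `EMPTY`, and the configuration key `parsefilters.config.file` are stand-ins chosen for illustration (assumptions, not taken from this page), and the reading and parsing of the JSON file is left out.

```java
import java.util.Map;

// Hypothetical stand-in for ParseFilters; this is NOT the StormCrawler class.
final class FilterChainSketch {
    // Shared no-op instance, playing the role of emptyParseFilter.
    static final FilterChainSketch EMPTY = new FilterChainSketch(null);

    // Assumed configuration key; verify it against your own crawler configuration.
    static final String CONFIG_KEY = "parsefilters.config.file";

    private final String configFile;

    private FilterChainSketch(String configFile) {
        this.configFile = configFile;
    }

    // Name of the JSON resource this instance was built from (null for EMPTY).
    String getResourceFile() {
        return configFile;
    }

    // Returns a configured instance when the key is present, EMPTY otherwise.
    static FilterChainSketch fromConf(Map<String, Object> conf) {
        Object value = conf.get(CONFIG_KEY);
        if (value instanceof String && !((String) value).isEmpty()) {
            return new FilterChainSketch((String) value);
        }
        return EMPTY;
    }
}
```

Returning a shared empty instance rather than null lets callers invoke the filter unconditionally, which is presumably why a static emptyParseFilter field exists.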
-
loadJSONResources
public void loadJSONResources(InputStream inputStream) throws com.fasterxml.jackson.core.JsonParseException, com.fasterxml.jackson.databind.JsonMappingException, IOException
Description copied from interface: JSONResource
Load the resources from an input stream.
- Specified by:
loadJSONResources in interface JSONResource
- Throws:
com.fasterxml.jackson.core.JsonParseException
com.fasterxml.jackson.databind.JsonMappingException
IOException
-
getResourceFile
public String getResourceFile()
- Specified by:
getResourceFile in interface JSONResource
- Returns:
filename of the JSON resource
-
configure
public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
Description copied from interface: Configurable
Called when this filter is being initialized.
- Specified by:
configure in interface Configurable
- Parameters:
stormConf - the Storm configuration used for the configurable
filtersConf - the filter-specific configuration. Never null.
-
needsDOM
public boolean needsDOM()
Description copied from class: ParseFilter
Specifies whether this filter requires a DOM representation of the document.
- Overrides:
needsDOM in class ParseFilter
- Returns:
true if this needs a DOM representation of the document, false otherwise.
-
filter
public void filter(String URL, byte[] content, DocumentFragment doc, ParseResult parse)
Description copied from class: ParseFilter
Called when parsing a specific page.
- Specified by:
filter in class ParseFilter
- Parameters:
URL - the URL of the page being parsed
content - the content being parsed
doc - the DOM tree resulting from parsing the content, or null if ParseFilter.needsDOM() returns false
parse - the metadata to be updated with the results of the parsing
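Because this class wraps several filters behind the single-filter contract, one plausible reading is a composite that delegates each call to its members. The sketch below illustrates that idea with stand-in types only: `PageFilterSketch` and `CompositeFilterSketch` are hypothetical names, the metadata is simplified to a `Map<String,String>`, the DOM argument is omitted, and the "any member needs a DOM" rule is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a single parse filter (not the StormCrawler type).
interface PageFilterSketch {
    // Inspects the page and may add entries to the metadata map.
    void filter(String url, byte[] content, Map<String, String> metadata);

    // Whether this filter wants a DOM; defaults to false in this sketch.
    default boolean needsDOM() {
        return false;
    }
}

// Composite that presents a list of filters as one filter.
final class CompositeFilterSketch implements PageFilterSketch {
    private final List<PageFilterSketch> filters = new ArrayList<>();

    CompositeFilterSketch add(PageFilterSketch filter) {
        filters.add(filter);
        return this;
    }

    // Assumption: the wrapper needs a DOM as soon as one member does.
    @Override
    public boolean needsDOM() {
        for (PageFilterSketch f : filters) {
            if (f.needsDOM()) {
                return true;
            }
        }
        return false;
    }

    // Applies every member in insertion order to the same page.
    @Override
    public void filter(String url, byte[] content, Map<String, String> metadata) {
        for (PageFilterSketch f : filters) {
            f.filter(url, content, metadata);
        }
    }
}
```

With this shape, an empty composite does nothing and reports that no DOM is needed, which matches the role an empty wrapper would play when no filters are configured.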
-
main
public static void main(String[] args) throws IOException, org.apache.commons.cli.ParseException
Used for quick testing and debugging.
- Throws:
IOException
org.apache.commons.cli.ParseException
- Since:
1.17