Class URLFilters
- java.lang.Object
-
- com.digitalpebble.stormcrawler.util.AbstractConfigurable
-
- com.digitalpebble.stormcrawler.filtering.URLFilter
-
- com.digitalpebble.stormcrawler.filtering.URLFilters
-
- All Implemented Interfaces:
JSONResource, Configurable
public class URLFilters extends URLFilter implements JSONResource
Wrapper for the URLFilters defined in a JSON configuration.
- See Also:
for more information.
-
-
Field Summary
Fields
Modifier and Type    Field    Description
static URLFilters
emptyURLFilters
-
Constructor Summary
Constructors
Constructor    Description
URLFilters(Map<String,Object> stormConf, String configFile)
Loads the filters from a JSON configuration file
-
Method Summary
Modifier and Type    Method    Description
void
configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
Called when this filter is being initialized
@Nullable String
filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
Returns null if the URL is to be removed or a normalised representation which can correspond to the input URL
static URLFilters
fromConf(Map<String,Object> stormConf)
Loads and configures the URLFilters based on the Storm config if there is one; otherwise returns an empty URLFilters.
String
getResourceFile()
void
loadJSONResources(InputStream inputStream)
Load the resources from an input stream
static void
main(String[] args)
Utility to check the filtering of a URL
-
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable
configure, getName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.digitalpebble.stormcrawler.JSONResource
loadJSONResources
-
-
-
-
Field Detail
-
emptyURLFilters
public static final URLFilters emptyURLFilters
-
-
Constructor Detail
-
URLFilters
public URLFilters(Map<String,Object> stormConf, String configFile) throws IOException
Loads the filters from a JSON configuration file
- Throws:
IOException
-
-
Method Detail
-
fromConf
public static URLFilters fromConf(Map<String,Object> stormConf)
Loads and configures the URLFilters based on the Storm config if there is one; otherwise returns an empty URLFilters.
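The fallback behaviour described for fromConf can be illustrated with a small, self-contained sketch. It does not use the StormCrawler classes: FilterSet, EMPTY and CONFIG_KEY are hypothetical stand-ins, and the key name "urlfilters.config.file" is an assumption made for this example rather than something stated on this page.

```java
import java.util.HashMap;
import java.util.Map;

final class FromConfSketch {

    /** Hypothetical stand-in for a loaded set of filters. */
    static final class FilterSet {
        final String configFile; // null for the shared empty instance

        FilterSet(String configFile) {
            this.configFile = configFile;
        }

        boolean isEmpty() {
            return configFile == null;
        }
    }

    /** Shared no-op instance, playing the role of emptyURLFilters. */
    static final FilterSet EMPTY = new FilterSet(null);

    /** Assumed configuration key; the real key name is not given in this page. */
    static final String CONFIG_KEY = "urlfilters.config.file";

    /** Returns a configured FilterSet if the key is set, otherwise the empty one. */
    static FilterSet fromConf(Map<String, Object> conf) {
        Object file = conf.get(CONFIG_KEY);
        if (file instanceof String && !((String) file).isBlank()) {
            // A real implementation would load and parse the JSON file here.
            return new FilterSet((String) file);
        }
        return EMPTY;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        System.out.println(fromConf(conf).isEmpty()); // no key set: empty instance
        conf.put(CONFIG_KEY, "urlfilters.json");
        System.out.println(fromConf(conf).isEmpty()); // key set: configured instance
    }
}
```

Returning a shared empty instance instead of null means callers can always invoke the filter without a null check, which is presumably why the class exposes emptyURLFilters.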
-
loadJSONResources
public void loadJSONResources(InputStream inputStream) throws com.fasterxml.jackson.core.JsonParseException, com.fasterxml.jackson.databind.JsonMappingException, IOException
Description copied from interface: JSONResource
Load the resources from an input stream
- Specified by:
loadJSONResources
in interface JSONResource
- Throws:
com.fasterxml.jackson.core.JsonParseException
com.fasterxml.jackson.databind.JsonMappingException
IOException
-
filter
@Nullable public String filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
Description copied from class: URLFilter
Returns null if the URL is to be removed or a normalised representation which can correspond to the input URL
- Specified by:
filter
in class URLFilter
- Parameters:
sourceUrl - the URL of the page where the URL was found. Can be null.
sourceMetadata - the metadata collected for the page
urlToFilter - the URL to be filtered
- Returns:
null if the URL is to be removed or a normalised representation which can correspond to the input URL
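The null-or-normalised contract lends itself to chaining several filters. The sketch below shows one plausible way a wrapper could apply a list of filters in order. It is an assumption about the general pattern, not the actual URLFilters implementation, and SimpleUrlFilter, applyChain and exampleChain are hypothetical names that take only the URL string (the source URL and metadata parameters are left out for brevity).

```java
import java.util.List;
import java.util.Locale;

final class UrlFilterChainSketch {

    /** Hypothetical stand-in for a URL filter; not the StormCrawler URLFilter class. */
    interface SimpleUrlFilter {
        /** Returns null to remove the URL, or a (possibly normalised) URL to keep it. */
        String filter(String urlToFilter);
    }

    /** Applies each filter in order, feeding the output of one into the next. */
    static String applyChain(List<SimpleUrlFilter> filters, String urlToFilter) {
        String current = urlToFilter;
        for (SimpleUrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null; // one filter rejected the URL, so stop early
            }
        }
        return current;
    }

    /** Builds a two-step example chain: strip the fragment, then keep only http(s) URLs. */
    static List<SimpleUrlFilter> exampleChain() {
        SimpleUrlFilter dropFragment = url -> {
            int hash = url.indexOf('#');
            return hash >= 0 ? url.substring(0, hash) : url;
        };
        SimpleUrlFilter httpOnly = url ->
                url.toLowerCase(Locale.ROOT).startsWith("http") ? url : null;
        return List.of(dropFragment, httpOnly);
    }

    public static void main(String[] args) {
        List<SimpleUrlFilter> chain = exampleChain();
        // Kept, with the fragment removed by the first filter.
        System.out.println(applyChain(chain, "https://example.com/page#top"));
        // Removed by the second filter, so the chain returns null.
        System.out.println(applyChain(chain, "mailto:someone@example.com"));
    }
}
```

A caller treats a null result as "discard this URL" and any other result as the form of the URL to carry forward.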
-
getResourceFile
public String getResourceFile()
- Specified by:
getResourceFile
in interface JSONResource
- Returns:
filename of the JSON resource
-
configure
public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
Description copied from interface: Configurable
Called when this filter is being initialized
- Specified by:
configure
in interface Configurable
- Parameters:
stormConf - The Storm configuration used for the configurable
filtersConf - the filter specific configuration. Never null
-
main
public static void main(String[] args) throws org.apache.commons.cli.ParseException
Utility to check the filtering of a URL
- Throws:
org.apache.commons.cli.ParseException
-
-