Class URLFilters
- java.lang.Object
  - com.digitalpebble.stormcrawler.util.AbstractConfigurable
    - com.digitalpebble.stormcrawler.filtering.URLFilter
      - com.digitalpebble.stormcrawler.filtering.URLFilters
All Implemented Interfaces:
JSONResource, Configurable
public class URLFilters extends URLFilter implements JSONResource
Wrapper for the URLFilters defined in a JSON configuration.
See Also: for more information.
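The wrapper's role can be sketched in plain Java. This is a minimal illustration, not StormCrawler's implementation: `SimpleUrlFilter` and `FilterChainSketch` are hypothetical names, the Metadata argument is dropped, and the sketch assumes (consistent with the filter contract, though this page does not spell it out) that the wrapped filters run in order, each receiving the previous filter's output, and that the first null result removes the URL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for URLFilter: return null to remove the URL,
// or a (possibly normalised) URL to keep it. Metadata is omitted.
interface SimpleUrlFilter {
    String filter(String sourceUrl, String urlToFilter);
}

// Sketch of a wrapper that applies its filters in order, feeding each
// filter the output of the previous one.
final class FilterChainSketch implements SimpleUrlFilter {
    private final List<SimpleUrlFilter> filters;

    FilterChainSketch(List<SimpleUrlFilter> filters) {
        this.filters = new ArrayList<>(filters); // defensive copy
    }

    @Override
    public String filter(String sourceUrl, String urlToFilter) {
        String current = urlToFilter;
        for (SimpleUrlFilter f : filters) {
            current = f.filter(sourceUrl, current);
            if (current == null) {
                return null; // the first rejection removes the URL; later filters are skipped
            }
        }
        return current; // with no filters, the input is returned unchanged
    }
}
```

Because each filter sees the previous filter's output, order matters in this sketch: placing a normalising filter first lets later filters match against the normalised form.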
Field Summary
- static URLFilters emptyURLFilters
Constructor Summary
- URLFilters(Map<String,Object> stormConf, String configFile): Loads the filters from a JSON configuration file.
Method Summary
- void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf): Called when this filter is being initialized.
- @Nullable String filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter): Returns null if the URL is to be removed, or a normalised representation of it, which may be identical to the input URL.
- static URLFilters fromConf(Map<String,Object> stormConf): Loads and configures the URLFilters from the Storm configuration if it defines any; otherwise returns an empty URLFilters.
- String getResourceFile()
- void loadJSONResources(InputStream inputStream): Loads the resources from an input stream.
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable
configure, getName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.digitalpebble.stormcrawler.JSONResource
loadJSONResources
Field Detail
emptyURLFilters
public static final URLFilters emptyURLFilters
Constructor Detail
URLFilters
public URLFilters(Map<String,Object> stormConf, String configFile) throws IOException
Loads the filters from a JSON configuration file.
Throws:
IOException
Method Detail
fromConf
public static URLFilters fromConf(Map<String,Object> stormConf)
Loads and configures the URLFilters from the Storm configuration if it defines any; otherwise returns an empty URLFilters.
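A sketch of the "configured if present, otherwise empty" fallback that fromConf describes. The configuration key `urlfilters.config.file`, the `Filters` placeholder type and the `loader` callback are assumptions made for illustration. Returning a shared constant mirrors the existence of the emptyURLFilters field, but this page does not state that fromConf returns that exact instance.

```java
import java.util.Map;
import java.util.function.Function;

final class FromConfSketch {

    // Hypothetical placeholder for a loaded filter set; it only records its source file.
    static final class Filters {
        final String sourceFile; // null for the empty instance

        Filters(String sourceFile) {
            this.sourceFile = sourceFile;
        }
    }

    // Shared "no filters" instance, in the spirit of emptyURLFilters.
    static final Filters EMPTY = new Filters(null);

    // ASSUMPTION: the key under which the JSON file name is configured.
    static final String CONFIG_KEY = "urlfilters.config.file";

    static Filters fromConf(Map<String, Object> stormConf, Function<String, Filters> loader) {
        Object value = stormConf.get(CONFIG_KEY);
        if (value instanceof String && !((String) value).trim().isEmpty()) {
            return loader.apply((String) value); // configured: load from the named file
        }
        return EMPTY; // not configured: fall back to the shared empty instance
    }
}
```

Falling back to an empty filter set rather than null lets callers invoke filter unconditionally.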
loadJSONResources
public void loadJSONResources(InputStream inputStream) throws com.fasterxml.jackson.core.JsonParseException, com.fasterxml.jackson.databind.JsonMappingException, IOException
Description copied from interface: JSONResource
Loads the resources from an input stream.
Specified by:
loadJSONResources in interface JSONResource
Throws:
com.fasterxml.jackson.core.JsonParseException
com.fasterxml.jackson.databind.JsonMappingException
IOException
filter
public @Nullable String filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
Description copied from class: URLFilter
Returns null if the URL is to be removed, or a normalised representation of it, which may be identical to the input URL.
Specified by:
filter in class URLFilter
Parameters:
sourceUrl - the URL of the page where the URL was found. Can be null.
sourceMetadata - the metadata collected for the page
urlToFilter - the URL to be filtered
Returns:
null if the URL is to be removed, or a normalised representation of it, which may be identical to the input URL
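A hypothetical filter that follows the filter contract: it returns null to remove a URL, returns a normalised string to keep it, and tolerates a null sourceUrl. The class name, the same-host rule and the fragment stripping are illustrative choices made for this sketch, and sourceMetadata is omitted for brevity.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Locale;

// Hypothetical filter: keeps only URLs on the same host as the source page
// and normalises them by dropping the #fragment.
final class SameHostFilterSketch {

    static String filter(URL sourceUrl, String urlToFilter) {
        String host;
        try {
            host = new URI(urlToFilter).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable: remove the URL
        }
        if (host == null) {
            return null; // no host component (e.g. "mailto:"): remove the URL
        }
        // sourceUrl can be null, so the same-host rule only applies when it is known.
        if (sourceUrl != null
                && !host.toLowerCase(Locale.ROOT).equals(sourceUrl.getHost().toLowerCase(Locale.ROOT))) {
            return null;
        }
        // Normalise by dropping the fragment, which does not identify a different page.
        int hash = urlToFilter.indexOf('#');
        return hash >= 0 ? urlToFilter.substring(0, hash) : urlToFilter;
    }
}
```

Returning the normalised string, rather than a boolean, is what lets a wrapper pass the cleaned URL on to the next filter.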
getResourceFile
public String getResourceFile()
Specified by:
getResourceFile in interface JSONResource
Returns:
the filename of the JSON resource
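How getResourceFile and loadJSONResources(InputStream) can fit together, sketched with the JDK only: the implementing class names its resource, and a helper opens that resource from the classpath and hands the stream to the loader. `JsonResourceSketch`, `loadFromClasspath` and `RawTextResource` are hypothetical names; the real interface's other loadJSONResources method is not reproduced here, and the toy implementation stores the raw text instead of parsing it with Jackson.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical mirror of the JSONResource idea.
interface JsonResourceSketch {
    String getResourceFile();

    void loadJSONResources(InputStream inputStream) throws IOException;

    // Hypothetical helper: resolve getResourceFile() on the classpath and load it.
    default void loadFromClasspath(ClassLoader classLoader) throws IOException {
        InputStream in = classLoader.getResourceAsStream(getResourceFile());
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + getResourceFile());
        }
        try (InputStream stream = in) { // always close the stream after loading
            loadJSONResources(stream);
        }
    }
}

// Toy implementation that keeps the raw JSON text instead of parsing it.
final class RawTextResource implements JsonResourceSketch {
    String content;

    @Override
    public String getResourceFile() {
        return "urlfilters.json"; // hypothetical file name
    }

    @Override
    public void loadJSONResources(InputStream inputStream) throws IOException {
        content = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```

Accepting an InputStream keeps the loading logic independent of where the JSON comes from (classpath, file or test fixture).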
configure
public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
Description copied from interface: Configurable
Called when this filter is being initialized.
Specified by:
configure in interface Configurable
Parameters:
stormConf - the Storm configuration used for the configurable
filtersConf - the filter-specific configuration; never null
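A sketch of what a configure step typically does with its two arguments: read filter-specific parameters and fall back to defaults for missing keys. A plain Map stands in for the Jackson JsonNode that the real method receives, and `MaxLengthFilterSketch`, the `maxLength` key and the 1024 default are hypothetical.

```java
import java.util.Map;

// Hypothetical filter illustrating configure(stormConf, filterParams).
// A plain Map stands in for the Jackson JsonNode the real method receives.
final class MaxLengthFilterSketch {
    static final int DEFAULT_MAX_LENGTH = 1024; // hypothetical default

    private int maxLength = DEFAULT_MAX_LENGTH;

    void configure(Map<String, Object> stormConf, Map<String, Object> filterParams) {
        // stormConf would carry topology-wide settings; this sketch does not need any.
        // filterParams is never null, but any given key may be missing.
        Object value = filterParams.get("maxLength");
        if (value instanceof Number) {
            maxLength = ((Number) value).intValue();
        }
    }

    // Removes URLs longer than the configured limit; keeps others unchanged.
    String filter(String urlToFilter) {
        return urlToFilter.length() > maxLength ? null : urlToFilter;
    }
}
```

Because filtersConf is never null but may omit keys, defaulting per key keeps a filter usable even with an empty parameter object.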