Class URLFilter
- java.lang.Object
  - com.digitalpebble.stormcrawler.util.AbstractConfigurable
    - com.digitalpebble.stormcrawler.filtering.URLFilter
- All Implemented Interfaces: Configurable
- Direct Known Subclasses: BasicURLFilter, BasicURLNormalizer, FastURLFilter, HostURLFilter, MaxDepthFilter, MetadataFilter, RegexURLFilterBase, RegexURLNormalizer, RobotsFilter, SelfURLFilter, SitemapFilter, URLFilters
public abstract class URLFilter extends AbstractConfigurable
Unlike in Nutch, URL filters here can normalise URLs as well as filter them. URLFilter instances should be used via URLFilters.
- See Also: URLFilters, for more information.
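To illustrate what using filters through a single entry point can look like, here is a minimal sketch of a filter chain. It is not the StormCrawler implementation: `FilterChainSketch`, `SimpleFilter` and `applyAll` are hypothetical stand-ins, a plain `Map` replaces `Metadata`, and the chaining rule (each filter receives the previous filter's output, and the first null ends the chain) is an assumption made for this example.

```java
import java.net.URL;
import java.util.List;
import java.util.Map;

/** Sketch only: every type here is a simplified stand-in, not a StormCrawler class. */
public class FilterChainSketch {

    /** Stand-in for the filter contract: null removes the URL, non-null is the (possibly normalised) URL. */
    interface SimpleFilter {
        String filter(URL sourceUrl, Map<String, String> sourceMetadata, String urlToFilter);
    }

    /** Assumed chaining rule: feed each filter the previous output and stop at the first null. */
    static String applyAll(List<SimpleFilter> filters, URL sourceUrl,
                           Map<String, String> sourceMetadata, String urlToFilter) {
        String current = urlToFilter;
        for (SimpleFilter f : filters) {
            current = f.filter(sourceUrl, sourceMetadata, current);
            if (current == null) {
                return null; // one filter rejected the URL, so the chain rejects it
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Normalising step: drop the fragment, if any.
        SimpleFilter dropFragment = (src, md, url) -> {
            int i = url.indexOf('#');
            return i >= 0 ? url.substring(0, i) : url;
        };
        // Filtering step: keep only http(s) URLs.
        SimpleFilter httpOnly = (src, md, url) ->
                url.startsWith("http://") || url.startsWith("https://") ? url : null;

        List<SimpleFilter> chain = List.of(dropFragment, httpOnly);
        System.out.println(applyAll(chain, null, null, "https://example.com/a#top")); // https://example.com/a
        System.out.println(applyAll(chain, null, null, "mailto:someone@example.com")); // null
    }
}
```

Because one return value carries both outcomes, a normalising step and a rejecting step can sit in the same list without the caller needing to tell them apart.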
Constructor Summary
Constructors:
- URLFilter()
Method Summary
Modifier and Type: abstract @Nullable String
Method: filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
Description: Returns null if the URL is to be removed, or a normalised representation, which may be identical to the input URL.
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable
configure, getName
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.digitalpebble.stormcrawler.util.Configurable
configure
Method Detail
filter
public abstract @Nullable String filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
Returns null if the URL is to be removed, or a normalised representation, which may be identical to the input URL.
- Parameters:
  sourceUrl - the URL of the page where the URL was found. Can be null.
  sourceMetadata - the metadata collected for the page
  urlToFilter - the URL to be filtered
- Returns:
  null if the URL is to be removed, or a normalised representation, which may be identical to the input URL
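The contract above can be illustrated with a small, self-contained sketch of a filter that both rejects and normalises. Nothing in it comes from StormCrawler: `SimpleURLFilter` is a hypothetical stand-in for the abstract class (with a plain `Map` in place of `Metadata`), and `SameHostFilter` is an invented example rule, not a description of how `HostURLFilter` or any other listed subclass behaves.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Map;

/** Sketch only: stand-in types that mirror the filter() contract without the real library. */
public class SameHostFilterSketch {

    /** Hypothetical stand-in for the abstract class; Map replaces Metadata. */
    abstract static class SimpleURLFilter {
        abstract String filter(URL sourceUrl, Map<String, String> sourceMetadata, String urlToFilter);
    }

    /** Invented rule: keep only links on the source page's host, and strip fragments. */
    static class SameHostFilter extends SimpleURLFilter {
        @Override
        String filter(URL sourceUrl, Map<String, String> sourceMetadata, String urlToFilter) {
            URI target;
            try {
                target = new URI(urlToFilter);
            } catch (URISyntaxException e) {
                return null; // unparsable URL: remove it
            }
            String host = target.getHost();
            if (host == null) {
                return null; // no host component (e.g. mailto:): remove it
            }
            // sourceUrl can be null; in that case the host comparison is skipped.
            if (sourceUrl != null && !host.equalsIgnoreCase(sourceUrl.getHost())) {
                return null; // link points to a different host: remove it
            }
            // Normalise: return the URL without its fragment (unchanged if it has none).
            int hash = urlToFilter.indexOf('#');
            return hash >= 0 ? urlToFilter.substring(0, hash) : urlToFilter;
        }
    }

    public static void main(String[] args) throws Exception {
        URL source = URI.create("https://example.com/index.html").toURL();
        SimpleURLFilter filter = new SameHostFilter();
        System.out.println(filter.filter(source, null, "https://example.com/page#section")); // https://example.com/page
        System.out.println(filter.filter(source, null, "https://other.org/"));               // null
        System.out.println(filter.filter(null, null, "https://other.org/x"));                // https://other.org/x
    }
}
```

The last call returns the input unchanged, which is the case where the normalised representation is identical to the input URL.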