Class HostURLFilter

  • All Implemented Interfaces:
    Configurable

    public class HostURLFilter
    extends URLFilter
    Filters URLs based on the hostname.

    This filter has two modes:

    • if ignoreOutsideHost is true, all URLs whose host differs from the host of the source URL are filtered out
    • if ignoreOutsideDomain is true, all URLs whose domain differs from the domain of the source URL are filtered out
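
The two modes can be illustrated with a minimal sketch. This is not the library's implementation: HostFilterSketch, parse and naiveDomain are hypothetical names, and naiveDomain simply keeps the last two labels of the host as a stand-in for a proper domain lookup (it would be wrong for suffixes such as "co.uk"). Treating an unparsable URL as filtered out is also an assumption.

```java
import java.net.URI;
import java.net.URL;
import java.util.Locale;

public class HostFilterSketch {

    // Hypothetical helper: parses a URL string, returning null instead of throwing.
    static URL parse(String s) {
        try {
            return new URI(s).toURL();
        } catch (Exception e) {
            return null;
        }
    }

    // Naive stand-in for a real domain lookup: keeps the last two labels of the host.
    // Illustration only; it gives wrong answers for multi-part suffixes.
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // Returns the candidate URL unchanged if it passes, or null if it is filtered out.
    static String filter(boolean ignoreOutsideHost, boolean ignoreOutsideDomain,
                         URL sourceUrl, String urlToFilter) {
        if (sourceUrl == null || (!ignoreOutsideHost && !ignoreOutsideDomain)) {
            return urlToFilter; // nothing to compare against, or both modes are off
        }
        URL target = parse(urlToFilter);
        if (target == null) {
            return null; // assumption: unparsable URLs are dropped
        }
        String fromHost = sourceUrl.getHost().toLowerCase(Locale.ROOT);
        String toHost = target.getHost().toLowerCase(Locale.ROOT);
        if (ignoreOutsideHost && !fromHost.equals(toHost)) {
            return null; // different host
        }
        if (ignoreOutsideDomain && !naiveDomain(fromHost).equals(naiveDomain(toHost))) {
            return null; // different domain
        }
        return urlToFilter;
    }
}
```

In this sketch, a link from www.example.com to blog.example.com is dropped in host mode but kept in domain mode, since both hosts share the domain example.com.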
    • Constructor Detail

      • HostURLFilter

        public HostURLFilter()
    • Method Detail

      • configure

        public void configure(@NotNull Map<String,Object> stormConf,
                              @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
        Description copied from interface: Configurable
        Called when this filter is being initialized.
        Parameters:
        stormConf - the Storm configuration used for the configurable
        filterParams - the filter-specific configuration. Never null.
      • filter

        public @Nullable String filter(@Nullable URL sourceUrl,
                                       @Nullable Metadata sourceMetadata,
                                       @NotNull String urlToFilter)
        Description copied from class: URLFilter
        Returns null if the URL is to be removed, or a normalised representation, which may be identical to the input URL.
        Specified by:
        filter in class URLFilter
        Parameters:
        sourceUrl - the URL of the page where the URL was found. Can be null.
        sourceMetadata - the metadata collected for the page
        urlToFilter - the URL to be filtered
        Returns:
        null if the URL is to be removed, or a normalised representation, which may be identical to the input URL
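
The return contract of filter can be sketched from the caller's side: a null result means the URL is discarded, and any non-null result is used in place of the input. The code below is only an illustration under stated assumptions: FilterContractSketch, UrlFilterLike and keepAccepted are hypothetical names, and the metadata argument is typed as Object because the real Metadata class is not reproduced here.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class FilterContractSketch {

    // Hypothetical stand-in mirroring the shape of filter(sourceUrl, sourceMetadata, urlToFilter).
    interface UrlFilterLike {
        String filter(URL sourceUrl, Object sourceMetadata, String urlToFilter);
    }

    // Applies the filter to every candidate and keeps whatever it returns,
    // skipping candidates for which it returns null.
    static List<String> keepAccepted(UrlFilterLike f, URL sourceUrl, List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            String result = f.filter(sourceUrl, null, candidate);
            if (result != null) {
                kept.add(result); // may be a normalised form rather than the raw input
            }
        }
        return kept;
    }
}
```

Storing the returned string rather than the original candidate matters because a filter is allowed to hand back a normalised form of the URL.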