Class LinkParseFilter

  • All Implemented Interfaces:
    Configurable

    public class LinkParseFilter
    extends XPathFilter
    ParseFilter to extract additional links with XPath. It can be configured with, e.g.:
    
     {
       "class": "com.digitalpebble.stormcrawler.parse.filter.LinkParseFilter",
       "name": "LinkParseFilter",
       "params": {
         "pattern": "//IMG/@src",
         "pattern2": "//VIDEO/SOURCE/@src"
       }
     }
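
To illustrate what the two expressions in the configuration above select, here is a minimal sketch that uses only the JDK's XPath API. It is not the StormCrawler implementation: the class name `LinkXPathSketch`, the method `extractLinks`, the sample markup and the `example.com` URLs are all hypothetical. The sample markup uses uppercase element names so that the configured expressions match, which is an assumption of this sketch.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of StormCrawler.
class LinkXPathSketch {

    /**
     * Evaluates each XPath expression against the markup and returns the
     * selected attribute values resolved against the page URL.
     */
    static List<String> extractLinks(String pageUrl, String markup, List<String> expressions) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            URI base = URI.create(pageUrl);
            List<String> links = new ArrayList<>();
            for (String expression : expressions) {
                // Each expression selects attribute nodes such as @src.
                NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String value = nodes.item(i).getNodeValue().trim();
                    // Turn relative references into absolute URLs.
                    links.add(base.resolve(value).toString());
                }
            }
            return links;
        } catch (Exception e) {
            // Wrapped so callers do not have to handle checked exceptions.
            throw new IllegalStateException("Could not extract links", e);
        }
    }

    public static void main(String[] args) {
        String markup = "<HTML><BODY>"
                + "<IMG src=\"/img/logo.png\"/>"
                + "<VIDEO><SOURCE src=\"clip.mp4\"/></VIDEO>"
                + "</BODY></HTML>";
        List<String> links = extractLinks(
                "https://example.com/a/index.html",
                markup,
                List.of("//IMG/@src", "//VIDEO/SOURCE/@src"));
        links.forEach(System.out::println);
    }
}
```

With the sample markup, the first expression selects the image source and the second selects the video source; both are resolved against the page URL before being returned.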
    
     
    • Constructor Detail

      • LinkParseFilter

        public LinkParseFilter()
    • Method Detail

      • filter

        public void filter(String URL,
                           byte[] content,
                           DocumentFragment doc,
                           ParseResult parse)
        Description copied from class: ParseFilter
        Called when parsing a specific page
        Overrides:
        filter in class XPathFilter
        Parameters:
        URL - the URL of the page being parsed
        content - the content being parsed
        doc - the DOM tree resulting from parsing the content, or null if ParseFilter.needsDOM() returns false
        parse - the metadata to be updated with the result of the parsing
      • configure

        public void configure(@NotNull Map<String,Object> stormConf,
                              @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
        Description copied from interface: Configurable
        Called when this filter is being initialized
        Specified by:
        configure in interface Configurable
        Overrides:
        configure in class XPathFilter
        Parameters:
        stormConf - The Storm configuration used for the configurable
        filterParams - the filter-specific configuration; never null
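
Since configure is called when the filter is initialized and filter is called for each page, one plausible design is to compile the configured expressions once and reuse them for every page. The sketch below shows that idea with the JDK's XPath API only. It is an assumption, not the library's code: `ParamCompilerSketch` and `compile` are hypothetical names, and a plain `Map<String, String>` stands in for the `JsonNode` parameters so that the example has no external dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical helper, not part of StormCrawler.
class ParamCompilerSketch {

    /**
     * Compiles every configured expression once, keyed by its parameter
     * name (e.g. "pattern", "pattern2"). The Map is a stand-in for the
     * JsonNode passed to configure.
     */
    static Map<String, XPathExpression> compile(Map<String, String> params) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, XPathExpression> compiled = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            try {
                compiled.put(entry.getKey(), xpath.compile(entry.getValue()));
            } catch (XPathExpressionException e) {
                // Fail at initialization rather than on the first page.
                throw new IllegalArgumentException(
                        "Invalid XPath for '" + entry.getKey() + "': " + entry.getValue(), e);
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pattern", "//IMG/@src");
        params.put("pattern2", "//VIDEO/SOURCE/@src");
        System.out.println(compile(params).keySet());
    }
}
```

Compiling at initialization means that a malformed expression in the configuration is reported at startup instead of while pages are being parsed.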