Class ParseFilters

    • Field Detail

      • emptyParseFilter

        public static final ParseFilters emptyParseFilter
    • Method Detail

      • fromConf

        public static ParseFilters fromConf​(Map<String,​Object> stormConf)
        Loads and configures the ParseFilters based on the Storm config if there is one; otherwise returns emptyParseFilter.
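The fallback behaviour described for fromConf can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the class FilterSetSketch, its EMPTY constant and its configFile field are hypothetical stand-ins, and the configuration key name is an assumption that should be checked against the project's documentation.

```java
import java.util.Map;

/** Hypothetical stand-in for ParseFilters; not the library class. */
class FilterSetSketch {

    /** Shared no-op instance, playing the role of emptyParseFilter. */
    static final FilterSetSketch EMPTY = new FilterSetSketch(null);

    /** Assumed configuration key naming the filter definition file. */
    static final String CONFIG_KEY = "parsefilters.config.file";

    /** Name of the definition file, or null for the empty instance. */
    final String configFile;

    private FilterSetSketch(String configFile) {
        this.configFile = configFile;
    }

    /** Returns a configured instance if the key is set, otherwise EMPTY. */
    static FilterSetSketch fromConf(Map<String, Object> stormConf) {
        Object value = stormConf.get(CONFIG_KEY);
        if (value instanceof String && !((String) value).trim().isEmpty()) {
            // A real implementation would load and parse the file here.
            return new FilterSetSketch((String) value);
        }
        return EMPTY;
    }
}
```

Returning one shared empty instance lets callers invoke the result unconditionally, without a null check, when no filters are configured.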
      • loadJSONResources

        public void loadJSONResources​(InputStream inputStream)
                               throws com.fasterxml.jackson.core.JsonParseException,
                                      com.fasterxml.jackson.databind.JsonMappingException,
                                      IOException
        Description copied from interface: JSONResource
        Load the resources from an input stream
        Specified by:
        loadJSONResources in interface JSONResource
        Throws:
        com.fasterxml.jackson.core.JsonParseException
        com.fasterxml.jackson.databind.JsonMappingException
        IOException
      • configure

        public void configure(@NotNull Map<String,Object> stormConf,
                              @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
        Description copied from interface: Configurable
        Called when this filter is being initialized
        Specified by:
        configure in interface Configurable
        Parameters:
        stormConf - The Storm configuration used for the configurable
        filtersConf - the filter-specific configuration; never null
      • needsDOM

        public boolean needsDOM()
        Description copied from class: ParseFilter
        Specifies whether this filter requires a DOM representation of the document
        Overrides:
        needsDOM in class ParseFilter
        Returns:
        true if this needs a DOM representation of the document, false otherwise.
      • filter

        public void filter​(String URL,
                           byte[] content,
                           DocumentFragment doc,
                           ParseResult parse)
        Description copied from class: ParseFilter
        Called when parsing a specific page
        Specified by:
        filter in class ParseFilter
        Parameters:
        URL - the URL of the page being parsed
        content - the content being parsed
        doc - the DOM tree resulting from parsing the content, or null if ParseFilter.needsDOM() returns false
        parse - the metadata to be updated with the results of the parsing
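The relationship between needsDOM() and filter(...) can be illustrated with a small sketch. It assumes that a ParseFilters instance behaves as a chain of individual filters, which this page does not state. All types here (ParseFilterChainSketch and its nested Filter, Chain and ContentLengthFilter) are hypothetical simplifications, and a plain Map stands in for ParseResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.w3c.dom.DocumentFragment;

/** Hypothetical, simplified model of a filter chain; not the library API. */
class ParseFilterChainSketch {

    /** Simplified filter contract: no DOM required unless overridden. */
    abstract static class Filter {
        boolean needsDOM() {
            return false;
        }

        abstract void filter(String url, byte[] content,
                             DocumentFragment doc, Map<String, String> metadata);
    }

    /** Delegates to each registered filter in insertion order. */
    static class Chain extends Filter {
        private final List<Filter> filters = new ArrayList<>();

        Chain add(Filter f) {
            filters.add(f);
            return this;
        }

        /** The chain needs a DOM as soon as any member does. */
        @Override
        boolean needsDOM() {
            for (Filter f : filters) {
                if (f.needsDOM()) {
                    return true;
                }
            }
            return false;
        }

        @Override
        void filter(String url, byte[] content,
                    DocumentFragment doc, Map<String, String> metadata) {
            for (Filter f : filters) {
                f.filter(url, content, doc, metadata);
            }
        }
    }

    /** Example filter that reads only the raw bytes, so doc may be null. */
    static class ContentLengthFilter extends Filter {
        @Override
        void filter(String url, byte[] content,
                    DocumentFragment doc, Map<String, String> metadata) {
            metadata.put("content.length", String.valueOf(content.length));
        }
    }
}
```

In this sketch a caller can check needsDOM() before building a DOM tree and pass null for doc when it returns false, matching the contract described for the doc parameter.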
      • main

        public static void main​(String[] args)
                         throws IOException,
                                org.apache.commons.cli.ParseException
        Used for quick testing and debugging.
        Throws:
        IOException
        org.apache.commons.cli.ParseException
        Since:
        1.17