Class JSoupFilters

    • Field Detail

      • emptyParseFilter

        public static final JSoupFilters emptyParseFilter
    • Method Detail

      • fromConf

        public static JSoupFilters fromConf​(Map<String,​Object> stormConf)
        Loads and configures the JSoupFilters from the Storm configuration if one is defined; otherwise returns an empty JSoupFilters instance.
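        The fallback described above can be sketched in plain Java. This is only an illustration with stand-in types, not the actual StormCrawler implementation: the class FromConfSketch, its nested Chain type, the EMPTY constant and the configuration key "jsoup.filters.config.file" are all assumptions made for the example, and the real key and loading logic may differ.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FromConfSketch {
    // Hypothetical stand-in for a chain of filters; NOT the real JSoupFilters class.
    static final class Chain {
        final List<String> filterNames;

        Chain(List<String> filterNames) {
            this.filterNames = filterNames;
        }
    }

    // Shared no-op instance, analogous in spirit to emptyParseFilter.
    static final Chain EMPTY = new Chain(Collections.emptyList());

    // Assumed configuration key, used for illustration only.
    static final String KEY = "jsoup.filters.config.file";

    static Chain fromConf(Map<String, Object> conf) {
        Object file = conf.get(KEY);
        if (file == null) {
            // Nothing configured: fall back to the shared empty chain.
            return EMPTY;
        }
        // A real implementation would read and parse the file;
        // this sketch only records which file was requested.
        return new Chain(List.of("loaded-from:" + file));
    }

    public static void main(String[] args) {
        // No key present, so the empty chain is returned.
        System.out.println(fromConf(Map.of()) == EMPTY);
        // Key present, so a non-empty chain is built.
        System.out.println(fromConf(Map.of(KEY, "jsoupfilters.json")).filterNames);
    }
}
```

        Returning a shared empty instance rather than null means callers can invoke the filters unconditionally, without a null check.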
      • configure

        public void configure(@NotNull Map<String,Object> stormConf,
                              @NotNull com.fasterxml.jackson.databind.JsonNode filtersConf)
        Description copied from interface: Configurable
        Called when this filter is being initialized
        Specified by:
        configure in interface Configurable
        Parameters:
        stormConf - the Storm configuration used for the configurable
        filtersConf - the filter-specific configuration. Never null
      • filter

        public void filter(@NotNull String url,
                           byte[] content,
                           @NotNull org.jsoup.nodes.Document doc,
                           @NotNull ParseResult parse)
        Description copied from interface: JSoupFilter
        Called when parsing a specific page
        Specified by:
        filter in interface JSoupFilter
        Parameters:
        url - the URL of the page being parsed
        content - the content being parsed
        doc - document produced by JSoup's parsing
        parse - the metadata to be updated with the results of the parsing
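        Since JSoupFilters itself implements JSoupFilter, it presumably delegates each filter call to the individual filters it was configured with. The sketch below shows that composite pattern with stand-in types only: PageFilter, Chain and run are hypothetical names, a String replaces the JSoup Document, and a Map replaces ParseResult. It is not the library's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {
    // Hypothetical stand-in for the JSoupFilter interface.
    // A String replaces the JSoup Document and a Map replaces ParseResult.
    interface PageFilter {
        void filter(String url, byte[] content, String doc, Map<String, String> parse);
    }

    // Composite that implements the same interface and runs each filter in order.
    static final class Chain implements PageFilter {
        private final List<PageFilter> filters = new ArrayList<>();

        Chain add(PageFilter f) {
            filters.add(f);
            return this;
        }

        @Override
        public void filter(String url, byte[] content, String doc, Map<String, String> parse) {
            for (PageFilter f : filters) {
                f.filter(url, content, doc, parse);
            }
        }
    }

    // Builds a two-filter chain and applies it to one page.
    static Map<String, String> run(String url, String html) {
        Chain chain = new Chain()
                .add((u, c, d, p) -> p.put("url", u))
                .add((u, c, d, p) -> p.put("bytes", String.valueOf(c.length)));
        Map<String, String> parse = new HashMap<>();
        chain.filter(url, html.getBytes(StandardCharsets.UTF_8), html, parse);
        return parse;
    }

    public static void main(String[] args) {
        System.out.println(run("https://example.com/", "<p>hi</p>"));
    }
}
```

        Each filter writes into the same parse object, so later filters in the chain can see what earlier ones recorded.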
      • main

        public static void main​(String[] args)
                         throws IOException,
                                org.apache.commons.cli.ParseException
        Used for quick testing and debugging.
        Throws:
        IOException
        org.apache.commons.cli.ParseException