Interface JSoupFilter
All Superinterfaces:
Configurable
All Known Implementing Classes:
JSoupFilters, LDJsonParseFilter, LinkParseFilter, XPathFilter
public interface JSoupFilter extends Configurable
Implementations of JSoupFilter are responsible for extracting custom data from the crawled content. They are used exclusively by JSoupParserBolt.
Method Summary
Modifier and Type: void
Method: filter(@NotNull String url, byte[] content, org.jsoup.nodes.Document doc, @NotNull ParseResult parse)
Description: Called when parsing a specific page.
Methods inherited from interface com.digitalpebble.stormcrawler.util.Configurable
configure, configure, getName
Method Detail
filter
void filter(@NotNull String url, byte[] content, @NotNull org.jsoup.nodes.Document doc, @NotNull ParseResult parse)
Called when parsing a specific page.
Parameters:
url - the URL of the page being parsed
content - the content being parsed
doc - document produced by JSoup's parsing
parse - the metadata to be updated with the result of the parsing
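The contract above (read from the parsed document, write into the parse result, return nothing) can be illustrated with a minimal sketch. To stay runnable without the StormCrawler and jsoup jars, it uses hypothetical stand-in types: DocStub in place of org.jsoup.nodes.Document, ParseResultStub in place of ParseResult, and FilterSketch in place of JSoupFilter. The metadata key "sketch.title" and the metadataFor helper are likewise assumptions made for illustration, not part of the real API. A real implementation would presumably implement JSoupFilter directly and update the actual ParseResult.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for org.jsoup.nodes.Document; only exposes a title. */
final class DocStub {
    private final String title;

    DocStub(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

/** Hypothetical stand-in for ParseResult: metadata values grouped by URL, then by key. */
final class ParseResultStub {
    private final Map<String, Map<String, List<String>>> metadataByUrl = new HashMap<>();

    Map<String, List<String>> metadataFor(String url) {
        return metadataByUrl.computeIfAbsent(url, k -> new HashMap<>());
    }
}

/** Mirrors the parameter order of JSoupFilter.filter, but over the stand-in types. */
interface FilterSketch {
    void filter(String url, byte[] content, DocStub doc, ParseResultStub parse);
}

/** Copies a non-empty page title into the metadata stored for the page's URL. */
final class TitleFilterSketch implements FilterSketch {
    static final String TITLE_KEY = "sketch.title"; // hypothetical metadata key

    @Override
    public void filter(String url, byte[] content, DocStub doc, ParseResultStub parse) {
        String title = doc.title();
        if (title == null || title.trim().isEmpty()) {
            return; // nothing to extract; leave the parse result untouched
        }
        parse.metadataFor(url)
             .computeIfAbsent(TITLE_KEY, k -> new ArrayList<>())
             .add(title.trim());
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        byte[] content = "<html><title>Example page</title></html>".getBytes(StandardCharsets.UTF_8);
        ParseResultStub parse = new ParseResultStub();

        new TitleFilterSketch().filter(url, content, new DocStub("Example page"), parse);

        System.out.println(parse.metadataFor(url).get(TITLE_KEY)); // prints [Example page]
    }
}
```

The filter returns void and communicates only by mutating the parse argument, which matches the description of parse as "the metadata to be updated". The raw content bytes are passed through unused here; a filter that needs the original bytes rather than the parsed document would read them instead.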