Class LDJsonParseFilter
- java.lang.Object
-
- com.digitalpebble.stormcrawler.util.AbstractConfigurable
-
- com.digitalpebble.stormcrawler.jsoup.LDJsonParseFilter
-
- All Implemented Interfaces:
JSoupFilter, Configurable
public class LDJsonParseFilter extends AbstractConfigurable implements JSoupFilter
Extracts data from a JSON-LD representation (https://json-ld.org/). Illustrates how to use the JSoupFilters.
-
-
Field Summary
static org.slf4j.Logger LOG
-
Constructor Summary
LDJsonParseFilter()
-
Method Summary
void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
Called when this filter is being initialized
void filter(@NotNull String url, byte[] content, @NotNull org.jsoup.nodes.Document doc, @NotNull ParseResult parse)
Called when parsing a specific page
static com.fasterxml.jackson.databind.JsonNode filterJson(org.jsoup.nodes.Document doc)
-
Methods inherited from class com.digitalpebble.stormcrawler.util.AbstractConfigurable
configure, getName
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.digitalpebble.stormcrawler.util.Configurable
configure, getName
-
Method Detail
-
filterJson
public static com.fasterxml.jackson.databind.JsonNode filterJson(org.jsoup.nodes.Document doc) throws Exception
- Throws:
Exception
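This page does not say how filterJson locates the JSON-LD data inside the document. As a rough illustration of the general technique only, the sketch below pulls the first `<script type="application/ld+json">` block out of raw HTML using nothing but the Java standard library. The class `JsonLdSketch` and the method `firstJsonLd` are hypothetical names invented for this example. They are not part of StormCrawler, and the actual filter works on a parsed org.jsoup.nodes.Document and returns a Jackson JsonNode rather than a String.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of StormCrawler. */
public class JsonLdSketch {

    // Matches a <script> element whose type attribute is application/ld+json
    // and captures its body. DOTALL lets the body span several lines.
    private static final Pattern LD_JSON_SCRIPT = Pattern.compile(
            "<script[^>]*type\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw text of the first JSON-LD block, if the page has one. */
    public static Optional<String> firstJsonLd(String html) {
        Matcher m = LD_JSON_SCRIPT.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample page; the JSON-LD payload is made up for the example.
        String html = "<html><head>"
                + "<script type=\"application/ld+json\">"
                + "{\"@context\":\"https://schema.org\",\"@type\":\"Article\",\"headline\":\"Hello\"}"
                + "</script></head><body></body></html>";
        System.out.println(firstJsonLd(html).orElse("(no JSON-LD found)"));
    }
}
```

A regular expression is used here only to keep the sketch free of dependencies. For real pages, an HTML parser such as JSoup is the more robust choice, and the extracted text still has to be parsed as JSON before any field can be read.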
-
configure
public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
Description copied from interface: Configurable
Called when this filter is being initialized
- Specified by:
configure in interface Configurable
- Parameters:
stormConf - The Storm configuration used for the configurable
filterParams - the filter specific configuration. Never null
-
filter
public void filter(@NotNull String url, byte[] content, @NotNull org.jsoup.nodes.Document doc, @NotNull ParseResult parse)
Description copied from interface: JSoupFilter
Called when parsing a specific page
- Specified by:
filter in interface JSoupFilter
- Parameters:
url - the URL of the page being parsed
content - the content being parsed
doc - document produced by JSoup's parsing
parse - the metadata to be updated with the results of the parsing
-
-