Class HttpProtocol
- java.lang.Object
  - com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
    - com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol
- All Implemented Interfaces: Protocol
public class HttpProtocol extends AbstractHttpProtocol
Nested Class Summary
- Nested classes/interfaces inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol: AbstractHttpProtocol.KeyValue
Field Summary
- Fields inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol: customHeaders, protocolMDprefix, protocolVersions, proxyManager, RESPONSE_COOKIES_HEADER, SET_HEADER_BY_REQUEST, skipRobots, storeHTTPHeaders, useCookies
Constructor Summary
- HttpProtocol()
Method Summary
- protected void addHeadersToRequest(okhttp3.Request.Builder rb, Metadata md)
- void configure(org.apache.storm.Config conf)
- ProtocolResponse getProtocolOutput(String url, Metadata metadata): Fetches the content and additional metadata.
- static void main(String[] args)
Methods inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
cleanup, getAgentString, getRobotRules
Method Detail
-
configure
public void configure(org.apache.storm.Config conf)
- Specified by: configure in interface Protocol
- Overrides: configure in class AbstractHttpProtocol
-
addHeadersToRequest
protected void addHeadersToRequest(okhttp3.Request.Builder rb, Metadata md)
-
getProtocolOutput
public ProtocolResponse getProtocolOutput(String url, Metadata metadata) throws Exception
Description copied from interface: Protocol
Fetches the content and additional metadata. IMPORTANT: the metadata returned within the response should contain only new, additional entries; there is no need to return the metadata that was passed in.
- Parameters:
  url - the location of the content
  metadata - extra information
- Returns: the content and optional metadata fetched via this protocol
- Throws: Exception
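The note about returned metadata can be illustrated with a small, self-contained sketch. It does not use the StormCrawler classes: `FetchResult`, `fetch`, `merge`, the plain `Map` used for metadata, and the `example.status` key are all hypothetical stand-ins, chosen only to show the contract that a fetch returns what it adds and leaves the combining to the caller.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataContractSketch {

    /** Hypothetical stand-in for a protocol response: content plus newly found metadata. */
    static final class FetchResult {
        final byte[] content;
        final Map<String, String> newMetadata;

        FetchResult(byte[] content, Map<String, String> newMetadata) {
            this.content = content;
            this.newMetadata = newMetadata;
        }
    }

    /** Hypothetical fetch: receives the incoming metadata but returns only what it adds. */
    static FetchResult fetch(String url, Map<String, String> incoming) {
        Map<String, String> added = new HashMap<>();
        // Illustrative key and value; a real protocol would record what the server returned.
        added.put("example.status", "200");
        // No network access in this sketch: the content is an empty placeholder.
        return new FetchResult(new byte[0], added);
    }

    /** The caller combines the metadata it passed in with the additions it got back. */
    static Map<String, String> merge(Map<String, String> incoming, FetchResult result) {
        Map<String, String> merged = new HashMap<>(incoming);
        merged.putAll(result.newMetadata);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new HashMap<>();
        incoming.put("depth", "1");

        FetchResult result = fetch("http://example.com/", incoming);
        Map<String, String> merged = merge(incoming, result);

        System.out.println("returned keys: " + result.newMetadata.keySet());
        System.out.println("merged size: " + merged.size());
    }
}
```

Keeping the returned metadata limited to additions means the caller can tell what the fetch contributed, and the passed-in entries are not copied back needlessly.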