Class HttpProtocol
- java.lang.Object
-
- com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
-
- com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol
-
- All Implemented Interfaces:
Protocol, org.apache.http.client.ResponseHandler<ProtocolResponse>
public class HttpProtocol extends AbstractHttpProtocol implements org.apache.http.client.ResponseHandler<ProtocolResponse>
Uses Apache HttpClient to handle HTTP and HTTPS.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
AbstractHttpProtocol.KeyValue
-
-
Field Summary
-
Fields inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
customHeaders, protocolMDprefix, protocolVersions, proxyManager, RESPONSE_COOKIES_HEADER, SET_HEADER_BY_REQUEST, skipRobots, storeHTTPHeaders, useCookies
-
-
Constructor Summary
Constructors
HttpProtocol()
-
Method Summary
protected void addHeadersToRequest(org.apache.http.client.methods.HttpRequestBase request, Metadata md)
void configure(org.apache.storm.Config conf)
ProtocolResponse getProtocolOutput(String url, Metadata md)
Fetches the content and additional metadata
ProtocolResponse handleResponse(org.apache.http.HttpResponse response)
ProtocolResponse handleResponseWithContentLimit(org.apache.http.HttpResponse response, int maxContent)
static void main(String[] args)
-
Methods inherited from class com.digitalpebble.stormcrawler.protocol.AbstractHttpProtocol
cleanup, getAgentString, getRobotRules
-
-
-
-
Method Detail
-
configure
public void configure(org.apache.storm.Config conf)
- Specified by:
configure in interface Protocol
- Overrides:
configure in class AbstractHttpProtocol
-
getProtocolOutput
public ProtocolResponse getProtocolOutput(String url, Metadata md) throws Exception
Description copied from interface: Protocol
Fetches the content and additional metadata. IMPORTANT: the metadata returned within the response should contain only new, additional entries; there is no need to return the metadata that was passed in.
- Specified by:
getProtocolOutput in interface Protocol
- Parameters:
url - the location of the content
md - extra information
- Returns:
the content and optional metadata fetched via this protocol
- Throws:
Exception
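The contract described for getProtocolOutput (return only new metadata, not the entries passed in) can be illustrated with a minimal sketch. It does not use the StormCrawler classes: plain maps stand in for Metadata and ProtocolResponse, and the class name, method names, and metadata keys are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: plain maps replace Metadata and ProtocolResponse.
class NewMetadataOnlySketch {

    // Simulates a fetch: receives the incoming metadata but returns only new entries.
    static Map<String, String> fetchMetadata(String url, Map<String, String> incoming) {
        Map<String, String> added = new LinkedHashMap<>();
        // Made-up values for illustration; no network call is made.
        added.put("example.status", "200");
        added.put("example.fetchedUrl", url);
        // Deliberately no added.putAll(incoming): the caller already holds those entries.
        return added;
    }

    // Caller side: combine what was passed in with what the fetch added.
    static Map<String, String> merge(Map<String, String> incoming, Map<String, String> added) {
        Map<String, String> merged = new LinkedHashMap<>(incoming);
        merged.putAll(added);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("depth", "2");
        Map<String, String> added = fetchMetadata("http://example.com/", incoming);
        System.out.println("returned: " + added.keySet());
        System.out.println("merged:   " + merge(incoming, added).keySet());
    }
}
```

In this sketch the merge step belongs to the caller, which keeps the returned metadata small and avoids duplicating entries the caller already has.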
-
addHeadersToRequest
protected void addHeadersToRequest(org.apache.http.client.methods.HttpRequestBase request, Metadata md)
-
handleResponse
public ProtocolResponse handleResponse(org.apache.http.HttpResponse response) throws IOException
- Specified by:
handleResponse in interface org.apache.http.client.ResponseHandler<ProtocolResponse>
- Throws:
IOException
-
handleResponseWithContentLimit
public ProtocolResponse handleResponseWithContentLimit(org.apache.http.HttpResponse response, int maxContent) throws IOException
- Throws:
IOException
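This page does not describe what handleResponseWithContentLimit does with maxContent. Assuming from the parameter name that it caps the number of body bytes read, the idea can be sketched with the standard library alone. The class and method below are hypothetical and are not the StormCrawler implementation; treating a negative limit as "no limit" is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a content limit on a response body stream.
class ContentLimitSketch {

    // Reads at most maxContent bytes; a negative limit means "no limit" (assumption).
    static byte[] readWithLimit(InputStream in, int maxContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxContent < 0 ? Integer.MAX_VALUE : maxContent;
        while (remaining > 0) {
            // Never request more than the bytes still allowed by the limit.
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] trimmed = readWithLimit(new ByteArrayInputStream(body), 4);
        System.out.println(new String(trimmed, StandardCharsets.US_ASCII));
    }
}
```

Capping the read inside the loop, rather than trimming afterwards, means an oversized body is never fully buffered in memory.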
-
-