Interface Protocol
All Known Implementing Classes:
AbstractHttpProtocol, DelegatorProtocol, FileProtocol, HttpProtocol, HttpProtocol, RemoteDriverProtocol, SeleniumProtocol
public interface Protocol
Method Summary
Modifier and Type                       Method                                             Description
void                                    cleanup()
void                                    configure(org.apache.storm.Config conf)
ProtocolResponse                        getProtocolOutput(String url, Metadata metadata)   Fetches the content and additional metadata
crawlercommons.robots.BaseRobotRules    getRobotRules(String url)
static void                             main(Protocol protocol, String[] args)
Method Detail
configure
void configure(org.apache.storm.Config conf)
getProtocolOutput
ProtocolResponse getProtocolOutput(String url, Metadata metadata) throws Exception
Fetches the content and additional metadata.
IMPORTANT: the metadata returned within the response should contain only new, additional entries; there is no need to return the metadata that was passed in.
Parameters:
url - the location of the content
metadata - extra information
Returns:
the content and optional metadata fetched via this protocol
Throws:
Exception
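The contract above (return only newly obtained metadata, not the metadata passed in) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SimpleMetadata`, `SimpleResponse`, `SimpleProtocol`, `InMemoryProtocol` and the key `fetch.contentLength` are hypothetical, simplified stand-ins and are not the library's `Metadata`, `ProtocolResponse` or `Protocol` types. Only the shape of `getProtocolOutput(String url, Metadata metadata)` is taken from this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ProtocolMetadataSketch {

    // Hypothetical, simplified stand-in for the library's Metadata class.
    static final class SimpleMetadata {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key) {
            return values.get(key);
        }
    }

    // Hypothetical stand-in for ProtocolResponse: content plus fetched metadata.
    static final class SimpleResponse {
        final byte[] content;
        final SimpleMetadata metadata;

        SimpleResponse(byte[] content, SimpleMetadata metadata) {
            this.content = content;
            this.metadata = metadata;
        }
    }

    // Mirrors only the getProtocolOutput signature documented on this page.
    interface SimpleProtocol {
        SimpleResponse getProtocolOutput(String url, SimpleMetadata metadata) throws Exception;
    }

    // Toy implementation: no network access, the "content" is derived from the URL.
    static final class InMemoryProtocol implements SimpleProtocol {
        @Override
        public SimpleResponse getProtocolOutput(String url, SimpleMetadata metadata) {
            byte[] content = ("content of " + url).getBytes(StandardCharsets.UTF_8);

            // Start from an empty metadata object instead of copying the
            // incoming one, so the response carries only new entries.
            SimpleMetadata fetched = new SimpleMetadata();
            fetched.set("fetch.contentLength", Integer.toString(content.length));

            return new SimpleResponse(content, fetched);
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleMetadata incoming = new SimpleMetadata();
        incoming.set("depth", "2"); // metadata supplied by the caller

        SimpleResponse response =
                new InMemoryProtocol().getProtocolOutput("http://example.com/", incoming);

        // The caller's key is not echoed back in the response metadata.
        System.out.println("depth in response: " + response.metadata.get("depth"));
        System.out.println("fetch.contentLength: " + response.metadata.get("fetch.contentLength"));
    }
}
```

In this sketch the caller keeps ownership of the metadata it passed in and is free to merge the response metadata into it afterwards; how the real library combines the two is not described on this page.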
getRobotRules
crawlercommons.robots.BaseRobotRules getRobotRules(String url)
cleanup
void cleanup()