Interface Protocol

    • Method Detail

      • configure

        void configure(org.apache.storm.Config conf)
      • getProtocolOutput

        ProtocolResponse getProtocolOutput(String url,
                                           Metadata metadata)
                                    throws Exception
        Fetches the content and additional metadata for the given URL.

        IMPORTANT: the metadata returned in the response should contain only newly added entries; there is no need to return the metadata that was passed in.

        Parameters:
        url - the location of the content
        metadata - extra information
        Returns:
        the content and optional metadata fetched via this protocol
        Throws:
        Exception
      • getRobotRules

        crawlercommons.robots.BaseRobotRules getRobotRules(String url)
      • cleanup

        void cleanup()
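
        Example (illustrative sketch only)

        The metadata rule stated for getProtocolOutput can be illustrated with a small, self-contained sketch. It does not use the real Metadata or ProtocolResponse classes: SimpleResponse, fetch, merge and the metadata keys below are hypothetical stand-ins, and plain maps replace the library's metadata type. The assumption shown is that the caller, not the protocol implementation, combines the incoming metadata with the newly returned entries.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ProtocolMetadataSketch {

    // Hypothetical stand-in for a protocol response: content, status code,
    // and ONLY the metadata produced by this fetch.
    static final class SimpleResponse {
        final byte[] content;
        final int statusCode;
        final Map<String, String> newMetadata;

        SimpleResponse(byte[] content, int statusCode, Map<String, String> newMetadata) {
            this.content = content;
            this.statusCode = statusCode;
            this.newMetadata = newMetadata;
        }
    }

    // Hypothetical fetch method mirroring the shape of getProtocolOutput(url, metadata).
    // No network access happens here; the body is a stub.
    static SimpleResponse fetch(String url, Map<String, String> incoming) throws Exception {
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("url must not be empty");
        }
        byte[] body = ("stub content for " + url).getBytes(StandardCharsets.UTF_8);

        // Only newly discovered entries go into the response.
        // The incoming metadata is deliberately not copied into it.
        Map<String, String> added = new HashMap<>();
        added.put("fetch.statusCode", "200"); // hypothetical key
        added.put("fetch.byteLength", String.valueOf(body.length)); // hypothetical key
        return new SimpleResponse(body, 200, added);
    }

    // Assumed caller-side step: combine the original metadata with the new entries.
    static Map<String, String> merge(Map<String, String> incoming, Map<String, String> added) {
        Map<String, String> merged = new HashMap<>(incoming);
        merged.putAll(added);
        return merged;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> incoming = new HashMap<>();
        incoming.put("depth", "2"); // hypothetical key supplied by the caller

        SimpleResponse response = fetch("http://example.com/", incoming);
        Map<String, String> merged = merge(incoming, response.newMetadata);

        System.out.println("new entries only: " + response.newMetadata.keySet());
        System.out.println("merged by caller: " + merged.keySet());
    }
}
```

        Returning only the new entries keeps the response small and leaves the caller free to decide how, or whether, the two sets of metadata are combined.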