Interface URLBuffer

  • All Known Implementing Classes:
    AbstractURLBuffer, PriorityURLBuffer, SchedulingURLBuffer, SimpleURLBuffer

    public interface URLBuffer
    Buffers URLs to be processed into separate queues; used by spouts. Guarantees that no URL can be put in the buffer more than once.

    Configured by setting

    urlbuffer.class: "com.digitalpebble.stormcrawler.persistence.SimpleURLBuffer"

    in the configuration

    • Field Detail

      • bufferClassParamName

        static final String bufferClassParamName
        Implementation to use for URLBuffer. Must implement the interface URLBuffer.
        See Also:
        Constant Field Values
    • Method Detail

      • createInstance

        static @NotNull URLBuffer createInstance(@NotNull Map<String,Object> stormConf)
        Returns a URLBuffer instance based on the configuration.
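
        A configuration-driven factory of this kind is commonly written with reflection. The following is a minimal sketch under stated assumptions, not the library's actual code: Buffer, EmptyBuffer and BufferFactorySketch are hypothetical stand-ins, the fallback to a default class is an assumption, and the key "urlbuffer.class" is taken from the configuration snippet above.

```java
import java.util.HashMap;
import java.util.Map;

public class BufferFactorySketch {

    /** Hypothetical stand-in for the URLBuffer interface, reduced to one method. */
    public interface Buffer {
        int size();
    }

    /** Hypothetical default implementation, used when no class is configured. */
    public static class EmptyBuffer implements Buffer {
        @Override
        public int size() {
            return 0;
        }
    }

    // Configuration key, as shown in the configuration snippet of this page.
    public static final String PARAM = "urlbuffer.class";

    /** Instantiates the configured class; falling back to a default is an assumption. */
    public static Buffer createInstance(Map<String, Object> conf) {
        Object configured = conf.get(PARAM);
        String className =
                configured != null ? configured.toString() : EmptyBuffer.class.getName();
        try {
            Class<?> clazz = Class.forName(className);
            // Reject classes that do not implement the expected interface.
            if (!Buffer.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " does not implement Buffer");
            }
            // Requires an accessible no-argument constructor.
            return (Buffer) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(PARAM, EmptyBuffer.class.getName());
        Buffer buffer = createInstance(conf);
        System.out.println(buffer.getClass().getSimpleName());
    }
}
```

        The interface check runs before instantiation so that a misconfigured class name fails with a clear message rather than a ClassCastException.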
      • add

        boolean add(String URL,
                    Metadata m,
                    String key)
        Stores the URL and its Metadata under a given key.

        Implementations of this method should be synchronised.

        Returns:
        false if the URL was already in the buffer, true if it wasn't and was added
      • add

        default boolean add(String URL,
                            Metadata m)
        Stores the URL and its Metadata using the hostname as key.

        Implementations of this method should be synchronised.

        Returns:
        false if the URL was already in the buffer, true if it wasn't and was added
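
        The contract of the two add methods (one queue per key, no URL accepted twice, hostname as the default key) can be illustrated with a small sketch. DedupAddSketch is a hypothetical class, not one of the implementing classes listed above; it omits the Metadata argument, never removes URLs, and assumes java.net.URI is an acceptable way to extract the hostname.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DedupAddSketch {

    // Every URL ever accepted; used to refuse duplicates.
    private final Set<String> known = new HashSet<>();
    // One FIFO queue per key, in order of first appearance.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    /** Returns false if the URL was already buffered, true if it was added. */
    public synchronized boolean add(String url, String key) {
        if (!known.add(url)) {
            return false;
        }
        queues.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(url);
        return true;
    }

    /** Uses the hostname as key; URI.create throws on a malformed URL. */
    public boolean add(String url) {
        String host = URI.create(url).getHost();
        return add(url, host != null ? host : "");
    }

    /** Total number of URLs; nothing is ever removed in this sketch. */
    public synchronized int size() {
        return known.size();
    }

    public synchronized int numQueues() {
        return queues.size();
    }

    public static void main(String[] args) {
        DedupAddSketch buffer = new DedupAddSketch();
        System.out.println(buffer.add("https://example.com/a")); // true
        System.out.println(buffer.add("https://example.com/a")); // false
        System.out.println(buffer.add("https://example.com/b")); // true
        System.out.println(buffer.add("https://example.org/"));  // true
        System.out.println(buffer.size() + " URLs in " + buffer.numQueues() + " queues");
    }
}
```

        The single-argument overload delegates to the synchronised method, so the duplicate check and the enqueue happen atomically in both cases.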
      • size

        int size()
        Total number of URLs in the buffer.
      • numQueues

        int numQueues()
        Total number of queues in the buffer.
      • next

        org.apache.storm.tuple.Values next()
        Retrieves the next available URL. Guarantees that the URLs are always perfectly shuffled.

        Implementations of this method should be synchronised

      • hasNext

        boolean hasNext()
        Implementations of this method should be synchronised
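
        A caller typically drains the buffer with a hasNext()/next() loop. The sketch below assumes one possible reading of the shuffling guarantee, namely that consecutive URLs are taken from different queues in rotation; the actual implementations may order URLs differently. RoundRobinDrainSketch is hypothetical, returns a plain String instead of org.apache.storm.tuple.Values, and leaves out duplicate detection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinDrainSketch {

    // Queues in rotation order; a queue is removed as soon as it is empty.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    public synchronized void add(String url, String key) {
        queues.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(url);
    }

    public synchronized boolean hasNext() {
        return !queues.isEmpty();
    }

    /** Takes one URL from the head queue, then moves that queue to the back. */
    public synchronized String next() {
        Iterator<Map.Entry<String, Deque<String>>> it = queues.entrySet().iterator();
        if (!it.hasNext()) {
            return null; // nothing buffered
        }
        Map.Entry<String, Deque<String>> head = it.next();
        it.remove();
        String url = head.getValue().pollFirst();
        if (!head.getValue().isEmpty()) {
            // Re-inserting a removed key places it last in a LinkedHashMap.
            queues.put(head.getKey(), head.getValue());
        }
        return url;
    }

    /** Drains the buffer in the order a single consumer would see. */
    public static List<String> drain(RoundRobinDrainSketch buffer) {
        List<String> out = new ArrayList<>();
        while (buffer.hasNext()) {
            out.add(buffer.next());
        }
        return out;
    }

    public static void main(String[] args) {
        RoundRobinDrainSketch buffer = new RoundRobinDrainSketch();
        buffer.add("https://a.example/1", "a.example");
        buffer.add("https://a.example/2", "a.example");
        buffer.add("https://b.example/1", "b.example");
        System.out.println(drain(buffer));
    }
}
```

        With several consumers, hasNext() and next() are individually synchronised but not atomic as a pair, which is why next() in this sketch returns null rather than throwing when the buffer has been emptied in between.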
      • acked

        default void acked(String url)
        Notifies the buffer that a URL has been successfully processed; used, e.g., to compute an ideal delay for a host queue.