Interface Harvester

  • All Superinterfaces:
    BoxConfigurable
    All Known Implementing Classes:
    FacetView, View

    public interface Harvester
    extends BoxConfigurable
    Harvests documents, or at least the document IDs, from a source system. Starting from an initial state, the client loads documents from the source system, paging through the results until the end is reached. The client should return the resulting documents after each page.

    If paging is not possible and streaming is necessary, the harvester may use Source.save(BoxDocument) to stream in the resulting documents rather than returning them in the HarvestResult.
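
    The streaming alternative can be sketched as follows. This is only an illustration under stated assumptions: Doc, Source, and Result are hypothetical stand-ins for BoxDocument, Source, and HarvestResult, and passing the source in as a method argument is an assumption, not necessarily how Box hands it to a harvester.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StreamingHarvestSketch {

    // Hypothetical stand-ins; the real Box classes will differ.
    record Doc(String id) {}
    interface Source { void save(Doc document); } // mirrors Source.save(BoxDocument)
    record Result(List<Doc> documents) {}

    // Streams each record into the source as it arrives and returns an
    // empty result, because nothing is left to hand back in the result.
    static Result harvest(Iterator<String> feed, Source source) {
        while (feed.hasNext()) {
            source.save(new Doc(feed.next()));
        }
        return new Result(List.of());
    }

    public static void main(String[] args) {
        List<Doc> saved = new ArrayList<>();
        Result result = harvest(List.of("x", "y", "z").iterator(), saved::add);
        System.out.println(saved.size() + " saved, "
                + result.documents().size() + " returned");
    }
}
```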

    For Box's purposes, there will be only one instance of the harvester per source, and only one thread will call harvest at a time, so a harvester does not need to be thread-safe. For performance or other reasons, state may also be maintained between calls to harvest(HarvestContext). So that harvesting can pick up where it left off after an application redeployment, a "cursor" is saved to a database and offered back to the client, letting the client know where it left off.

    The harvester can also be responsible for just gathering IDs to be processed by the processor, rather than documents to be saved. This is done by returning a list of unprocessed documents in which only the ID is set.
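
    A minimal sketch of this ID-only mode, assuming a simplified document type. Doc, unprocessed, isProcessed, harvestIds, and process are hypothetical names chosen for illustration; they are not taken from the Box API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdOnlyHarvestSketch {

    // Hypothetical stand-in for BoxDocument: the body stays null until processed.
    record Doc(String id, String body) {
        static Doc unprocessed(String id) {
            return new Doc(id, null);
        }

        boolean isProcessed() {
            return body != null;
        }
    }

    // Harvester side: return placeholders that carry only an ID.
    static List<Doc> harvestIds(List<String> idsFromSource) {
        return idsFromSource.stream()
                .map(Doc::unprocessed)
                .collect(Collectors.toList());
    }

    // Processor side: fill in the content for one placeholder.
    static Doc process(Doc placeholder) {
        return new Doc(placeholder.id(), "content of " + placeholder.id());
    }

    public static void main(String[] args) {
        for (Doc doc : harvestIds(List.of("1", "2"))) {
            System.out.println(doc.id() + " processed before: " + doc.isProcessed()
                    + ", after: " + process(doc).isProcessed());
        }
    }
}
```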

    Author:
    Charles Draper
    • Method Detail

      • harvest

        HarvestResult harvest(HarvestContext context)
        Returns the next set or page of documents from the source system. The client determines what the "next" set is by inspecting the cursor object inside the context. The cursor is an object that is created by the client and returned as part of the HarvestResult after each set.
        Parameters:
        context - context informing the harvester what to do next
        Returns:
        a result containing resulting documents and other information for Box
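
        The cursor handshake described for harvest can be sketched roughly as below. Everything here is an assumption made for illustration: Doc, Context, and Result are hypothetical stand-ins for BoxDocument, HarvestContext, and HarvestResult, the cursor is modeled as a plain integer offset, and the "more" flag and the drain loop only imitate how Box might call harvest repeatedly.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedHarvestSketch {

    // Hypothetical stand-ins; the real Box classes will differ.
    record Doc(String id) {}
    record Context(Integer cursor) {} // a null cursor means "first call"
    record Result(List<Doc> documents, Integer cursor, boolean more) {}

    // Pretend source system: five documents, served two per page.
    static final List<String> SOURCE = List.of("a", "b", "c", "d", "e");
    static final int PAGE_SIZE = 2;

    // One call returns one page plus the cursor to use for the next call.
    static Result harvest(Context context) {
        int offset = context.cursor() == null ? 0 : context.cursor();
        int end = Math.min(offset + PAGE_SIZE, SOURCE.size());
        List<Doc> page = new ArrayList<>();
        for (String id : SOURCE.subList(offset, end)) {
            page.add(new Doc(id));
        }
        return new Result(page, end, end < SOURCE.size());
    }

    // Imitates the caller: starts from a saved cursor and pages to the end.
    static List<String> drain(Integer savedCursor) {
        List<String> ids = new ArrayList<>();
        Integer cursor = savedCursor;
        Result result;
        do {
            result = harvest(new Context(cursor));
            result.documents().forEach(doc -> ids.add(doc.id()));
            cursor = result.cursor();
        } while (result.more());
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(drain(null)); // full harvest from the beginning
        System.out.println(drain(2));    // resume from a saved cursor of 2
    }
}
```

        Because the offset travels in the cursor rather than in a field of the harvester, a fresh instance created after a redeployment can resume from the saved value without any in-memory state.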