Interface BoxDatabase

  • All Superinterfaces:
    BoxConfigurable
    All Known Subinterfaces:
    ReadOnlyDatabase
    All Known Implementing Classes:
    FacetView, MemoryDatabase, RemoteDatabase, View

    public interface BoxDatabase
    extends BoxConfigurable
    A database that holds processed documents for later retrieval, along with the metadata and controls used to process those documents.

    Implementations should provide a document repository with Box metadata and a queue of documents to be processed, and should maintain harvest cursors and group start and end times.

    Author:
    Charles Draper
    • Method Detail

      • addToQueue

        int addToQueue(Duration olderThan)
        Marks old documents for reprocessing. Any document that was last processed before (NOW - olderThan) is placed on the queue.
        Parameters:
        olderThan - target documents older than this age
        Returns:
        number of documents added to queue
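
        A minimal sketch of the cutoff rule above, using a hypothetical in-memory class rather than a real BoxDatabase implementation. The class name, the two maps, and the explicit now parameter (added so the example is deterministic) are assumptions for illustration. Whether ids already on the queue count toward the return value is also assumed here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class QueueByAgeSketch {
  /** Document id mapped to the time it was last processed (assumed layout). */
  final Map<String, Instant> lastProcessed = new LinkedHashMap<>();

  /** Document id mapped to the earliest time processing may be attempted. */
  final Map<String, Instant> queue = new LinkedHashMap<>();

  /** Queues documents last processed before (now - olderThan); returns the number added. */
  int addToQueue(Duration olderThan, Instant now) {
    Instant cutoff = now.minus(olderThan);
    int added = 0;
    for (Map.Entry<String, Instant> entry : lastProcessed.entrySet()) {
      boolean isOld = entry.getValue().isBefore(cutoff);
      // Assumption: ids already on the queue keep their entry and are not counted.
      if (isOld && queue.putIfAbsent(entry.getKey(), now) == null) {
        added++;
      }
    }
    return added;
  }
}
```

        With a seven-day olderThan, a document processed nine days before now is queued, while one processed twelve hours before now is left alone.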
      • addToQueue

        void addToQueue(Collection<String> ids,
                        Instant attempt,
                        boolean overwrite)
        Adds a collection of ids to the queue, to be processed no earlier than the given time. If an id already exists in the queue, this operation has no effect on that id unless overwrite is true.
        Parameters:
        ids - the ids to be added
        attempt - do not attempt to process until this time
        overwrite - true to force an update on the attempt time of an existing entry, false to keep the existing attempt time
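
        The overwrite flag can be illustrated with a small in-memory sketch. The class and its map are hypothetical stand-ins, not part of Box. The only behavior shown is the one described above: an existing entry keeps its attempt time unless overwrite is true.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class QueueOverwriteSketch {
  /** Document id mapped to the earliest time processing may be attempted. */
  final Map<String, Instant> queue = new HashMap<>();

  void addToQueue(Collection<String> ids, Instant attempt, boolean overwrite) {
    for (String id : ids) {
      if (overwrite) {
        // Replace the attempt time even if the id is already queued.
        queue.put(id, attempt);
      } else {
        // Keep the existing attempt time; only new ids are inserted.
        queue.putIfAbsent(id, attempt);
      }
    }
  }
}
```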
      • deleteFromQueue

        void deleteFromQueue(Collection<String> ids)
        Deletes the given document ids from the queue, signifying that the documents were processed successfully.
        Parameters:
        ids - the document ids to delete
      • findDependents

        Map<edu.byu.hbll.box.DocumentId,Set<String>> findDependents(Collection<edu.byu.hbll.box.DocumentId> dependencies)
        Finds documents dependent on the given dependencies.
        Parameters:
        dependencies - the dependencies
        Returns:
        all dependent documents per dependency, key is dependency, value is dependents
      • find

        edu.byu.hbll.box.QueryResult find(edu.byu.hbll.box.BoxQuery query)
        Finds documents in the database that match the given query. If the query specifies ids, documents should be returned in the same order as the ids, and an unprocessed document should be created for each missing document. For id queries, all corresponding documents are returned in one page, so limit is ignored. The database is not responsible for processing documents, so the process and wait directives are ignored. The metadataLevel and metadataOnly directives are honored.
        Parameters:
        query - the query to use
        Returns:
        matching documents
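
        The id-query rule above (same order as the requested ids, with an unprocessed placeholder for each missing document) can be sketched as follows. The Doc class and findByIds method are hypothetical simplifications; the real method takes a BoxQuery and returns a QueryResult, neither of which is modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class FindByIdSketch {
  /** Hypothetical minimal document: an id plus a processed flag. */
  static final class Doc {
    final String id;
    final boolean processed;

    Doc(String id, boolean processed) {
      this.id = id;
      this.processed = processed;
    }
  }

  final Map<String, Doc> store = new HashMap<>();

  /** Returns one document per requested id, in request order, ignoring any limit. */
  List<Doc> findByIds(List<String> ids) {
    List<Doc> result = new ArrayList<>(ids.size());
    for (String id : ids) {
      Doc doc = store.get(id);
      // A missing document is represented by an unprocessed placeholder.
      result.add(doc != null ? doc : new Doc(id, false));
    }
    return result;
  }
}
```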
      • getHarvestCursor

        com.fasterxml.jackson.databind.node.ObjectNode getHarvestCursor()
        Returns the harvest cursor for this source. Whatever is set using setHarvestCursor(ObjectNode) should be returned here.
        Returns:
        the harvest cursor
      • listSourceDependencies

        Set<String> listSourceDependencies()
        Finds the unique set of all sources this source is dependent on.
        Returns:
        the set of dependency source names
      • nextFromQueue

        List<String> nextFromQueue(int limit)
        Returns the next batch of ids from the queue.
        Parameters:
        limit - size of the batch to return
        Returns:
        the next ids in the queue
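
        One way nextFromQueue and deleteFromQueue might fit together is sketched below with a hypothetical in-memory queue. The ordering by attempt time, the rule that only ids whose attempt time has arrived are returned, and the explicit now parameter are assumptions made for this illustration; the interface itself does not state them.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class QueueBatchSketch {
  /** Document id mapped to the earliest time processing may be attempted. */
  final Map<String, Instant> queue = new HashMap<>();

  /** Returns up to limit ids that are due at the given time, earliest attempt first. */
  List<String> nextFromQueue(int limit, Instant now) {
    List<String> due = new ArrayList<>();
    for (Map.Entry<String, Instant> entry : queue.entrySet()) {
      if (!entry.getValue().isAfter(now)) {
        due.add(entry.getKey());
      }
    }
    // Order by attempt time, then by id so that ties are deterministic.
    due.sort((a, b) -> {
      int byTime = queue.get(a).compareTo(queue.get(b));
      return byTime != 0 ? byTime : a.compareTo(b);
    });
    int size = Math.max(0, Math.min(limit, due.size()));
    return new ArrayList<>(due.subList(0, size));
  }

  /** Removes ids whose documents were processed successfully. */
  void deleteFromQueue(Collection<String> ids) {
    queue.keySet().removeAll(ids);
  }
}
```

        In this sketch, a caller would fetch a batch, process each id, and then pass the successfully processed ids to deleteFromQueue so they are not returned again.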
      • processOrphans

        void processOrphans(String groupId,
                            Consumer<edu.byu.hbll.box.BoxDocument> function)
        Using the timestamp saved in the database when startGroup(String) was called, executes the given function on each document in the group that was processed before that group start time.
        Parameters:
        groupId - documents belonging to this group should be processed
        function - the function to run on each document
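
        The interplay between startGroup(String) and processOrphans can be sketched with a hypothetical in-memory class. The Doc class, its groupId and processed fields, the explicit now parameter, and the handling of a group that was never started are all assumptions for illustration, not the Box data model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class OrphanSketch {
  /** Hypothetical minimal document carrying its group and last processed time. */
  static final class Doc {
    final String id;
    final String groupId;
    final Instant processed;

    Doc(String id, String groupId, Instant processed) {
      this.id = id;
      this.groupId = groupId;
      this.processed = processed;
    }
  }

  final Map<String, Doc> docs = new LinkedHashMap<>();
  final Map<String, Instant> groupStarts = new HashMap<>();

  /** Records the time at which the group started. */
  void startGroup(String groupId, Instant now) {
    groupStarts.put(groupId, now);
  }

  void save(Doc doc) {
    docs.put(doc.id, doc);
  }

  /** Runs the function on each document of the group processed before the group start. */
  void processOrphans(String groupId, Consumer<Doc> function) {
    Instant start = groupStarts.get(groupId);
    if (start == null) {
      return; // Assumption: a group that was never started has no orphans.
    }
    // Iterate over a copy so the function may modify the store safely.
    for (Doc doc : new ArrayList<>(docs.values())) {
      if (groupId.equals(doc.groupId) && doc.processed.isBefore(start)) {
        function.accept(doc);
      }
    }
  }
}
```

        A document that is saved again after the group starts has a newer processed time and is skipped. Only documents the group run never touched reach the function.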
      • removeDeleted

        void removeDeleted(Duration olderThan)
        Removes all traces of documents that were deleted more than olderThan ago.
        Parameters:
        olderThan - the age of the deleted documents to remove
      • save

        void save(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
        Saves the given collection of documents to the database.
        Parameters:
        documents - documents to save
      • updateDependencies

        void updateDependencies(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
        Updates the stored dependencies of the given documents to the dependencies each document specifies. Nothing else is updated.
        Parameters:
        documents - documents to update
      • setHarvestCursor

        void setHarvestCursor(com.fasterxml.jackson.databind.node.ObjectNode cursor)
        Stores this cursor in the database for later retrieval by getHarvestCursor().
        Parameters:
        cursor - the cursor object to store
      • startGroup

        void startGroup(String groupId)
        Records in the database the time at which the given group started.
        Parameters:
        groupId - the id of the group
      • updateProcessed

        void updateProcessed(Collection<String> ids)
        Sets the processed date of the documents identified by the given ids to NOW. Ids that do not match an existing document are ignored.
        Parameters:
        ids - the ids of the documents
      • findDependencies

        Map<String,Set<edu.byu.hbll.box.DocumentId>> findDependencies(Collection<String> ids)
        Finds the dependencies for the given document ids.
        Parameters:
        ids - the document ids
        Returns:
        the dependencies for the given documents, key is the document id, value is the dependencies
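
        findDependencies and findDependents are two views of the same relation: one maps a document id to what it depends on, the other maps a dependency to the documents that depend on it. A minimal sketch, assuming a hypothetical in-memory store in which plain strings stand in for DocumentId values. Returning an empty set for a key with no matches is also an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in, not a real BoxDatabase implementation. */
final class DependencySketch {
  /** Document id mapped to the keys of the documents it depends on (assumed layout). */
  final Map<String, Set<String>> dependenciesById = new HashMap<>();

  /** Key is the document id, value is its dependencies. */
  Map<String, Set<String>> findDependencies(Collection<String> ids) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (String id : ids) {
      Set<String> found = dependenciesById.get(id);
      Set<String> copy = new TreeSet<String>();
      if (found != null) {
        copy.addAll(found);
      }
      result.put(id, copy);
    }
    return result;
  }

  /** Key is the dependency, value is the ids of the documents that depend on it. */
  Map<String, Set<String>> findDependents(Collection<String> dependencies) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (String dependency : dependencies) {
      result.put(dependency, new TreeSet<String>());
    }
    // Invert the stored relation, keeping only the requested dependencies.
    for (Map.Entry<String, Set<String>> entry : dependenciesById.entrySet()) {
      for (String dependency : entry.getValue()) {
        Set<String> dependents = result.get(dependency);
        if (dependents != null) {
          dependents.add(entry.getKey());
        }
      }
    }
    return result;
  }
}
```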
      • findRegistryValue

        com.fasterxml.jackson.databind.JsonNode findRegistryValue(String id)
        Retrieves a single entry from the registry. The registry is a general place for individual entries that are important for the normal function of Box. Ids should uniquely identify the type of entry.
        Parameters:
        id - the id of the entry
        Returns:
        the entry value denoted by the id or null if not found
      • saveRegistryValue

        void saveRegistryValue(String id,
                               com.fasterxml.jackson.databind.JsonNode value)
        Saves an entry to the registry under the given id. The registry is a general place for individual entries that are important for the normal function of Box. Ids should uniquely identify the type of entry.
        Parameters:
        id - the id of the entry
        value - the value of the entry