Package edu.byu.hbll.box
Interface BoxDatabase
-
- All Superinterfaces:
BoxConfigurable
- All Known Subinterfaces:
ReadOnlyDatabase
- All Known Implementing Classes:
FacetView, MemoryDatabase, RemoteDatabase, View
public interface BoxDatabase extends BoxConfigurable
A database to hold processed documents for later retrieval, along with all other metadata and controls used for processing the documents. Implementations should provide a document repository with Box metadata and a queue for documents to be processed, and should maintain harvest cursors and group start and end times.
- Author:
- Charles Draper
-
-
Method Summary
Modifier and Type, Method, Description
int addToQueue(Duration olderThan)
Used to mark old documents for reprocessing.
void addToQueue(Collection<String> ids, Instant attempt, boolean overwrite)
Adds a collection of ids to the queue to be processed at the given time.
default void clear()
Clears the box database for the source.
void deleteFromQueue(Collection<String> ids)
Deletes the given document ids from the queue signifying that the processing of the documents was successful.
edu.byu.hbll.box.QueryResult find(edu.byu.hbll.box.BoxQuery query)
Finds documents in the database according to the given query.
Map<String,Set<edu.byu.hbll.box.DocumentId>> findDependencies(Collection<String> ids)
Finds the dependencies for the given document ids.
Map<edu.byu.hbll.box.DocumentId,Set<String>> findDependents(Collection<edu.byu.hbll.box.DocumentId> dependencies)
Finds documents dependent on the given dependencies.
com.fasterxml.jackson.databind.JsonNode findRegistryValue(String id)
Retrieves a single entry from the registry.
com.fasterxml.jackson.databind.node.ObjectNode getHarvestCursor()
Returns the harvest cursor for this source.
Set<String> listSourceDependencies()
Finds the unique set of all sources this source is dependent on.
List<String> nextFromQueue(int limit)
Return the next batch of ids from the queue.
void processOrphans(String groupId, Consumer<edu.byu.hbll.box.BoxDocument> function)
Using the timestamp saved in the database when startGroup(String) was called, executes the given function on each document processed before that group start time.
void removeDeleted(Duration olderThan)
Removes all traces of documents that were deleted more than olderThan ago.
void save(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
Save this collection of documents to the database.
void saveRegistryValue(String id, com.fasterxml.jackson.databind.JsonNode value)
Saves an entry to the registry denoted by id.
void setHarvestCursor(com.fasterxml.jackson.databind.node.ObjectNode cursor)
Stores this cursor in the database for later retrieval by getHarvestCursor().
void startGroup(String groupId)
Marks time in the database that the given group has started.
void updateDependencies(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
Updates this collection of documents with their specified dependencies.
void updateProcessed(Collection<String> ids)
Simply sets the processed date of the documents identified by the given ids to NOW.
-
Methods inherited from interface edu.byu.hbll.box.BoxConfigurable
postConstruct, postInit, preDestroy
-
Method Detail
-
addToQueue
int addToQueue(Duration olderThan)
Used to mark old documents for reprocessing. Any documents that were last processed before (NOW - olderThan) will be placed on the queue.
- Parameters:
olderThan - target documents older than this age
- Returns:
- number of documents added to queue
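The (NOW - olderThan) cutoff can be illustrated with a small in-memory sketch. Everything in it is a hypothetical stand-in rather than Box code: the class ReprocessSketch, its two fields, and the explicit now argument (added so the example is deterministic). It also assumes that ids already on the queue are not counted again, which the description above does not specify.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in; not a Box class.
class ReprocessSketch {
  // document id -> time the document was last processed
  final Map<String, Instant> processed = new TreeMap<>();
  // ids waiting to be processed
  final Set<String> queue = new LinkedHashSet<>();

  // Queues every document last processed before (now - olderThan).
  // Assumption: only ids newly added to the queue are counted.
  int addToQueue(Duration olderThan, Instant now) {
    Instant cutoff = now.minus(olderThan);
    int added = 0;
    for (Map.Entry<String, Instant> entry : processed.entrySet()) {
      if (entry.getValue().isBefore(cutoff) && queue.add(entry.getKey())) {
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-01-10T00:00:00Z");
    ReprocessSketch db = new ReprocessSketch();
    db.processed.put("old", now.minus(Duration.ofDays(9)));
    db.processed.put("fresh", now.minus(Duration.ofDays(1)));
    System.out.println(db.addToQueue(Duration.ofDays(7), now)); // 1
    System.out.println(db.queue); // [old]
  }
}
```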
-
addToQueue
void addToQueue(Collection<String> ids, Instant attempt, boolean overwrite)
Adds a collection of ids to the queue to be processed at the given time. If an id already exists in the queue, this operation has no effect on it unless overwrite is true.
- Parameters:
ids - the ids to be added
attempt - do not attempt to process until this time
overwrite - true to force an update on the attempt time of an existing entry, false to keep the existing attempt time
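The effect of the overwrite flag can be sketched with a plain map from id to attempt time. QueueSketch and its attempts field are hypothetical names used only for illustration; a real implementation would persist the queue rather than hold it in memory.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not a Box class.
class QueueSketch {
  // queued id -> earliest time at which processing may be attempted
  final Map<String, Instant> attempts = new HashMap<>();

  void addToQueue(Collection<String> ids, Instant attempt, boolean overwrite) {
    for (String id : ids) {
      if (overwrite) {
        attempts.put(id, attempt); // insert, or replace the existing attempt time
      } else {
        attempts.putIfAbsent(id, attempt); // an existing entry keeps its attempt time
      }
    }
  }

  public static void main(String[] args) {
    Instant t1 = Instant.parse("2024-01-01T00:00:00Z");
    Instant t2 = Instant.parse("2024-01-02T00:00:00Z");
    QueueSketch queue = new QueueSketch();
    queue.addToQueue(List.of("a"), t1, false);
    queue.addToQueue(List.of("a", "b"), t2, false);
    System.out.println(queue.attempts.get("a")); // 2024-01-01T00:00:00Z
    queue.addToQueue(List.of("a"), t2, true);
    System.out.println(queue.attempts.get("a")); // 2024-01-02T00:00:00Z
  }
}
```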
-
deleteFromQueue
void deleteFromQueue(Collection<String> ids)
Deletes the given document ids from the queue, signifying that the processing of the documents was successful.
- Parameters:
ids - the document ids to delete
-
findDependents
Map<edu.byu.hbll.box.DocumentId,Set<String>> findDependents(Collection<edu.byu.hbll.box.DocumentId> dependencies)
Finds documents dependent on the given dependencies.
- Parameters:
dependencies - the dependencies
- Returns:
- all dependent documents per dependency, key is dependency, value is dependents
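The shape of the returned map (key is the dependency, value is its dependents) can be sketched by inverting a stored map of per-document dependencies. This is only an illustration under several assumptions: DependentsSketch is a hypothetical class, plain strings such as "authors/7" stand in for edu.byu.hbll.box.DocumentId, and returning an empty set for a dependency with no dependents is a choice the description above does not specify.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in; not a Box class.
class DependentsSketch {
  // document id -> dependencies of that document (strings stand in for DocumentId)
  final Map<String, Set<String>> dependenciesByDoc = new TreeMap<>();

  // Returns, for each requested dependency, the ids of the documents that depend on it.
  Map<String, Set<String>> findDependents(Collection<String> dependencies) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (String dependency : dependencies) {
      result.put(dependency, new TreeSet<>()); // assumption: empty set when nothing depends on it
    }
    for (Map.Entry<String, Set<String>> entry : dependenciesByDoc.entrySet()) {
      for (String dependency : entry.getValue()) {
        Set<String> dependents = result.get(dependency);
        if (dependents != null) {
          dependents.add(entry.getKey());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    DependentsSketch db = new DependentsSketch();
    db.dependenciesByDoc.put("doc1", Set.of("authors/7", "places/3"));
    db.dependenciesByDoc.put("doc2", Set.of("authors/7"));
    System.out.println(db.findDependents(List.of("authors/7", "places/9")));
    // {authors/7=[doc1, doc2], places/9=[]}
  }
}
```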
-
find
edu.byu.hbll.box.QueryResult find(edu.byu.hbll.box.BoxQuery query)
Finds documents in the database according to the given query. If the query specifies ids, documents should be returned in the same order as the ids, and an unprocessed document should be created for each missing document. For id queries, all corresponding documents are returned in one page, so limit is ignored. The database is not responsible for processing documents, so the process and wait directives are ignored. The metadataLevel and metadataOnly directives are honored.
- Parameters:
query - the query to use
- Returns:
- matching documents
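The id-query contract (same order as the requested ids, an unprocessed document for each missing id, no limit applied) can be sketched as follows. FindByIdSketch, its nested Doc class, and the findByIds method are hypothetical stand-ins for BoxQuery, BoxDocument, and QueryResult, whose actual APIs are not shown on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not a Box class.
class FindByIdSketch {
  // Minimal stand-in for a document: an id plus a processed flag.
  static class Doc {
    final String id;
    final boolean processed;

    Doc(String id, boolean processed) {
      this.id = id;
      this.processed = processed;
    }
  }

  final Map<String, Doc> stored = new HashMap<>();

  // Returns one document per requested id, in request order.
  // A missing id yields a new unprocessed placeholder; no limit is applied.
  List<Doc> findByIds(List<String> ids) {
    List<Doc> result = new ArrayList<>();
    for (String id : ids) {
      Doc doc = stored.get(id);
      result.add(doc != null ? doc : new Doc(id, false));
    }
    return result;
  }

  public static void main(String[] args) {
    FindByIdSketch db = new FindByIdSketch();
    db.stored.put("a", new Doc("a", true));
    for (Doc doc : db.findByIds(List.of("b", "a"))) {
      System.out.println(doc.id + " processed=" + doc.processed);
    }
    // b processed=false
    // a processed=true
  }
}
```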
-
getHarvestCursor
com.fasterxml.jackson.databind.node.ObjectNode getHarvestCursor()
Returns the harvest cursor for this source. Whatever is set using setHarvestCursor(ObjectNode) should be returned here.
- Returns:
- the harvest cursor
-
listSourceDependencies
Set<String> listSourceDependencies()
Finds the unique set of all sources this source is dependent on.
- Returns:
- the set of dependency source names
-
nextFromQueue
List<String> nextFromQueue(int limit)
Returns the next batch of ids from the queue.
- Parameters:
limit - size of the batch to return
- Returns:
- the next ids in the queue
-
processOrphans
void processOrphans(String groupId, Consumer<edu.byu.hbll.box.BoxDocument> function)
Using the timestamp saved in the database when startGroup(String) was called, executes the given function on each document processed before that group start time.
- Parameters:
groupId - documents belonging to this group should be processed
function - the function to run on each document
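The interplay between startGroup and processOrphans can be sketched in memory: the group start time is recorded first, and any document of that group whose processed time is earlier than the recorded start is handed to the function. OrphanSketch, its nested Doc class, and the explicit now argument are hypothetical; doing nothing for a group that was never started is also an assumption, since that case is not described here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in; not a Box class.
class OrphanSketch {
  // Minimal stand-in for a document with a group and a processed time.
  static class Doc {
    final String id;
    final String groupId;
    final Instant processed;

    Doc(String id, String groupId, Instant processed) {
      this.id = id;
      this.groupId = groupId;
      this.processed = processed;
    }
  }

  final List<Doc> docs = new ArrayList<>();
  final Map<String, Instant> groupStarts = new HashMap<>();

  // Records the time at which the given group started.
  void startGroup(String groupId, Instant now) {
    groupStarts.put(groupId, now);
  }

  // Runs the function on each document of the group processed before the group start.
  void processOrphans(String groupId, Consumer<Doc> function) {
    Instant start = groupStarts.get(groupId);
    if (start == null) {
      return; // assumption: nothing to do if the group was never started
    }
    for (Doc doc : docs) {
      if (groupId.equals(doc.groupId) && doc.processed.isBefore(start)) {
        function.accept(doc);
      }
    }
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2024-01-01T00:00:00Z");
    OrphanSketch db = new OrphanSketch();
    db.docs.add(new Doc("stale", "g1", start.minusSeconds(60)));
    db.docs.add(new Doc("current", "g1", start.plusSeconds(60)));
    db.startGroup("g1", start);
    db.processOrphans("g1", doc -> System.out.println(doc.id)); // stale
  }
}
```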
-
removeDeleted
void removeDeleted(Duration olderThan)
Removes all traces of documents that were deleted more than olderThan ago.
- Parameters:
olderThan - the age of the deleted documents to remove
-
save
void save(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
Saves this collection of documents to the database.
- Parameters:
documents - documents to save
-
updateDependencies
void updateDependencies(Collection<? extends edu.byu.hbll.box.BoxDocument> documents)
Updates this collection of documents with their specified dependencies. Nothing else is updated.
- Parameters:
documents - documents to update
-
setHarvestCursor
void setHarvestCursor(com.fasterxml.jackson.databind.node.ObjectNode cursor)
Stores this cursor in the database for later retrieval by getHarvestCursor().
- Parameters:
cursor - the cursor object to store
-
startGroup
void startGroup(String groupId)
Marks the time in the database at which the given group started.
- Parameters:
groupId - the id of the group
-
updateProcessed
void updateProcessed(Collection<String> ids)
Sets the processed date of the documents identified by the given ids to NOW. Does nothing for an id whose document does not exist.
- Parameters:
ids - the ids of the documents
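The rule that ids without a document are ignored maps naturally onto Map.computeIfPresent. The sketch below is hypothetical (ProcessedSketch, its processed field, and the explicit now argument are not part of Box) and only illustrates that rule.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not a Box class.
class ProcessedSketch {
  // document id -> time the document was last processed
  final Map<String, Instant> processed = new HashMap<>();

  // Updates the processed time of existing documents only; unknown ids are skipped.
  void updateProcessed(Collection<String> ids, Instant now) {
    for (String id : ids) {
      processed.computeIfPresent(id, (key, previous) -> now);
    }
  }

  public static void main(String[] args) {
    Instant earlier = Instant.parse("2024-01-01T00:00:00Z");
    Instant now = Instant.parse("2024-01-02T00:00:00Z");
    ProcessedSketch db = new ProcessedSketch();
    db.processed.put("a", earlier);
    db.updateProcessed(List.of("a", "missing"), now);
    System.out.println(db.processed); // {a=2024-01-02T00:00:00Z}
  }
}
```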
-
findDependencies
Map<String,Set<edu.byu.hbll.box.DocumentId>> findDependencies(Collection<String> ids)
Finds the dependencies for the given document ids.
- Parameters:
ids - the document ids
- Returns:
- the dependencies for the given documents, key is the document id, value is the dependencies
-
findRegistryValue
com.fasterxml.jackson.databind.JsonNode findRegistryValue(String id)
Retrieves a single entry from the registry. The registry is a general place for individual entries important for the normal function of box. IDs should uniquely identify the type of entry.
- Parameters:
id - the id of the entry
- Returns:
- the entry value denoted by the id or null if not found
-
saveRegistryValue
void saveRegistryValue(String id, com.fasterxml.jackson.databind.JsonNode value)
Saves an entry to the registry denoted by id. The registry is a general place for individual entries important for the normal function of box. IDs should uniquely identify the type of entry.
- Parameters:
id - the id of the entry
value - the value of the entry
-
clear
default void clear()
Clears the box database for the source. Default operation is to throw an UnsupportedOperationException.
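The default-method pattern behind clear() can be sketched with a hypothetical interface. ClearSketch and its nested Store, ReadOnlyStore, and MapStore types are illustrative names only: an implementation that does not override clear() inherits the throwing default, while one that supports clearing overrides it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types illustrating a throwing default method; not Box classes.
class ClearSketch {
  interface Store {
    // Default behavior: clearing is unsupported unless an implementation overrides it.
    default void clear() {
      throw new UnsupportedOperationException();
    }
  }

  // Inherits the throwing default.
  static class ReadOnlyStore implements Store {}

  // Supports clearing by overriding the default.
  static class MapStore implements Store {
    final Map<String, String> data = new HashMap<>();

    @Override
    public void clear() {
      data.clear();
    }
  }

  public static void main(String[] args) {
    MapStore store = new MapStore();
    store.data.put("id", "value");
    store.clear();
    System.out.println(store.data.isEmpty()); // true

    try {
      new ReadOnlyStore().clear();
    } catch (UnsupportedOperationException e) {
      System.out.println("clear is not supported");
    }
  }
}
```

Callers that cannot know which implementation they hold may want to catch UnsupportedOperationException, as the usage in main does.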
-
-