Class VisitorIterator
Enables transparent iteration of super/sub-buckets
Thread safety: it is safe for each thread to hold its own iterator (no shared state), as long as the thread also holds the ProgressToken object associated with that iterator. No two VisitorIterator instances may share the same progress token instance at the same time. Concurrent access to a single VisitorIterator instance is not safe and must be handled atomically by the caller.
- Author:
- vekterli
-
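One way a caller might handle concurrent access to a single shared instance is to make the hasNext() check and the getNext() call one atomic step under a lock. The following is only a sketch under stated assumptions: ToyIterator, SharedIteratorSketch, tryNext, handedOut and drain are hypothetical names invented for illustration, and the toy hands out plain integers rather than buckets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedIteratorSketch {
    /** Hypothetical, deliberately non-thread-safe stand-in for one iterator instance. */
    static class ToyIterator {
        private final Deque<Integer> pending = new ArrayDeque<>();
        ToyIterator(int bucketCount) {
            for (int i = 0; i < bucketCount; i++) pending.add(i);
        }
        boolean hasNext() { return !pending.isEmpty(); }
        int getNext() { return pending.remove(); }
    }

    private final ToyIterator iterator;
    private int handedOut = 0;

    SharedIteratorSketch(ToyIterator iterator) { this.iterator = iterator; }

    /** Check-then-act under a single lock, so no other thread can interleave. */
    synchronized Optional<Integer> tryNext() {
        if (!iterator.hasNext()) return Optional.empty();
        handedOut++;
        return Optional.of(iterator.getNext());
    }

    synchronized int handedOut() { return handedOut; }

    /** Lets several worker threads pull from the shared wrapper until it is empty. */
    static Set<Integer> drain(SharedIteratorSketch shared, int threads) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Optional<Integer> next;
                while ((next = shared.tryNext()).isPresent()) {
                    seen.add(next.get());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        SharedIteratorSketch shared = new SharedIteratorSketch(new ToyIterator(100));
        System.out.println(drain(shared, 4).size() + " distinct values seen");
    }
}
```

Giving each thread its own iterator and progress token, as the class description suggests, avoids the lock entirely; the wrapper is only relevant when one instance really has to be shared.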
Nested Class Summary
Modifier and Type, Class, Description
- static class
- protected static interface: Provides an abstract interface to VisitorIterator for how pending buckets are acquired, decoupling this from the iteration itself.
- protected static class: Provides a bucket source that encompasses the entire range available through a given value of distribution bits.
- protected static class: Provides an explicit set of bucket IDs to iterate over.
-
Method Summary
Modifier and Type, Method, Description
- static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress)
- static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress, int slices, int sliceId): Create a new VisitorIterator instance based on the given document selection string.
- static VisitorIterator createFromExplicitBucketSet(Set<com.yahoo.document.BucketId> bucketsToVisit, int distributionBitCount, ProgressToken progress): Create a new VisitorIterator instance based on the given set of buckets.
- protected VisitorIterator.BucketSource getBucketSource()
- int getDistributionBitCount()
- getNext()
- getProgressToken()
- long getRemainingBucketCount()
- boolean hasNext(): Check whether or not it is valid to call getNext() with the current iterator state.
- boolean isDone(): Check if the iterator is actually done.
- void setDistributionBitCount(int distBits): Set the distribution bit count for the iterator and the buckets it currently maintains and will return in the future.
- void update(com.yahoo.document.BucketId superbucket, com.yahoo.document.BucketId progress): Tell the iterator that we've finished processing up to and including progress.
- boolean visitsAllBuckets()
-
Method Details
-
getNext
- Returns:
- The pair [superbucket, progress] that specifies the next iterable bucket. When a superbucket is first returned, the pair is [superbucket, 0], as there has been no progress into its sub-buckets yet (if they exist).
- Precondition:
- hasNext() == true
-
hasNext
public boolean hasNext()
Check whether or not it is valid to call getNext() with the current iterator state.
There exists a case wherein hasNext may return false before update(com.yahoo.document.BucketId, com.yahoo.document.BucketId) is called, but true afterwards. This happens when the set of pending buckets is empty and the bucket source is empty, but the set of active buckets is not. A future progress update on any of the buckets in the active set may or may not make that bucket available to the pending set again. This must be handled explicitly by the caller by checking isDone() and ensuring that update(com.yahoo.document.BucketId, com.yahoo.document.BucketId) is called before retrying hasNext.
This method will also return false if the number of distribution bits has changed and there are active buckets that need to be flushed before the iterator will allow new buckets to be handed out.
- Returns:
- Whether or not it is valid to call getNext() with the current iterator state.
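The interplay between hasNext(), isDone() and update() described above can be sketched as a driving loop. This is a toy model, not the real class: ToyIterator, drain, maxInFlight and finishedCount are hypothetical names, buckets are plain integers, and the toy update takes a boolean (done or returned to pending) in place of a progress bucket. Each bucket is made to fail once so that it is handed back to the pending set before it finishes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class DriveLoopSketch {
    /** Hypothetical stand-in that models only the pending and active sets. */
    static class ToyIterator {
        private final Deque<Integer> pending = new ArrayDeque<>();
        private final Set<Integer> active = new HashSet<>();
        private int finished = 0;

        ToyIterator(int bucketCount) {
            for (int i = 0; i < bucketCount; i++) pending.add(i);
        }
        boolean hasNext() { return !pending.isEmpty(); }
        boolean isDone() { return pending.isEmpty() && active.isEmpty(); }
        int getNext() {
            int bucket = pending.remove(); // precondition: hasNext() == true
            active.add(bucket);
            return bucket;
        }
        /** done == false returns the bucket to the pending set. */
        void update(int bucket, boolean done) {
            if (!active.remove(bucket)) {
                throw new IllegalArgumentException("not active: " + bucket);
            }
            if (done) finished++; else pending.add(bucket);
        }
        int finishedCount() { return finished; }
    }

    /** Drives the iterator to completion and returns the number of getNext() calls. */
    static int drain(ToyIterator it, int maxInFlight) {
        if (maxInFlight < 1) throw new IllegalArgumentException("maxInFlight must be >= 1");
        Deque<Integer> inFlight = new ArrayDeque<>();
        Set<Integer> failedOnce = new HashSet<>();
        int handedOut = 0;
        while (!it.isDone()) {
            // Hand out buckets while the iterator allows it and capacity remains.
            while (it.hasNext() && inFlight.size() < maxInFlight) {
                inFlight.add(it.getNext());
                handedOut++;
            }
            // hasNext() is false (or capacity is full) but the iterator is not done:
            // report progress on an active bucket before asking again.
            int bucket = inFlight.remove();
            boolean done = !failedOnce.add(bucket); // first attempt fails, second succeeds
            it.update(bucket, done);
        }
        return handedOut;
    }

    public static void main(String[] args) {
        ToyIterator it = new ToyIterator(3);
        System.out.println(drain(it, 2) + " hand-outs, done=" + it.isDone());
    }
}
```

The point of the outer isDone() check is that an empty pending set alone does not end the loop: as long as buckets are active, an update may still make one of them pending again.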
-
isDone
public boolean isDone()
Check if the iterator is actually done.
- Returns:
- true iff the bucket source is empty and there are no pending or active buckets in the progress token.
-
update
public void update(com.yahoo.document.BucketId superbucket, com.yahoo.document.BucketId progress)
Tell the iterator that we've finished processing up to and including progress. progress may be a sub-bucket, the invalid 0-bucket (in case the caller fails to process the bucket and must return it to the set of pending buckets), or the special case BucketId(Integer.MAX_VALUE), the latter indicating to the iterator that traversal is complete for superbucket's tree. The null bucket should only be used if no non-null updates have yet been given for the superbucket.
It is a requirement that each superbucket returned by getNext() must eventually result in 1-n update operations, where the last update operation has the special progress==super case.
If the document selection used to create the iterator is unknown and there were active buckets at the time of a distribution bit state change, such a bucket passed to update() will be in an inconsistent state with regard to the number of bits it uses. For unfinished buckets, this is handled by splitting or merging the bucket until it is consistent, depending on whether it had a lower or higher distribution bit count than that of the current system state. For finished buckets of a lower dist bit count, the number of finished buckets in the ProgressToken is adjusted upwards to compensate for the fact that a bucket using fewer distribution bits actually covers more of the bucket space than the ones that are currently in use. For finished buckets of a higher dist bit count, the number of finished buckets is not increased at that point in time, since such a bucket doesn't actually cover an entire bucket with the current state.
All this is done automatically and transparently to the caller once all active buckets have been updated.
- Parameters:
- superbucket - A valid bucket ID that has been retrieved earlier through getNext()
- progress - A bucket logically contained within super. Subsequent updates for the same superbucket must have progress be in an increasing order, where order is defined as the in-order traversal of the bucket split tree. May also be the null bucket if the superbucket has not seen any "proper" progress updates yet, or the special case Integer.MAX_VALUE. Note that with inconsistent splitting, progress may actually contain super rather than vice versa, so the code explicitly allows this.
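The adjustment of the finished-bucket count for buckets with a lower or higher dist bit count comes down to coverage arithmetic. The text above does not spell out the exact computation the library performs, so the following is only a sketch of the underlying idea, assuming that a bucket using k fewer bits than the current state covers 2^k buckets of the current bucket space. FinishedCountSketch and finishedContribution are hypothetical names.

```java
class FinishedCountSketch {
    /**
     * How many buckets at currentBits a single finished bucket that used
     * bucketBits accounts for, under the coverage assumption stated above.
     */
    static long finishedContribution(int bucketBits, int currentBits) {
        if (bucketBits < 1 || currentBits < 1) {
            throw new IllegalArgumentException("bit counts must be positive");
        }
        if (bucketBits > currentBits) {
            // Covers only part of one current bucket, so nothing is counted yet.
            return 0;
        }
        int diff = currentBits - bucketBits;
        if (diff > 62) {
            throw new IllegalArgumentException("bit difference too large for a long");
        }
        // Each missing bit doubles the share of the bucket space that is covered.
        return 1L << diff;
    }

    public static void main(String[] args) {
        System.out.println(finishedContribution(14, 16)); // two bits fewer than current
        System.out.println(finishedContribution(16, 16)); // same bit count
        System.out.println(finishedContribution(17, 16)); // one bit more than current
    }
}
```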
-
getRemainingBucketCount
public long getRemainingBucketCount()
- Returns:
- The total number of iterable buckets that remain to be processed. Note: this currently includes all non-finished buckets (i.e. active and pending buckets) as well.
-
getBucketSource
- Returns:
- Internal bucket source instance. Do NOT modify!
-
getProgressToken
-
getDistributionBitCount
public int getDistributionBitCount() -
setDistributionBitCount
public void setDistributionBitCount(int distBits) Set the distribution bit count for the iterator and the buckets it currently maintains and will return in the future.
For document selections that result in an explicit set of buckets, this is essentially a no-op, so in such a case, disregard the rest of this text.
Changing the number of distribution bits for an unknown document selection will effectively scale the bucket space that will be visited; each bit increase or decrease doubling or halving its size, respectively. When increasing, any pending buckets will be split to ensure the total bucket space covered remains the same. Correspondingly, when decreasing, any pending buckets will be merged appropriately.
If there are buckets active at the time of the change, the actual bucket splitting/merging operations are kept on hold until all active buckets have been updated, at which point they will be automatically performed. The iterator will force such an update by not giving out any new or pending buckets until that happens.
Note: when decreasing the number of distribution bits, there is a chance of losing superbucket progress in a bucket that is merged with another bucket, leading to potential duplicate results.
- Parameters:
- distBits - New system state distribution bit count
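The splitting and merging of pending buckets can be illustrated with a toy model of the bucket space. The sketch makes an assumption that is not stated in the text above: a bucket is identified by the low-order bits of an ID, so that adding a bit splits a bucket in two and dropping a bit merges two siblings into one. DistBitRescaleSketch and rescale are hypothetical names, and plain long values stand in for bucket IDs.

```java
import java.util.Set;
import java.util.TreeSet;

class DistBitRescaleSketch {
    /**
     * Re-expresses pending bucket IDs that use oldBits bits as IDs that use
     * newBits bits, so that the covered part of the bucket space is unchanged.
     */
    static Set<Long> rescale(Set<Long> pendingIds, int oldBits, int newBits) {
        if (oldBits < 1 || newBits < 1 || oldBits > 32 || newBits > 32) {
            throw new IllegalArgumentException("toy model supports 1 to 32 bits");
        }
        if (newBits - oldBits > 16) {
            throw new IllegalArgumentException("toy model splits at most 16 bits at once");
        }
        Set<Long> result = new TreeSet<>();
        for (long id : pendingIds) {
            if (newBits >= oldBits) {
                // Split: every combination of the newly used bits yields one child.
                long children = 1L << (newBits - oldBits);
                for (long extra = 0; extra < children; extra++) {
                    result.add(id | (extra << oldBits));
                }
            } else {
                // Merge: siblings collapse onto the same parent, deduplicated by the set.
                result.add(id & ((1L << newBits) - 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> split = rescale(Set.of(1L, 3L), 2, 3);
        System.out.println("split:  " + split);
        System.out.println("merged: " + rescale(split, 3, 2));
    }
}
```

In this model a merge keeps only the parent ID, which mirrors the note above: whatever progress had been recorded for one of the merged siblings is not carried over, so parts of the merged bucket may be visited again.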
-
visitsAllBuckets
public boolean visitsAllBuckets() -
createFromDocumentSelection
public static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress) throws com.yahoo.document.select.parser.ParseException
- Throws:
- com.yahoo.document.select.parser.ParseException
-
createFromDocumentSelection
public static VisitorIterator createFromDocumentSelection(String documentSelection, com.yahoo.document.BucketIdFactory idFactory, int distributionBitCount, ProgressToken progress, int slices, int sliceId) throws com.yahoo.document.select.parser.ParseException
Create a new VisitorIterator instance based on the given document selection string.
- Parameters:
- documentSelection - Document selection string used to create the VisitorIterator instance. Depending on the characteristics of the selection, the iterator may iterate over only a small subset of the buckets or every bucket in the system. Both cases will be handled efficiently.
- idFactory - BucketId factory specifying the number of distribution bits to use, among other things.
- progress - A unique ProgressToken instance which is used for maintaining the state of the iterator. Cannot be shared with other iterator instances at the same time. If progress contains work done in an earlier iteration run, the iterator will pick up from where it left off.
- Returns:
- A new VisitorIterator instance
- Throws:
- com.yahoo.document.select.parser.ParseException - if documentSelection fails to properly parse
-
createFromExplicitBucketSet
public static VisitorIterator createFromExplicitBucketSet(Set<com.yahoo.document.BucketId> bucketsToVisit, int distributionBitCount, ProgressToken progress)
Create a new VisitorIterator instance based on the given set of buckets. This is supported for internal use only, and is required by Synchronization. Use createFromDocumentSelection(java.lang.String, com.yahoo.document.BucketIdFactory, int, com.yahoo.documentapi.ProgressToken) instead for all normal purposes.
- Parameters:
- bucketsToVisit - The set of buckets that will be visited
- distributionBitCount - Number of distribution bits to use
- progress - A unique ProgressToken instance which is used for maintaining the state of the iterator. Cannot be shared with other iterator instances at the same time. If progress contains work done in an earlier iteration run, the iterator will pick up from where it left off.
- Returns:
- A new VisitorIterator instance
-