Class CharsToNameCanonicalizer
For optimal performance, the usage pattern should be one where matches
are very common (especially after "warm-up") and, as with most hash-based
maps/sets, where hash codes are uniformly distributed. Also, collisions
are slightly more expensive than with HashMap or HashSet, since hash codes
are not used in resolving collisions; that is, an equals() comparison is
done against all symbols in the same bucket index.
Finally, rehashing is also more expensive, as hash codes are not
stored; rehashing requires all entries' hash codes to be recalculated.
The reason for not storing hash codes is reduced memory usage, in the hope
of better memory locality.
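The trade-off described above can be illustrated with a small stand-alone sketch. It is not the Jackson implementation: the class name `NoStoredHashTableSketch`, its bucket layout and its growth policy are all assumptions made for illustration. It only shows the two costs named in the text: every symbol in a bucket is compared with equals(), and a rehash has to recalculate the hash of every entry.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the Jackson implementation. */
final class NoStoredHashTableSketch {
    // Each bucket holds only the symbols themselves; no hash codes are kept.
    private String[][] buckets = new String[4][];
    private int size;
    int hashCalculations; // counts calls to hash(), to make the rehash cost visible

    private int hash(String s) {
        hashCalculations++;
        return s.hashCode();
    }

    private int indexFor(String s, int bucketCount) {
        return (hash(s) & 0x7FFFFFFF) % bucketCount;
    }

    /** Returns the canonical instance equal to {@code s}, adding {@code s} if absent. */
    String canonicalize(String s) {
        int ix = indexFor(s, buckets.length);
        String[] bucket = buckets[ix];
        if (bucket != null) {
            // No stored hash to pre-filter with: compare against every symbol in the bucket.
            for (String existing : bucket) {
                if (existing.equals(s)) {
                    return existing;
                }
            }
        }
        if (size >= buckets.length) {
            rehash();
            ix = indexFor(s, buckets.length);
        }
        buckets[ix] = append(buckets[ix], s);
        size++;
        return s;
    }

    private void rehash() {
        String[][] grown = new String[buckets.length * 2][];
        for (String[] bucket : buckets) {
            if (bucket == null) {
                continue;
            }
            for (String sym : bucket) {
                // The hash was never stored, so it is recalculated for every entry.
                int ix = indexFor(sym, grown.length);
                grown[ix] = append(grown[ix], sym);
            }
        }
        buckets = grown;
    }

    private static String[] append(String[] bucket, String s) {
        if (bucket == null) {
            return new String[] { s };
        }
        String[] copy = Arrays.copyOf(bucket, bucket.length + 1);
        copy[bucket.length] = s;
        return copy;
    }

    int size() {
        return size;
    }
}
```

Once the table is warmed up, every lookup returns an already-canonical String and neither the append nor the rehash path runs.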
The usual usage pattern is to create a single "master" instance, and either use that instance in sequential fashion, or create derived "child" instances, which after use are asked to return possible symbol additions to the master instance. In either case the benefit is that the symbol table gets initialized so that further uses are more efficient, as eventually all symbols needed will already be in the symbol table. At that point no more Symbol String allocations are needed, nor changes to the symbol table itself.
Note that while individual SymbolTable instances are NOT thread-safe (much like generic collection classes), concurrently used "child" instances can be freely used without synchronization. However, using the master table concurrently with child instances can only be done if access to the master instance is read-only (i.e. no modifications done).
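The master/child life cycle can be modelled with a short, self-contained sketch. Everything in it is hypothetical (the class `SharedSymbolsSketch`, its nested `Child`, and the use of `HashMap` as storage); it only mirrors the pattern described above, where each child works on its own data and hands additions back to the master when released. The eager copy in the child constructor is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the master/child pattern; not the Jackson implementation. */
final class SharedSymbolsSketch {
    // Snapshot published by the master; replaced wholesale, never mutated in place.
    private volatile Map<String, String> shared = new HashMap<>();

    /** Child instance: works on private data, so it needs no synchronization of its own. */
    final class Child {
        private final Map<String, String> local;
        private final int sizeAtCreation;

        Child(Map<String, String> snapshot) {
            // Simplification: eager copy of the master's snapshot.
            this.local = new HashMap<>(snapshot);
            this.sizeAtCreation = snapshot.size();
        }

        /** Returns the canonical String for the given name, adding it if needed. */
        String findSymbol(String name) {
            String existing = local.putIfAbsent(name, name);
            return (existing == null) ? name : existing;
        }

        /** Hands symbol additions back to the master, if there were any. */
        void release() {
            if (local.size() > sizeAtCreation) {
                mergeChild(local);
            }
        }
    }

    Child makeChild() {
        return new Child(shared);
    }

    private synchronized void mergeChild(Map<String, String> childSymbols) {
        // Adopt the child's table only if it knows more symbols than the master does.
        if (childSymbols.size() > shared.size()) {
            shared = new HashMap<>(childSymbols);
        }
    }

    int size() {
        return shared.size();
    }
}
```

In this model a second child created after the first one is released starts out "warm": it already contains the symbols the first child added, so it allocates nothing for them.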
-
Field Summary
| Modifier and Type | Field |
| static final int | HASH_MULT |
Method Summary
| Modifier and Type | Method | Description |
| int | _hashToIndex(int rawHash) | Helper method that takes in a "raw" hash value, shuffles it as necessary, and truncates to be used as the index. |
| int | bucketCount() | Method for checking number of primary hash buckets this symbol table uses. |
| int | calcHash(char[] buffer, int start, int len) | Implementation of a hashing method for variable length Strings. |
| int | calcHash | |
| int | collisionCount() | Method mostly needed by unit tests; calculates number of entries that are in collision list. |
| static CharsToNameCanonicalizer | createRoot() | Deprecated. |
| static CharsToNameCanonicalizer | createRoot(int seed) | Deprecated. Since 2.16 use createRoot(TokenStreamFactory) instead. |
| static CharsToNameCanonicalizer | createRoot(TokenStreamFactory owner) | Method called to create root canonicalizer for a JsonFactory instance. |
| static CharsToNameCanonicalizer | createRoot(TokenStreamFactory owner, int seed) | |
| | findSymbol(char[] buffer, int start, int len, int h) | |
| int | hashSeed() | |
| CharsToNameCanonicalizer | makeChild() | "Factory" method; will create a new child instance of this symbol table. |
| CharsToNameCanonicalizer | makeChild(int flags) | Deprecated. Since 2.16 use makeChild() instead. |
| int | maxCollisionLength() | Method mostly needed by unit tests; calculates length of the longest collision chain. |
| boolean | maybeDirty() | |
| void | release() | Method called by the using code to indicate it is done with this instance. |
| int | size() | |
-
Field Details
-
HASH_MULT
public static final int HASH_MULT
-
-
Method Details
-
createRoot
Deprecated. Since 2.16 use createRoot(TokenStreamFactory) instead.
- Returns:
- Root instance to use for constructing new child instances
-
createRoot
Deprecated. Since 2.16 use createRoot(TokenStreamFactory) instead.
- Parameters:
seed - Seed for hash value calculation
- Returns:
- Root instance to use for constructing new child instances
-
createRoot
Method called to create root canonicalizer for a JsonFactory instance. Root instance is never used directly; its main use is for storing and sharing underlying symbol arrays as needed.
- Parameters:
owner - Factory that will use the root instance; used for accessing configuration
- Returns:
- Root instance to use for constructing new child instances
-
createRoot
-
makeChild
"Factory" method; will create a new child instance of this symbol table. It will be a copy-on-write instance, i.e. it will only use a read-only copy of the parent's data, but when changes are needed, a copy will be created.
Note: while this method is synchronized, it is generally not safe to both use makeChild/mergeChild AND use the instance actively. Instead, a separate 'root' instance should be used, on which only makeChild/mergeChild are called, while the instance itself is not used as a symbol table.
- Returns:
- Actual canonicalizer instance that can be used by a parser
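The copy-on-write behaviour described for child instances can be sketched as follows. This is a minimal model under stated assumptions, not the library's code: the class `CopyOnWriteSymbolsSketch`, its flat array storage and its growth factor are invented for illustration. The child reads the parent's array directly and copies it only on the first modification.

```java
import java.util.Arrays;

/** Hypothetical copy-on-write symbol holder; illustrative only. */
final class CopyOnWriteSymbolsSketch {
    private String[] symbols; // may still be the parent's array
    private int count;
    private boolean shared;   // true while 'symbols' still belongs to the parent

    CopyOnWriteSymbolsSketch(String[] parentSymbols, int parentCount) {
        // No copying at creation time: the child only reads the parent's data.
        this.symbols = parentSymbols;
        this.count = parentCount;
        this.shared = true;
    }

    boolean contains(String s) {
        for (int i = 0; i < count; i++) {
            if (symbols[i].equals(s)) {
                return true;
            }
        }
        return false;
    }

    void add(String s) {
        if (shared) {
            // First modification: take a private copy so the parent's array is never written to.
            symbols = Arrays.copyOf(symbols, Math.max(4, symbols.length * 2));
            shared = false;
        } else if (count == symbols.length) {
            symbols = Arrays.copyOf(symbols, symbols.length * 2);
        }
        symbols[count++] = s;
    }

    boolean isSharingParentData() {
        return shared;
    }

    int size() {
        return count;
    }
}
```

A child that finds every symbol it needs never copies anything, which is the common case once the parent table has been warmed up.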
-
makeChild
Deprecated. Since 2.16 use makeChild() instead.
- Parameters:
flags - Configuration flags (ignored)
- Returns:
- Actual canonicalizer instance that can be used by a parser
-
release
public void release()
Method called by the using code to indicate it is done with this instance. This lets the instance merge accumulated changes into its parent (if need be), safely and efficiently, and without the calling code having to know about parent information.
-
size
public int size()
- Returns:
- Number of symbol entries contained by this canonicalizer instance
-
bucketCount
public int bucketCount()
Method for checking number of primary hash buckets this symbol table uses.
- Returns:
- number of primary slots table has currently
-
maybeDirty
public boolean maybeDirty()
-
hashSeed
public int hashSeed()
-
collisionCount
public int collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
- Returns:
- Number of collisions in the primary hash area
- Since:
- 2.1
-
maxCollisionLength
public int maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.
- Returns:
- Length of the collision chain
- Since:
- 2.1
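The two collision metrics above can be illustrated on a simplified model in which each bucket is a list whose first entry sits in the primary slot and whose remaining entries form the collision chain. The class `CollisionStatsSketch` and this list-of-lists layout are assumptions for illustration and do not reflect how the class stores its data.

```java
import java.util.List;

/** Hypothetical model of the collision metrics; illustrative only. */
final class CollisionStatsSketch {
    /** Counts entries beyond the first one in each bucket, i.e. entries in a collision chain. */
    static int collisionCount(List<List<String>> buckets) {
        int count = 0;
        for (List<String> bucket : buckets) {
            count += Math.max(0, bucket.size() - 1);
        }
        return count;
    }

    /** Length of the longest collision chain, not counting the entry in the primary slot. */
    static int maxCollisionLength(List<List<String>> buckets) {
        int max = 0;
        for (List<String> bucket : buckets) {
            max = Math.max(max, bucket.size() - 1);
        }
        return max;
    }
}
```

With perfectly spread symbols both values are 0. If every symbol lands in the same bucket, both reach size() - 1, which matches the upper bound stated above.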
-
findSymbol
- Throws:
IOException
-
_hashToIndex
public int _hashToIndex(int rawHash)
Helper method that takes in a "raw" hash value, shuffles it as necessary, and truncates to be used as the index.
- Parameters:
rawHash - Raw hash value to use for calculating index
- Returns:
- Index value calculated
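A "shuffle, then truncate" step of this kind might look like the sketch below. The shift amounts, the explicit `bucketCount` parameter and the power-of-two assumption are illustrative choices, not taken from this page; the point is only that high and low bits are mixed before the value is masked down to a bucket index.

```java
/** Hypothetical sketch of a shuffle-and-truncate index calculation; illustrative only. */
final class HashToIndexSketch {
    /**
     * @param rawHash     raw hash value
     * @param bucketCount number of primary buckets; assumed here to be a power of two
     * @return index in the range [0, bucketCount)
     */
    static int hashToIndex(int rawHash, int bucketCount) {
        int h = rawHash;
        h += (h >>> 15); // fold high bits into the low bits (shift amounts are assumptions)
        h ^= (h << 7);
        h += (h >>> 3);
        // Truncate: with a power-of-two bucket count, masking keeps only the low bits.
        return h & (bucketCount - 1);
    }
}
```

Masking instead of a modulo operation keeps the result non-negative even for negative raw hashes, provided the bucket count is a power of two.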
-
calcHash
public int calcHash(char[] buffer, int start, int len)
Implementation of a hashing method for variable length Strings. Most of the time intention is that this calculation is done by caller during parsing, not here; however, sometimes it needs to be done for parsed "String" too.
- Parameters:
buffer - Input buffer that contains name to decode
start - Pointer to the first character of the name
len - Length of String; has to be at least 1 (caller guarantees)
- Returns:
- Hash code calculated
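A seeded multiplicative hash over a character range, in the spirit of the method above, could be sketched like this. The multiplier value 33, the explicit `seed` parameter and the replacement of a zero result by 1 are assumptions made for the sketch; this page does not state the value of HASH_MULT or how the seed is applied.

```java
/** Hypothetical sketch of a seeded multiplicative hash; illustrative only. */
final class CalcHashSketch {
    static final int HASH_MULT = 33; // assumed value, not stated on this page

    /**
     * @param buffer input buffer that contains the name
     * @param start  offset of the first character of the name
     * @param len    length of the name; expected to be at least 1
     * @param seed   starting value for the calculation (assumed parameter)
     */
    static int calcHash(char[] buffer, int start, int len, int seed) {
        int hash = seed;
        for (int i = start, end = start + len; i < end; ++i) {
            hash = (hash * HASH_MULT) + buffer[i];
        }
        // Assumption: 0 is avoided so it can be used to mean "no hash computed".
        return (hash == 0) ? 1 : hash;
    }
}
```

Because the loop only reads buffer[start] to buffer[start + len - 1], a parser can hash a name in place inside its input buffer, and the same name yields the same hash wherever it sits in the buffer.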
-
calcHash
-