edu.internet2.middleware.grouperClientExt.com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer

public final class ByteQuadsCanonicalizer extends Object

Replacement for BytesToNameCanonicalizer that aims at more localized memory access by flattening name quad data. The performance improvement is modest for simple JSON document data binding (maybe 3%), but it should help more with larger symbol tables, or with binary formats like Smile.

Hash area is divided into 4 sections:

Primary area (1/2 of total size), direct match from hash (LSB)
Secondary area (1/4 of total size), match from hash (LSB) >> 1
Tertiary area (1/8 of total size), match from hash (LSB) >> 2
Spill-over area (remaining 1/8) with linear scan, insertion order

Within every area, each entry is 4 ints: 1 - 3 ints contain 1 - 12 UTF-8 encoded bytes of the name (null-padded), and the last int is an offset into _names, which contains the actual name Strings.
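
The section sizes above imply fixed offsets once the number of primary slots is known. The following is a minimal sketch of that arithmetic, assuming a power-of-two slot count and a literal reading of the list above. The class and method names (HashAreaLayout, primaryOffset and so on) are hypothetical and are not part of this class; the real implementation may differ, for example by grouping tertiary entries into buckets (see _tertiaryShift).

```java
/** Hypothetical helper modelling the four-section layout described above. */
final class HashAreaLayout {
    /** Each entry is 4 ints: up to 3 name quads plus one trailing int. */
    static final int INTS_PER_ENTRY = 4;

    final int hashSize;        // number of primary slots (assumed power of two)
    final int mainAreaLength;  // total ints in the fixed-size hash area
    final int secondaryStart;  // int offset where the secondary area begins
    final int tertiaryStart;   // int offset where the tertiary area begins
    final int spilloverStart;  // int offset where the spill-over area begins

    HashAreaLayout(int hashSize) {
        if (hashSize < 4 || Integer.bitCount(hashSize) != 1) {
            throw new IllegalArgumentException("hashSize must be a power of two >= 4");
        }
        this.hashSize = hashSize;
        // The primary area is half of the total, so the total is twice the primary area.
        this.mainAreaLength = 2 * hashSize * INTS_PER_ENTRY;
        this.secondaryStart = mainAreaLength / 2;                  // after primary (1/2)
        this.tertiaryStart = secondaryStart + mainAreaLength / 4;  // after secondary (1/4)
        this.spilloverStart = tertiaryStart + mainAreaLength / 8;  // after tertiary (1/8)
    }

    /** Primary slot: the low bits of the hash select the slot directly. */
    int primaryOffset(int hash) {
        return (hash & (hashSize - 1)) * INTS_PER_ENTRY;
    }

    /** Secondary slot: the same low bits shifted right by one. */
    int secondaryOffset(int hash) {
        return secondaryStart + ((hash & (hashSize - 1)) >> 1) * INTS_PER_ENTRY;
    }

    /** Tertiary slot: the same low bits shifted right by two. */
    int tertiaryOffset(int hash) {
        return tertiaryStart + ((hash & (hashSize - 1)) >> 2) * INTS_PER_ENTRY;
    }

    public static void main(String[] args) {
        HashAreaLayout layout = new HashAreaLayout(64);
        System.out.println("main area ints:  " + layout.mainAreaLength);
        System.out.println("secondary start: " + layout.secondaryStart);
        System.out.println("tertiary start:  " + layout.tertiaryStart);
        System.out.println("spill-over start: " + layout.spilloverStart);
    }
}
```

In this model a lookup would probe the primary offset first, then the secondary and tertiary offsets, and finally scan the spill-over area linearly from spilloverStart.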

Since: 2.6

Field Summary

Fields

Modifier and Type

Field

Description

protected int

_count

Total number of Strings in the symbol table; only used for child tables.

protected final boolean

_failOnDoS

Flag that indicates whether we should throw an exception if enough hash collisions are detected (true), or just work around them (false).

protected int[]

_hashArea

Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).

protected boolean

_hashShared

Flag that indicates whether underlying data structures for the main hash area are shared or not.

protected int

_hashSize

Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).

protected final boolean

_intern

Whether canonical symbol Strings are to be intern()ed before being added to the table or not.

protected int

_longNameOffset

Offset within _hashArea that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name.

protected String[]

_names

Array that contains String instances matching entries in _hashArea.

protected final ByteQuadsCanonicalizer

_parent

Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.

protected int

_secondaryStart

Offset within _hashArea where secondary entries start

protected final int

_seed

Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance.

protected int

_spilloverEnd

Pointer to the offset within spill-over area where there is room for more spilled over entries (if any).

protected final AtomicReference<edu.internet2.middleware.grouperClientExt.com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo>

_tableInfo

Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table.

protected int

_tertiaryShift

Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).

protected int

_tertiaryStart

Offset within _hashArea where tertiary entries start

protected static final int

MAX_ENTRIES_FOR_REUSE

Let's only share reasonably sized symbol tables.

Method Summary

Modifier and Type

Method

Description

protected void

_reportTooManyCollisions()

String

addName(String name, int q1)

String

addName(String name, int[] q, int qlen)

String

addName(String name, int q1, int q2)

String

addName(String name, int q1, int q2, int q3)

int

bucketCount()

int

calcHash(int q1)

int

calcHash(int[] q, int qlen)

int

calcHash(int q1, int q2)

int

calcHash(int q1, int q2, int q3)

static ByteQuadsCanonicalizer

createRoot()

Factory method to call to create a symbol table instance with a randomized seed value.

protected static ByteQuadsCanonicalizer

createRoot(int seed)

String

findName(int q1)

String

findName(int[] q, int qlen)

String

findName(int q1, int q2)

String

findName(int q1, int q2, int q3)

int

hashSeed()

boolean

isCanonicalizing()

ByteQuadsCanonicalizer

makeChild(int flags)

Factory method used to create actual symbol table instance to use for parsing.

ByteQuadsCanonicalizer

makeChildOrPlaceholder(int flags)

Method similar to makeChild(int) but one that only creates a real instance if JsonFactory.Feature.CANONICALIZE_FIELD_NAMES is enabled: otherwise a "bogus" instance is created.

boolean

maybeDirty()

Method called to quickly check whether a child symbol table may have gotten additional entries.

int

primaryCount()

Method mostly needed by unit tests; calculates number of entries that are in the primary slot set.

void

release()

Method called by the using code to indicate it is done with this instance.

int

secondaryCount()

Method mostly needed by unit tests; calculates number of entries in secondary buckets

int

size()

int

spilloverCount()

Method mostly needed by unit tests; calculates number of entries in shared spill-over area

int

tertiaryCount()

Method mostly needed by unit tests; calculates number of entries in tertiary buckets

String

toString()

int

totalCount()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- MAX_ENTRIES_FOR_REUSE
  
  protected static final int MAX_ENTRIES_FOR_REUSE
  
  Let's only share reasonably sized symbol tables. Max size set to 3/4 of 8k; this corresponds to 256k main hash index. This should allow for enough distinct names for almost any case, while preventing ballooning for cases where names are unique (or close thereof).
  See Also:
  
  Constant Field Values
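  
  The figures in this description can be checked with a little arithmetic. The sketch below assumes that "8k" means 8192 primary slots and that the main hash area holds 2 * _hashSize entries of 4 ints each (as stated for _hashArea). The class and method names are hypothetical, and the actual constant is the one listed under Constant Field Values, which may be rounded differently.
  
  ```java
  /** Hypothetical arithmetic helper; not part of ByteQuadsCanonicalizer. */
  final class ReuseLimitArithmetic {
      /** Three quarters of a slot count, computed without floating point. */
      static int threeQuartersOf(int slots) {
          return slots - (slots >> 2);
      }
  
      /** Bytes used by the main hash area: 2 * slots entries of 4 ints each. */
      static int mainAreaBytes(int primarySlots) {
          return 2 * primarySlots * 4 * Integer.BYTES;
      }
  
      public static void main(String[] args) {
          int slots = 8 * 1024; // "8k", assumed to mean primary slots
          System.out.println("3/4 of 8k slots: " + threeQuartersOf(slots));
          System.out.println("main area in KiB: " + mainAreaBytes(slots) / 1024);
      }
  }
  ```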
- _parent
  
  protected final ByteQuadsCanonicalizer _parent
  
  Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
- _tableInfo
  
  protected final AtomicReference<edu.internet2.middleware.grouperClientExt.com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo> _tableInfo
  
  Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
- _seed
  
  protected final int _seed
  
  Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance. This is done for security reasons, to avoid potential DoS attack via hash collisions.
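  
  To illustrate why a per-table seed helps, here is a minimal sketch of a seeded hash over a single quad. The mixing steps are an assumption chosen for the example (each step is invertible, so two different seeds can never map the same quad to the same hash); they are not the calcHash implementation of this class, and SeededQuadHash is a hypothetical name.
  
  ```java
  /** Hypothetical seeded hash for one name quad; illustrative mixing only. */
  final class SeededQuadHash {
      private final int seed;
  
      SeededQuadHash(int seed) {
          this.seed = seed;
      }
  
      /** Stable for a given seed, different across seeds. */
      int hash(int q1) {
          int h = q1 ^ seed;   // fold the seed in first
          h ^= h >>> 16;       // invertible xor-shift
          h *= 0x45d9f3b;      // multiplication by an odd constant is invertible
          h ^= h >>> 13;       // invertible xor-shift
          return h;
      }
  }
  ```
  
  An attacker who does not know the seed cannot precompute a set of names that all land in the same slot, while lookups within one table stay consistent because the seed never changes for that instance.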
- _intern
  
  protected final boolean _intern
  
  Whether canonical symbol Strings are to be intern()ed before being added to the table or not.
  NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
- _failOnDoS
  
  protected final boolean _failOnDoS
  
  Flag that indicates whether we should throw an exception if enough hash collisions are detected (true), or just work around them (false).
  
  Since:
  
  2.4
- _hashArea
  
  protected int[] _hashArea
  
  Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
- _hashSize
  
  protected int _hashSize
  
  Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
- _secondaryStart
  
  protected int _secondaryStart
  
  Offset within _hashArea where secondary entries start
- _tertiaryStart
  
  protected int _tertiaryStart
  
  Offset within _hashArea where tertiary entries start
- _tertiaryShift
  
  protected int _tertiaryShift
  
  Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).
  Default value is 2, for buckets of 4 slots; grows bigger with bigger table sizes.
- _count
  
  protected int _count
  
  Total number of Strings in the symbol table; only used for child tables.
- _names
  
  protected String[] _names
  
  Array that contains String instances matching entries in _hashArea. Contains nulls for unused entries. Note that this size is twice that of _hashArea
- _spilloverEnd
  
  protected int _spilloverEnd
  
  Pointer to the offset within spill-over area where there is room for more spilled over entries (if any). Spill over area is within fixed-size portion of _hashArea.
- _longNameOffset
  
  protected int _longNameOffset
  
  Offset within _hashArea that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name. Note that long name area follows immediately after the fixed-size main hash area (_hashArea).
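  
  The 12-byte boundary follows from the entry layout: three ints hold at most 12 UTF-8 bytes. A minimal sketch of that calculation is shown below; NameQuads and its methods are hypothetical helpers, and the sketch deliberately says nothing about how bytes are ordered inside a quad.
  
  ```java
  import java.nio.charset.StandardCharsets;
  
  /** Hypothetical helper: how many 4-byte quads a name needs. */
  final class NameQuads {
      /** Names up to this many UTF-8 bytes fit in the 3 quads of a main entry. */
      static final int MAX_INLINE_BYTES = 12;
  
      static int utf8Length(String name) {
          return name.getBytes(StandardCharsets.UTF_8).length;
      }
  
      /** Number of ints needed to hold the UTF-8 bytes, rounding up. */
      static int quadCount(String name) {
          return (utf8Length(name) + 3) / 4;
      }
  
      /** True if the name is 13 bytes or longer and needs the long name area. */
      static boolean needsLongNameArea(String name) {
          return utf8Length(name) > MAX_INLINE_BYTES;
      }
  }
  ```
  
  Note that the limit is in bytes, not characters: a name with non-ASCII characters reaches it sooner than its String length suggests.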
- _hashShared
  
  protected boolean _hashShared
  
  Flag that indicates whether underlying data structures for the main hash area are shared or not. If they are, then they need to be handled in copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.
  This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry)
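  
  The copy-on-write handling described here can be sketched as follows. This is a simplified model under stated assumptions: CowIntTable and its members are hypothetical, and the real class tracks more state than a single array.
  
  ```java
  import java.util.Arrays;
  
  /** Hypothetical copy-on-write wrapper around a shared int array. */
  final class CowIntTable {
      private int[] area;
      private boolean shared;
  
      /** Starts out sharing the given array with whoever else holds it. */
      CowIntTable(int[] sharedArea) {
          this.area = sharedArea;
          this.shared = true;
      }
  
      int get(int index) {
          return area[index];
      }
  
      /** Copies the array on the first write, then modifies the private copy. */
      void set(int index, int value) {
          if (shared) {
              area = Arrays.copyOf(area, area.length);
              shared = false;
          }
          area[index] = value;
      }
  
      boolean isShared() {
          return shared;
      }
  }
  ```
  
  Reads never copy, so a table that only looks up existing names keeps using the shared array; the cost of the copy is paid once, by the first write.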
Method Details
- createRoot
  
  public static ByteQuadsCanonicalizer createRoot()
  
  Factory method to call to create a symbol table instance with a randomized seed value.
  
  Returns:
  
  Root instance to use for constructing new child instances
- createRoot
  
  protected static ByteQuadsCanonicalizer createRoot(int seed)
- makeChild
  
  public ByteQuadsCanonicalizer makeChild(int flags)
  
  Factory method used to create actual symbol table instance to use for parsing.
  
  Parameters:
  
  flags - Bit flags of active JsonFactory.Features enabled.
  
  Returns:
  
  Actual canonicalizer instance that can be used by a parser
- makeChildOrPlaceholder
  
  public ByteQuadsCanonicalizer makeChildOrPlaceholder(int flags)
  
  Method similar to makeChild(int) but one that only creates a real instance if JsonFactory.Feature.CANONICALIZE_FIELD_NAMES is enabled: otherwise a "bogus" instance is created.
  
  Parameters:
  
  flags - Bit flags of active JsonFactory.Features enabled.
  
  Returns:
  
  Actual canonicalizer instance that can be used by a parser if (and only if) canonicalization is enabled; otherwise a non-null "placeholder" instance.
  
  Since:
  
  2.13
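  
  The flags parameter is a bit mask in which each feature owns one bit. The sketch below shows the general pattern of testing such a bit to choose between a real and a placeholder instance. SketchFeature and ChildKind are hypothetical stand-ins, not JsonFactory.Feature, and the bit positions are assumptions made for the example.
  
  ```java
  /** Hypothetical feature enum; each constant owns one bit of the mask. */
  enum SketchFeature {
      INTERN_NAMES,
      CANONICALIZE_NAMES,
      FAIL_ON_OVERFLOW;
  
      int getMask() {
          return 1 << ordinal();
      }
  
      boolean enabledIn(int flags) {
          return (flags & getMask()) != 0;
      }
  }
  
  /** Hypothetical selector mirroring the "real or placeholder" decision. */
  final class ChildKind {
      static String forFlags(int flags) {
          return SketchFeature.CANONICALIZE_NAMES.enabledIn(flags)
                  ? "canonicalizing"
                  : "placeholder";
      }
  }
  ```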
- release
  
  public void release()
  
  Method called by the using code to indicate it is done with this instance. This lets instance merge accumulated changes into parent (if need be), safely and efficiently, and without calling code having to know about parent information.
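  
  The root/child life cycle can be modelled in a few lines. The following is a simplified sketch, assuming that the root publishes an immutable snapshot through an AtomicReference and that a child offers its state back only if it added entries. SketchSymbolTable and its members are hypothetical, and a plain list of names stands in for the real hash structures.
  
  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.concurrent.atomic.AtomicReference;
  
  /** Hypothetical model of a root table with short-lived child tables. */
  final class SketchSymbolTable {
      private final SketchSymbolTable parent;                    // null for the root
      private final AtomicReference<List<String>> sharedState;   // used by the root only
      private List<String> names;                                // snapshot or private copy
      private boolean dirty;
  
      private SketchSymbolTable(SketchSymbolTable parent, List<String> snapshot) {
          this.parent = parent;
          this.sharedState = (parent == null)
                  ? new AtomicReference<List<String>>(snapshot) : null;
          this.names = snapshot;
      }
  
      static SketchSymbolTable createRoot() {
          return new SketchSymbolTable(null, Collections.<String>emptyList());
      }
  
      /** Must be called on the root: the child starts from the current snapshot. */
      SketchSymbolTable makeChild() {
          return new SketchSymbolTable(this, sharedState.get());
      }
  
      /** Copy-on-write: the shared snapshot is never modified in place. */
      void add(String name) {
          if (!dirty) {
              names = new ArrayList<>(names);
              dirty = true;
          }
          names.add(name);
      }
  
      /** Offers accumulated entries back to the root, if there are any. */
      void release() {
          if (parent == null || !dirty) {
              return;
          }
          List<String> current = parent.sharedState.get();
          if (names.size() > current.size()) {
              // If another child won the race, this update is simply dropped.
              parent.sharedState.compareAndSet(current, Collections.unmodifiableList(names));
          }
          dirty = false;
      }
  
      /** Root only: number of names in the currently published snapshot. */
      int sharedSize() {
          return sharedState.get().size();
      }
  }
  ```
  
  Because the caller only ever invokes release() on the child, it does not need to know whether a merge happened or which root the child came from.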
- size
  
  public int size()
  
  Returns:
  
  Number of symbol entries contained by this canonicalizer instance
- bucketCount
  
  public int bucketCount()
  
  Returns:
  
  number of primary slots table has currently
- maybeDirty
  
  public boolean maybeDirty()
  
  Method called to quickly check whether a child symbol table may have gotten additional entries. Used for checking whether a child table should be merged into the shared table.
  
  Returns:
  
  Whether main hash area has been modified
- hashSeed
  
  public int hashSeed()
- isCanonicalizing
  
  public boolean isCanonicalizing()
  
  Returns:
  
  True for "real", canonicalizing child tables; false for root table as well as placeholder "child" tables.
  
  Since:
  
  2.13
- primaryCount
  
  public int primaryCount()
  
  Method mostly needed by unit tests; calculates number of entries that are in the primary slot set. These are "perfect" entries, accessible with a single lookup
  
  Returns:
  
  Number of entries in the primary hash area
- secondaryCount
  
  public int secondaryCount()
  
  Method mostly needed by unit tests; calculates number of entries in secondary buckets
  
  Returns:
  
  Number of entries in the secondary hash area
- tertiaryCount
  
  public int tertiaryCount()
  
  Method mostly needed by unit tests; calculates number of entries in tertiary buckets
  
  Returns:
  
  Number of entries in the tertiary hash area
- spilloverCount
  
  public int spilloverCount()
  
  Method mostly needed by unit tests; calculates number of entries in shared spill-over area
  
  Returns:
  
  Number of entries in the linear spill-over area
- totalCount
  
  public int totalCount()
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- findName
  
  public String findName(int q1)
- findName
  
  public String findName(int q1, int q2)
- findName
  
  public String findName(int q1, int q2, int q3)
- findName
  
  public String findName(int[] q, int qlen)
- addName
  
  public String addName(String name, int q1)
- addName
  
  public String addName(String name, int q1, int q2)
- addName
  
  public String addName(String name, int q1, int q2, int q3)
- addName
  
  public String addName(String name, int[] q, int qlen)
- calcHash
  
  public int calcHash(int q1)
- calcHash
  
  public int calcHash(int q1, int q2)
- calcHash
  
  public int calcHash(int q1, int q2, int q3)
- calcHash
  
  public int calcHash(int[] q, int qlen)
- _reportTooManyCollisions
  
  protected void _reportTooManyCollisions()
