Class HugeGraph

  • All Implemented Interfaces:
    BatchNodeIterable, CSRGraph, Degrees, Graph, IdMapping, NodeIterator, NodeMapping, NodePropertyContainer, RelationshipAccess, RelationshipIterator, RelationshipPredicate, RelationshipProperties

    public class HugeGraph
    extends java.lang.Object
    implements CSRGraph
    HugeGraph contains two array-like data structures.

    The adjacency data is stored in a ByteArray, which is a byte[] addressable by long indices and capable of storing about 2^46 (~70k bn) bytes, or 64 TiB. The bytes are stored in byte[] pages of 32 KiB each.
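
    The paging scheme can be illustrated with a minimal sketch. The class PagedBytesSketch and all of its members are hypothetical names chosen for illustration; this is not the library's ByteArray, only one way a long index might be split into a page number and a position within a 32 KiB page.

```java
// Hypothetical sketch of a paged byte array addressed by long indices.
// Not the actual ByteArray implementation.
final class PagedBytesSketch {
    static final int PAGE_SHIFT = 15;                  // 2^15 bytes = 32 KiB per page
    static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    static final long PAGE_MASK = PAGE_SIZE - 1;

    private final byte[][] pages;

    PagedBytesSketch(long capacity) {
        // Round up to a whole number of pages; the pages themselves are created lazily.
        int pageCount = (int) ((capacity + PAGE_MASK) >>> PAGE_SHIFT);
        pages = new byte[pageCount][];
    }

    // The high bits of the index select the page.
    static int pageIndex(long index) {
        return (int) (index >>> PAGE_SHIFT);
    }

    // The low 15 bits select the byte within the page.
    static int indexInPage(long index) {
        return (int) (index & PAGE_MASK);
    }

    void set(long index, byte value) {
        int p = pageIndex(index);
        if (pages[p] == null) {
            pages[p] = new byte[PAGE_SIZE];
        }
        pages[p][indexInPage(index)] = value;
    }

    byte get(long index) {
        byte[] page = pages[pageIndex(index)];
        return page == null ? 0 : page[indexInPage(index)];
    }
}
```

    With an int-indexed outer array, such a layout can address close to 2^31 pages of 2^15 bytes, which matches the figure of about 2^46 bytes.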

    The data is in the format:

    degree ~ targetId1 ~ targetId2 ~ targetIdn
    The degree is stored as a full-sized 4-byte int (the Neo4j kernel API returns an int for Nodes.countAll(org.neo4j.internal.kernel.api.NodeCursor)). The target IDs are first sorted, then delta encoded, and finally written as variable-length vlongs. Delta encoding writes only the difference to the previous value rather than the actual value, which works well with vlong encoding because small differences need fewer bytes.
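
    As a rough illustration of this format, the following sketch encodes one adjacency list. The class DeltaVLongSketch and its methods are hypothetical. Two details are assumptions not stated above: the degree is written in big-endian byte order, and the vlong layout uses 7 payload bits per byte with the high bit as a continuation flag.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch: degree, then sorted, delta-encoded targets as vlongs.
final class DeltaVLongSketch {

    static byte[] encode(long[] targets) {
        long[] sorted = targets.clone();
        Arrays.sort(sorted);

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Degree as a fixed 4-byte int (big-endian order is an assumption).
        int degree = sorted.length;
        out.write(degree >>> 24);
        out.write(degree >>> 16);
        out.write(degree >>> 8);
        out.write(degree);

        // Each target is written as the difference to its predecessor.
        long previous = 0L;
        for (long target : sorted) {
            writeVLong(out, target - previous);
            previous = target;
        }
        return out.toByteArray();
    }

    // Writes 7 bits per byte, lowest bits first; a set high bit means "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
    }
}
```

    For the targets 1005, 1000 and 1002, the deltas are 1000, 2 and 3, so the list takes 4 + 2 + 1 + 1 = 8 bytes in this sketch, compared with 24 bytes for three raw 64-bit target IDs.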

    The second data structure is a LongArray, which is a long[] addressable by long indices and capable of storing about 2^43 (~9k bn) longs, or 64 TiB worth of 64-bit longs. Each entry is an offset into the aforementioned adjacency array; the index is the respective source node ID.

    To traverse all nodes, first read each node's offset from the LongArray, then read 4 bytes from the ByteArray, starting at that offset, as the degree, then read degree vlongs as targetId.
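
    The read path for a single node might look like the sketch below. TraversalSketch is a hypothetical name, plain long[] and byte[] arrays stand in for LongArray and ByteArray, and the byte order of the degree and the vlong layout (7 bits per byte, high bit as continuation flag) are assumptions.

```java
// Hypothetical sketch of reading one node's targets; plain arrays stand in
// for LongArray (offsets) and ByteArray (adjacency).
final class TraversalSketch {

    static long[] targets(long[] offsets, byte[] adjacency, int node) {
        // 1. Look up where this node's adjacency list starts.
        int pos = (int) offsets[node];

        // 2. Read the 4-byte degree (big-endian order is an assumption).
        int degree = ((adjacency[pos] & 0xFF) << 24)
                | ((adjacency[pos + 1] & 0xFF) << 16)
                | ((adjacency[pos + 2] & 0xFF) << 8)
                | (adjacency[pos + 3] & 0xFF);
        pos += 4;

        // 3. Read `degree` vlongs and undo the delta encoding.
        long[] result = new long[degree];
        long previous = 0L;
        for (int i = 0; i < degree; i++) {
            long delta = 0L;
            int shift = 0;
            byte b;
            do {
                b = adjacency[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            result[i] = previous;
        }
        return result;
    }
}
```

    Because only the offset and the stored degree are needed to find the end of a list, the lists may appear in the adjacency array in any order.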

    Because the degree is read from the offset position, the offset array does not need to be sorted and the adjacency array may be sparse. The import makes use of this: each thread pre-allocates a local chunk of some pages (512 KiB) and provides access to this data during import. Threads only need to synchronize when a new chunk has to be pre-allocated. This is similar to what most garbage collectors do with TLAB allocations.
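
    The chunked allocation can be sketched as follows. ChunkAllocatorSketch is a hypothetical class, not the importer's actual code; it assumes one allocator instance per thread and a shared AtomicLong that hands out chunk start positions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of TLAB-style allocation: one instance per thread,
// all instances share a single counter for chunk start positions.
final class ChunkAllocatorSketch {
    static final long CHUNK_SIZE = 512L * 1024L;   // 512 KiB = 16 pages of 32 KiB

    private final AtomicLong nextChunkStart;       // shared between threads
    private long position;                         // next free byte in the local chunk
    private long limit;                            // end of the local chunk (exclusive)

    ChunkAllocatorSketch(AtomicLong nextChunkStart) {
        this.nextChunkStart = nextChunkStart;
    }

    // Reserves `bytes` bytes and returns the offset to store in the offset array.
    long allocate(long bytes) {
        if (position + bytes > limit) {
            // The only step that touches shared state: claim a fresh chunk.
            // The unused tail of the old chunk stays empty, so the array is sparse.
            long size = Math.max(bytes, CHUNK_SIZE);
            position = nextChunkStart.getAndAdd(size);
            limit = position + size;
        }
        long offset = position;
        position += bytes;
        return offset;
    }
}
```

    In this sketch, two allocators sharing one counter receive the chunks starting at 0 and 524288; subsequent small allocations are served from each local chunk without touching the shared counter.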

    See Also:
    more about vlong, more about TLAB allocation