Class PackedInts
- java.lang.Object
- org.apache.lucene.util.packed.PackedInts
public class PackedInts extends Object
Simplistic compression for an array of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.
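The fixed-width packing described above can be illustrated with a minimal, self-contained sketch. It is not Lucene's implementation: the class name PackedArraySketch and its methods are hypothetical, and it assumes values are laid out contiguously in a long[] with the lowest bits first.

```java
/** Illustrative sketch only; names and bit layout are assumptions, not Lucene's code. */
public class PackedArraySketch {
    private final long[] blocks;     // backing storage, 64 bits per block
    private final int bitsPerValue;  // fixed width of every value
    private final int valueCount;
    private final long mask;         // bitsPerValue low bits set

    public PackedArraySketch(int valueCount, int bitsPerValue) {
        if (valueCount < 0 || bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("invalid valueCount or bitsPerValue");
        }
        this.valueCount = valueCount;
        this.bitsPerValue = bitsPerValue;
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole longs
    }

    public int blockCount() {
        return blocks.length;
    }

    public void set(int index, long value) {
        checkIndex(index);
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the target bits in the current block, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            int lowBits = bitsPerValue - spill; // bits already stored in the current block
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> lowBits);
        }
    }

    public long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Fetch the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

With 100 values at 5 bits each, this sketch needs 500 bits, which rounds up to 8 longs instead of the 100 longs a plain long[] would use.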
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static interface PackedInts.Decoder
A decoder for packed integers.
static interface PackedInts.Encoder
An encoder for packed integers.
static class PackedInts.Format
A format to write packed ints.
static class PackedInts.FormatAndBits
Simple class that holds a format and a number of bits per value.
static class PackedInts.Header
Header identifying the structure of a packed integer array.
static class PackedInts.Mutable
A packed integer array that can be modified.
static class PackedInts.NullReader
A PackedInts.Reader which has all its values equal to 0 (bitsPerValue = 0).
static class PackedInts.Reader
A read-only random access array of positive integers.
static interface PackedInts.ReaderIterator
Run-once iterator interface, to decode previously saved PackedInts.
static class PackedInts.Writer
A write-once Writer.
-
Field Summary
Fields (Modifier and Type, Field, Description)
static String CODEC_NAME
static float COMPACT
No memory overhead at all, but the returned implementation may be slow.
static float DEFAULT
At most 20% memory overhead.
static int DEFAULT_BUFFER_SIZE
Default amount of memory to use for bulk operations.
static float FAST
At most 50% memory overhead, always select a reasonably fast implementation.
static float FASTEST
At most 700% memory overhead, always select a direct implementation.
static int VERSION_BYTE_ALIGNED
static int VERSION_CURRENT
static int VERSION_START
-
Constructor Summary
Constructors Constructor Description PackedInts()
-
Method Summary
All Methods, Static Methods, Concrete Methods (Modifier and Type, Method, Description)
static int bitsRequired(long maxValue)
Returns how many bits are required to hold values up to and including maxValue.
static void checkVersion(int version)
Check the validity of a version number.
static void copy(PackedInts.Reader src, int srcPos, PackedInts.Mutable dest, int destPos, int len, int mem)
Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes.
static PackedInts.FormatAndBits fastestFormatAndBits(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
Try to find the PackedInts.Format and number of bits per value that would restore from disk the fastest reader whose overhead is less than acceptableOverheadRatio.
static PackedInts.Decoder getDecoder(PackedInts.Format format, int version, int bitsPerValue)
Get a PackedInts.Decoder.
static PackedInts.Reader getDirectReader(IndexInput in)
Construct a direct PackedInts.Reader from an IndexInput.
static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue)
Expert: Construct a direct PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Header header)
Expert: Construct a direct PackedInts.Reader from an IndexInput without reading metadata at the beginning of the stream.
static PackedInts.Encoder getEncoder(PackedInts.Format format, int version, int bitsPerValue)
Get a PackedInts.Encoder.
static PackedInts.Mutable getMutable(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
Create a packed integer array with the given number of values initialized to 0.
static PackedInts.Mutable getMutable(int valueCount, int bitsPerValue, PackedInts.Format format)
Same as getMutable(int, int, float) with a pre-computed number of bits per value and format.
static PackedInts.Reader getReader(DataInput in)
Restore a PackedInts.Reader from a stream.
static PackedInts.ReaderIterator getReaderIterator(DataInput in, int mem)
Retrieve PackedInts as a PackedInts.ReaderIterator.
static PackedInts.ReaderIterator getReaderIteratorNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue, int mem)
Expert: Restore a PackedInts.ReaderIterator from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue)
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Header header)
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Writer getWriter(DataOutput out, int valueCount, int bitsPerValue, float acceptableOverheadRatio)
Create a packed integer array writer for the given output, format, value count, and number of bits per value.
static PackedInts.Writer getWriterNoHeader(DataOutput out, PackedInts.Format format, int valueCount, int bitsPerValue, int mem)
Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value.
static long maxValue(int bitsPerValue)
Calculates the maximum unsigned long that can be expressed with the given number of bits.
static PackedInts.Header readHeader(DataInput in)
Expert: reads only the metadata from a stream.
-
-
-
Field Detail
-
FASTEST
public static final float FASTEST
At most 700% memory overhead, always select a direct implementation.
- See Also:
- Constant Field Values
-
FAST
public static final float FAST
At most 50% memory overhead, always select a reasonably fast implementation.
- See Also:
- Constant Field Values
-
DEFAULT
public static final float DEFAULT
At most 20% memory overhead.
- See Also:
- Constant Field Values
-
COMPACT
public static final float COMPACT
No memory overhead at all, but the returned implementation may be slow.
- See Also:
- Constant Field Values
-
DEFAULT_BUFFER_SIZE
public static final int DEFAULT_BUFFER_SIZE
Default amount of memory to use for bulk operations.
- See Also:
- Constant Field Values
-
CODEC_NAME
public static final String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_BYTE_ALIGNED
public static final int VERSION_BYTE_ALIGNED
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
-
Method Detail
-
checkVersion
public static void checkVersion(int version)
Check the validity of a version number.
-
fastestFormatAndBits
public static PackedInts.FormatAndBits fastestFormatAndBits(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
Try to find the PackedInts.Format and number of bits per value that would restore from disk the fastest reader whose overhead is less than acceptableOverheadRatio.
The acceptableOverheadRatio parameter makes sense for random-access PackedInts.Readers. In case you only plan to perform sequential access on this stream later on, you should probably use COMPACT.
If you don't know how many values you are going to write, use valueCount = -1.
-
getDecoder
public static PackedInts.Decoder getDecoder(PackedInts.Format format, int version, int bitsPerValue)
Get a PackedInts.Decoder.
- Parameters:
format - the format used to store packed ints
version - the compatibility version
bitsPerValue - the number of bits per value
- Returns:
- a decoder
-
getEncoder
public static PackedInts.Encoder getEncoder(PackedInts.Format format, int version, int bitsPerValue)
Get a PackedInts.Encoder.
- Parameters:
format - the format used to store packed ints
version - the compatibility version
bitsPerValue - the number of bits per value
- Returns:
- an encoder
-
getReaderNoHeader
public static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue) throws IOException
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).
- Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
- Returns:
- a Reader
- Throws:
IOException - If there is a low-level I/O error
- See Also:
getWriterNoHeader(DataOutput, Format, int, int, int)
-
getReaderNoHeader
public static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Header header) throws IOException
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data when metadata has been previously read using readHeader(DataInput).
- Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
header - metadata result from readHeader()
- Returns:
- a Reader
- Throws:
IOException - If there is a low-level I/O error
- See Also:
readHeader(DataInput)
-
getReader
public static PackedInts.Reader getReader(DataInput in) throws IOException
Restore a PackedInts.Reader from a stream.
- Parameters:
in - the stream to read data from
- Returns:
- a Reader
- Throws:
IOException - If there is a low-level I/O error
-
getReaderIteratorNoHeader
public static PackedInts.ReaderIterator getReaderIteratorNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue, int mem)
Expert: Restore a PackedInts.ReaderIterator from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).
- Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
mem - how much memory the iterator is allowed to use to read-ahead (likely to speed up iteration)
- Returns:
- a ReaderIterator
- See Also:
getWriterNoHeader(DataOutput, Format, int, int, int)
-
getReaderIterator
public static PackedInts.ReaderIterator getReaderIterator(DataInput in, int mem) throws IOException
Retrieve PackedInts as a PackedInts.ReaderIterator.
- Parameters:
in - positioned at the beginning of a stored packed int structure.
mem - how much memory the iterator is allowed to use to read-ahead (likely to speed up iteration)
- Returns:
- an iterator to access the values
- Throws:
IOException - if the structure could not be retrieved.
-
getDirectReaderNoHeader
public static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue)
Expert: Construct a direct PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).
The returned reader will have very little memory overhead, but every call to NumericDocValues.get(int) is likely to perform a disk seek.
- Parameters:
in - the stream to read data from
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
- Returns:
- a direct Reader
-
getDirectReaderNoHeader
public static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Header header) throws IOException
Expert: Construct a direct PackedInts.Reader from an IndexInput without reading metadata at the beginning of the stream. This method is useful to restore data when metadata has been previously read using readHeader(DataInput).
- Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
header - metadata result from readHeader()
- Returns:
- a Reader
- Throws:
IOException - If there is a low-level I/O error
- See Also:
readHeader(DataInput)
-
getDirectReader
public static PackedInts.Reader getDirectReader(IndexInput in) throws IOException
Construct a direct PackedInts.Reader from an IndexInput. This method is useful to restore data from streams which have been created using getWriter(DataOutput, int, int, float).
The returned reader will have very little memory overhead, but every call to NumericDocValues.get(int) is likely to perform a disk seek.
- Parameters:
in - the stream to read data from
- Returns:
- a direct Reader
- Throws:
IOException - If there is a low-level I/O error
-
getMutable
public static PackedInts.Mutable getMutable(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
Create a packed integer array with the given number of values initialized to 0. The valueCount and the bitsPerValue cannot be changed after creation. All Mutables known by this factory are kept fully in RAM.
Positive values of acceptableOverheadRatio will trade space for speed by selecting a faster but potentially less memory-efficient implementation. An acceptableOverheadRatio of COMPACT will make sure that the most memory-efficient implementation is selected, whereas FASTEST will make sure that the fastest implementation is selected.
- Parameters:
valueCount - the number of elements
bitsPerValue - the number of bits available for any given value
acceptableOverheadRatio - an acceptable overhead ratio per value
- Returns:
- a mutable packed integer array
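One plausible way to read the space-for-speed trade-off is sketched below. This is an assumption for illustration, not the selection logic Lucene actually uses: the class OverheadSketch, the method chooseBits, and the idea of rounding up to a width of 8, 16, 32, or 64 bits are all hypothetical. The constant values are inferred from the documented percentages (0%, 20%, 50%, 700%) and are likewise assumptions.

```java
/** Illustrative sketch only; the selection rule and constant values are assumptions. */
public class OverheadSketch {
    // Ratios inferred from the documented overhead percentages.
    public static final float COMPACT = 0f;
    public static final float DEFAULT = 0.2f;
    public static final float FAST = 0.5f;
    public static final float FASTEST = 7f;

    /**
     * Returns the number of bits to actually store per value: the smallest
     * machine-friendly width (8, 16, 32, 64) that fits within the allowed
     * overhead, or bitsPerValue itself when no such width qualifies.
     */
    public static int chooseBits(int bitsPerValue, float acceptableOverheadRatio) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 64]");
        }
        if (acceptableOverheadRatio < COMPACT) {
            throw new IllegalArgumentException("ratio must be >= 0");
        }
        // Largest width we may use without exceeding the acceptable overhead.
        long maxBits = bitsPerValue + (long) (bitsPerValue * acceptableOverheadRatio);
        maxBits = Math.min(64, maxBits);
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (bitsPerValue <= aligned && aligned <= maxBits) {
                return aligned; // an aligned width is typically cheaper to decode
            }
        }
        return bitsPerValue; // fall back to the tightest packing
    }
}
```

Under these assumptions, a 1-bit value with FASTEST may be widened to 8 bits (exactly 700% overhead), while COMPACT never widens a value.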
-
getMutable
public static PackedInts.Mutable getMutable(int valueCount, int bitsPerValue, PackedInts.Format format)
Same as getMutable(int, int, float) with a pre-computed number of bits per value and format.
-
getWriterNoHeader
public static PackedInts.Writer getWriterNoHeader(DataOutput out, PackedInts.Format format, int valueCount, int bitsPerValue, int mem)
Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value.
The resulting stream will be long-aligned. This means that depending on the format which is used, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a valueCount that is a multiple of 64.
This method does not write any metadata to the stream, meaning that it is your responsibility to store it somewhere else in order to be able to recover data from the stream later on:
- format (using PackedInts.Format.getId()),
- valueCount,
- bitsPerValue,
- VERSION_CURRENT.
It is possible to start writing values without knowing how many of them you are actually going to write. To do this, just pass -1 as valueCount. On the other hand, for any positive value of valueCount, the returned writer will make sure that you don't write more values than expected and will pad the end of the stream with zeros in case you have written fewer than valueCount values when calling PackedInts.Writer.finish().
The mem parameter lets you control how much memory can be used to buffer changes in memory before flushing to disk. High values of mem are likely to improve throughput. On the other hand, if speed is not that important to you, a value of 0 will use as little memory as possible and should already offer reasonable throughput.
- Parameters:
out - the data output
format - the format to use to serialize the values
valueCount - the number of values
bitsPerValue - the number of bits per value
mem - how much memory (in bytes) can be used to speed up serialization
- Returns:
- a Writer
- See Also:
getReaderIteratorNoHeader(DataInput, Format, int, int, int, int), getReaderNoHeader(DataInput, Format, int, int, int)
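The long-alignment remark for getWriterNoHeader can be checked with a little arithmetic. The sketch below assumes a contiguous bit layout (the actual waste depends on the format); the class LongAlignmentSketch and its methods are hypothetical names used only for illustration.

```java
/** Illustrative arithmetic only; assumes values are packed contiguously. */
public class LongAlignmentSketch {
    /** Total bits occupied once the stream is padded to a multiple of 64 bits. */
    public static long paddedBits(long valueCount, int bitsPerValue) {
        long usedBits = valueCount * bitsPerValue;
        return ((usedBits + 63) / 64) * 64; // round up to the next whole long
    }

    /** Padding bits added at the end of the stream; always between 0 and 63. */
    public static long wastedBits(long valueCount, int bitsPerValue) {
        return paddedBits(valueCount, bitsPerValue) - valueCount * bitsPerValue;
    }
}
```

For example, 100 values at 5 bits use 500 bits and are padded to 512, wasting 12 bits, whereas 64 values at 5 bits use exactly 320 bits (five longs) and waste nothing. Because 64 × bitsPerValue is always a multiple of 64, a valueCount that is a multiple of 64 avoids padding for any bitsPerValue.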
-
getWriter
public static PackedInts.Writer getWriter(DataOutput out, int valueCount, int bitsPerValue, float acceptableOverheadRatio) throws IOException
Create a packed integer array writer for the given output, format, value count, and number of bits per value.
The resulting stream will be long-aligned. This means that depending on the format which is used under the hood, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a valueCount that is a multiple of 64.
This method writes metadata to the stream, so that the resulting stream is sufficient to restore a PackedInts.Reader from it. You don't need to track valueCount or bitsPerValue by yourself. In case this is a problem, you should probably look at getWriterNoHeader(DataOutput, Format, int, int, int).
The acceptableOverheadRatio parameter controls how readers that will be restored from this stream trade space for speed by selecting a faster but potentially less memory-efficient implementation. An acceptableOverheadRatio of COMPACT will make sure that the most memory-efficient implementation is selected, whereas FASTEST will make sure that the fastest implementation is selected. In case you are only interested in reading this stream sequentially later on, you should probably use COMPACT.
- Parameters:
out - the data output
valueCount - the number of values
bitsPerValue - the number of bits per value
acceptableOverheadRatio - an acceptable overhead ratio per value
- Returns:
- a Writer
- Throws:
IOException - If there is a low-level I/O error
-
bitsRequired
public static int bitsRequired(long maxValue)
Returns how many bits are required to hold values up to and including maxValue.
- Parameters:
maxValue - the maximum value that should be representable.
- Returns:
- the amount of bits needed to represent values from 0 to maxValue.
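A minimal sketch of how such a bit count can be computed is shown below. It is written from the description above rather than taken from the library: the class BitsRequiredSketch is hypothetical, and the choices to reject negative input and to report at least 1 bit for a maximum of 0 are assumptions.

```java
/** Illustrative sketch only; edge-case behavior is an assumption. */
public class BitsRequiredSketch {
    /** Number of bits needed to represent every value in [0, maxValue]. */
    public static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative (got: " + maxValue + ")");
        }
        // 64 minus the count of leading zero bits is the position of the highest
        // set bit; use at least 1 bit so that a maximum of 0 is still storable.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }
}
```

For instance, a maximum of 255 needs 8 bits, while 256 needs 9.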
-
maxValue
public static long maxValue(int bitsPerValue)
Calculates the maximum unsigned long that can be expressed with the given number of bits.
- Parameters:
bitsPerValue - the number of bits available for any given value.
- Returns:
- the maximum value for the given bits.
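The inverse computation can be sketched in the same spirit. The class MaxValueSketch is hypothetical, and capping the 64-bit case at Long.MAX_VALUE is an assumption made here because Java's long is signed; the library's behavior for that case is not stated in this page.

```java
/** Illustrative sketch only; the 64-bit edge case is an assumption. */
public class MaxValueSketch {
    /** Largest value representable with the given number of bits. */
    public static long maxValue(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 64]");
        }
        // ~0L << n clears the n low bits; inverting leaves exactly n low bits set.
        // A shift by 64 would wrap to a shift by 0 in Java, so 64 is handled separately.
        return bitsPerValue == 64 ? Long.MAX_VALUE : ~(~0L << bitsPerValue);
    }
}
```

With 8 bits the result is 255, matching the 8 bits that a maximum of 255 requires.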
-
copy
public static void copy(PackedInts.Reader src, int srcPos, PackedInts.Mutable dest, int destPos, int len, int mem)
Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes.
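The idea of a copy bounded by a memory budget can be sketched with plain arrays standing in for PackedInts.Reader and PackedInts.Mutable. This is an assumption about the general technique, not the library's code: the class BoundedCopySketch is hypothetical, and it assumes the budget is spent on a temporary long[] buffer of mem / 8 entries.

```java
/** Illustrative sketch only; plain long[] arrays stand in for Reader and Mutable. */
public class BoundedCopySketch {
    /** Copies src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes of buffer. */
    public static void copy(long[] src, int srcPos, long[] dest, int destPos, int len, int mem) {
        if (len < 0 || srcPos < 0 || destPos < 0
                || srcPos + len > src.length || destPos + len > dest.length) {
            throw new IndexOutOfBoundsException("invalid range");
        }
        int capacity = mem >>> 3; // number of 8-byte longs that fit in the budget
        if (capacity == 0) {
            // No room for a buffer: fall back to copying one value at a time.
            for (int i = 0; i < len; i++) {
                dest[destPos + i] = src[srcPos + i];
            }
            return;
        }
        long[] buffer = new long[Math.min(capacity, len)];
        int remaining = len;
        while (remaining > 0) {
            int chunk = Math.min(remaining, buffer.length);
            System.arraycopy(src, srcPos, buffer, 0, chunk);   // bulk read
            System.arraycopy(buffer, 0, dest, destPos, chunk); // bulk write
            srcPos += chunk;
            destPos += chunk;
            remaining -= chunk;
        }
    }
}
```

With real packed arrays the intermediate buffer matters because bulk reads and writes can amortize the per-value decoding cost; with plain arrays it only illustrates the chunking.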
-
readHeader
public static PackedInts.Header readHeader(DataInput in) throws IOException
Expert: reads only the metadata from a stream. This is useful to later restore a stream or open a direct reader via getReaderNoHeader(DataInput, Header) or getDirectReaderNoHeader(IndexInput, Header).
- Parameters:
in - the stream to read data from
- Returns:
- packed integer metadata.
- Throws:
IOException - If there is a low-level I/O error
- See Also:
getReaderNoHeader(DataInput, Header), getDirectReaderNoHeader(IndexInput, Header)
-
-