Class FrontCodedStringList
- All Implemented Interfaces:
ObjectCollection<MutableString>, ObjectIterable<MutableString>, ObjectList<MutableString>, Stack<MutableString>, Serializable, Comparable<List<? extends MutableString>>, Iterable<MutableString>, Collection<MutableString>, List<MutableString>, RandomAccess
public class FrontCodedStringList extends AbstractObjectList<MutableString> implements RandomAccess, Serializable
This class stores a list of strings using front-coding
(a.k.a. prefix-omission) compression;
the compression will be reasonable only if the list is sorted, but you could
also use instances of this class just as a handy way to manage a large
number of strings. It implements an immutable ObjectList that returns the i-th
string (as a MutableString) when the get(int) method is
called with argument i. The returned mutable string may be freely
modified.
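To make the idea of front coding concrete, here is a minimal, standard-library-only sketch. It is not this class's implementation: the names `FrontCodingSketch`, `Entry`, `encode` and `get` are hypothetical, and it assumes that the ratio means one string in every `ratio` is stored in full while the others store only the length of the prefix shared with their predecessor plus the remaining suffix.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not part of the library.
public class FrontCodingSketch {
    /** One stored entry: how many leading chars to reuse from the previous string, plus the rest. */
    public static final class Entry {
        public final int commonPrefix;
        public final String suffix;
        Entry(int commonPrefix, String suffix) {
            this.commonPrefix = commonPrefix;
            this.suffix = suffix;
        }
    }

    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Front-codes the list; every ratio-th string (index 0, ratio, 2*ratio, ...) is kept whole. */
    public static List<Entry> encode(List<String> strings, int ratio) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < strings.size(); i++) {
            String s = strings.get(i);
            if (i % ratio == 0) {
                out.add(new Entry(0, s));
            } else {
                int p = commonPrefixLength(strings.get(i - 1), s);
                out.add(new Entry(p, s.substring(p)));
            }
        }
        return out;
    }

    /** Rebuilds the string at index by replaying entries from the nearest fully stored string. */
    public static String get(List<Entry> entries, int ratio, int index) {
        int start = index - index % ratio;
        StringBuilder sb = new StringBuilder(entries.get(start).suffix);
        for (int i = start + 1; i <= index; i++) {
            Entry e = entries.get(i);
            sb.setLength(e.commonPrefix); // keep only the shared prefix
            sb.append(e.suffix);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("car", "card", "care", "careful", "cat");
        List<Entry> coded = encode(sorted, 4);
        for (int i = 0; i < sorted.size(); i++) {
            Entry e = coded.get(i);
            System.out.println(e.commonPrefix + "+\"" + e.suffix + "\" -> " + get(coded, 4, i));
        }
    }
}
```

Under this assumption, a sorted input shares long prefixes between neighbours, so the stored suffixes are short; an unsorted input shares few prefixes and gains little. A larger ratio saves more space but makes each lookup replay more entries.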
As a convenience, this class provides a main method that reads a sequence of newline-separated strings from standard input and writes a corresponding serialized front-coded string list.
Implementation Details
To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList, or a CharArrayFrontCodedList, depending on
the value of the utf8 parameter at creation time. In the first case, if the
strings are mostly ASCII, the resulting array will be much smaller, but
access times will increase manifold, as each string must be UTF-8 decoded
before being returned.
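The space side of this trade-off can be estimated with the standard library alone. The sketch below is only an approximation under stated assumptions: `StorageSizeSketch` and its methods are hypothetical names, a char is counted as two bytes, and the bookkeeping overhead of the front-coded lists themselves is ignored.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; it compares raw payload sizes, not the library's actual layout.
public class StorageSizeSketch {
    /** Bytes needed to hold the string as UTF-8. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to hold the string as an array of 16-bit chars. */
    public static int charArrayBytes(String s) {
        return s.length() * Character.BYTES;
    }

    /** The decoding step that a UTF-8 backed store has to pay on every access. */
    public static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII, Latin text with one accented letter, and CJK text (written as escapes).
        String[] samples = { "front-coding", "citt\u00e0", "\u65e5\u672c\u8a9e" };
        for (String s : samples) {
            System.out.println(s.length() + " chars: utf8=" + utf8Bytes(s)
                    + " bytes, chars=" + charArrayBytes(s) + " bytes");
        }
    }
}
```

Pure ASCII text halves in size when stored as UTF-8, while text dominated by characters that need three UTF-8 bytes can end up larger than the char-array form, which is why the choice depends on the data.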
- See Also:
- Serialized Form
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
AbstractObjectList.ObjectSubList<K extends Object> -
Field Summary
Fields
Modifier and Type, Field: Description
protected ByteArrayFrontCodedList byteFrontCodedList: The underlying ByteArrayFrontCodedList, or null.
protected CharArrayFrontCodedList charFrontCodedList: The underlying CharArrayFrontCodedList, or null.
static long serialVersionUID
protected boolean utf8: Whether this front-coded list is UTF-8 encoded.
-
Constructor Summary
Constructors
Constructor: Description
FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences contained in the given collection.
FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences returned by the given iterator.
-
Method Summary
Modifier and Type, Method: Description
protected static char[] byte2Char(byte[] a, char[] s)
protected static int countUTF8Chars(byte[] a)
MutableString get(int index): Returns the element at the specified position in this front-coded list as a mutable string.
void get(int index, MutableString s): Returns the element at the specified position in this front-coded list by storing it in a mutable string.
ObjectListIterator<MutableString> listIterator(int k)
static void main(String[] arg)
int ratio(): Returns the ratio of the underlying front-coded list.
int size()
boolean utf8(): Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, size, subList, top, toString
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.List
containsAll, isEmpty, remove, removeAll, replaceAll, retainAll, sort, spliterator, toArray, toArray
Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList
setElements, setElements, setElements, unstableSort
-
Field Details
-
serialVersionUID
public static final long serialVersionUID
- See Also:
- Constant Field Values
-
byteFrontCodedList
The underlying ByteArrayFrontCodedList, or null.
-
charFrontCodedList
The underlying CharArrayFrontCodedList, or null.
-
utf8
protected final boolean utf8
Whether this front-coded list is UTF-8 encoded.
-
-
Constructor Details
-
FrontCodedStringList
Creates a new front-coded string list containing the character sequences returned by the given iterator.
- Parameters:
words - an iterator returning character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.
-
FrontCodedStringList
Creates a new front-coded string list containing the character sequences contained in the given collection.
- Parameters:
c - a collection containing character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.
-
-
Method Details
-
utf8
public boolean utf8()
Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
- Returns:
- true if this front-coded string list is keeping its data as an array of UTF-8 encoded bytes.
-
ratio
public int ratio()
Returns the ratio of the underlying front-coded list.
- Returns:
- the ratio of the underlying front-coded list.
-
get
Returns the element at the specified position in this front-coded list as a mutable string.
- Specified by:
get in interface List<MutableString>
- Parameters:
index - an index in the list.
- Returns:
- a MutableString that will contain the string at the specified position. The string may be freely modified.
-
get
Returns the element at the specified position in this front-coded list by storing it in a mutable string.
- Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.
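The point of a two-argument getter of this shape is that the caller can reuse one buffer across many lookups instead of receiving a new object each time. The following sketch illustrates that pattern with `StringBuilder` standing in for MutableString; the class `ReusableBufferSketch` is hypothetical, and it assumes the method replaces the buffer's previous contents rather than appending to them.

```java
import java.util.List;

// Illustrative sketch only; StringBuilder is a stand-in for MutableString.
public class ReusableBufferSketch {
    private final List<String> data;

    public ReusableBufferSketch(List<String> data) {
        this.data = data;
    }

    /** Allocating variant: returns a fresh buffer that the caller may modify freely. */
    public StringBuilder get(int index) {
        StringBuilder s = new StringBuilder();
        get(index, s);
        return s;
    }

    /** Reusing variant: overwrites the caller-supplied buffer with the element at index. */
    public void get(int index, StringBuilder s) {
        s.setLength(0); // assumption: previous contents are discarded
        s.append(data.get(index));
    }

    public int size() {
        return data.size();
    }

    public static void main(String[] args) {
        ReusableBufferSketch list = new ReusableBufferSketch(List.of("car", "card", "care"));
        StringBuilder buffer = new StringBuilder(); // one buffer for the whole scan
        int totalLength = 0;
        for (int i = 0; i < list.size(); i++) {
            list.get(i, buffer);
            totalLength += buffer.length();
        }
        System.out.println("total length: " + totalLength);
    }
}
```

When scanning a long list, the reusing variant avoids one allocation per element, at the cost that the buffer's contents are only valid until the next call.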
-
countUTF8Chars
protected static int countUTF8Chars(byte[] a) -
byte2Char
protected static char[] byte2Char(byte[] a, char[] s) -
listIterator
- Specified by:
listIterator in interface List<MutableString>
- Specified by:
listIterator in interface ObjectList<MutableString>
- Overrides:
listIterator in class AbstractObjectList<MutableString>
-
size
public int size()
- Specified by:
size in interface Collection<MutableString>
- Specified by:
size in interface List<MutableString>
- Specified by:
size in class AbstractCollection<MutableString>
-
main
-