Class FrontCodedStringList

All Implemented Interfaces:
ObjectCollection<MutableString>, ObjectIterable<MutableString>, ObjectList<MutableString>, Stack<MutableString>, Serializable, Comparable<List<? extends MutableString>>, Iterable<MutableString>, Collection<MutableString>, List<MutableString>, RandomAccess

public class FrontCodedStringList
extends AbstractObjectList<MutableString>
implements RandomAccess, Serializable
Compact storage of strings using front-coding compression (a.k.a. compression by prefix omission).

This class stores a list of strings using front-coding (a.k.a. prefix-omission) compression. Compression is effective only if the list is sorted, because adjacent strings then share long prefixes, but you can also use instances of this class simply as a handy way to manage a large number of strings. It implements an immutable ObjectList that returns the i-th string (as a MutableString) when the get(int) method is called with argument i. Each call returns a fresh mutable string, so the caller may modify it freely without affecting the list.
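To make the idea of front coding concrete, here is a minimal, self-contained sketch. It is not this class's implementation and does not reproduce its storage layout: the class FrontCodingSketch, its Entry type, and the encode and get methods are hypothetical names introduced only for illustration. Each string is stored as the length of the prefix it shares with the previous string, followed by the remaining suffix.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative front coding over a list of strings (hypothetical, not the library's layout). */
public class FrontCodingSketch {

    /** One coded entry: how many characters are shared with the previous string, plus the rest. */
    public static final class Entry {
        public final int commonPrefixLength;
        public final String suffix;

        public Entry(int commonPrefixLength, String suffix) {
            this.commonPrefixLength = commonPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Length of the longest common prefix of a and b. */
    public static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Encodes each string relative to its predecessor; the first one is relative to "". */
    public static List<Entry> encode(List<String> strings) {
        List<Entry> out = new ArrayList<>();
        String previous = "";
        for (String s : strings) {
            int p = commonPrefix(previous, s);
            out.add(new Entry(p, s.substring(p)));
            previous = s;
        }
        return out;
    }

    /** Rebuilds the string at the given index by replaying entries from the start. */
    public static String get(List<Entry> entries, int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= index; i++) {
            Entry e = entries.get(i);
            sb.setLength(e.commonPrefixLength); // keep only the shared prefix
            sb.append(e.suffix);                // then append what differs
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("car", "card", "care", "careful", "cat");
        List<Entry> coded = encode(sorted);
        System.out.println(get(coded, 3)); // prints "careful"
    }
}
```

In this sketch the five sorted strings total 21 characters, while the stored suffixes total only 9. Shuffling the input would shorten the shared prefixes and erase most of that saving, which is why sorting matters. Replaying from the start makes get linear in the index; a practical implementation would typically store some strings in full at regular intervals to bound that cost.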

As a convenience, this class provides a main method that reads a sequence of newline-separated strings from standard input and writes the corresponding serialized front-coded string list.
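The read-then-serialize flow of such a main method can be sketched with the standard library alone. The following is an assumption-laden stand-in, not this class's main method: it uses a plain ArrayList in place of a front-coded list, and the class SerializeLinesSketch, its methods, and the output-file argument are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Reader;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;

/** Reads newline-separated strings and serializes them (ArrayList stands in for the real list). */
public class SerializeLinesSketch {

    /** Collects every line of the reader, without line terminators. */
    public static ArrayList<String> readLines(Reader in) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        for (String line; (line = reader.readLine()) != null; ) lines.add(line);
        return lines;
    }

    /** Serializes an object to a byte array using standard Java serialization. */
    public static byte[] serialize(Serializable object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object written by serialize. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException {
        ArrayList<String> lines = readLines(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        byte[] data = serialize(lines);
        // Hypothetical convention: the first argument, if present, names the output file.
        if (args.length > 0) Files.write(Paths.get(args[0]), data);
        System.err.println(lines.size() + " strings, " + data.length + " serialized bytes");
    }
}
```

Because the real class implements Serializable, a list built from the same lines could be written and read back through the same kind of object stream.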

Implementation Details

To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are mostly ASCII the resulting array will be much smaller, but access will be considerably slower, as each string must be UTF-8 decoded before being returned.
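The size side of this trade-off can be checked with the standard library alone. The sketch below is illustrative only: the class Utf8VersusCharSketch and its methods are hypothetical, and it measures raw string payloads, ignoring front coding and any per-entry overhead of the actual lists.

```java
import java.nio.charset.StandardCharsets;

/** Compares the raw payload of a string stored as UTF-8 bytes versus as 16-bit chars. */
public class Utf8VersusCharSketch {

    /** Bytes needed when the string is UTF-8 encoded. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed when the string is kept as a char array (2 bytes per char). */
    public static int charBytes(String s) {
        return s.length() * Character.BYTES;
    }

    /** The extra step paid on every access in the UTF-8 case. */
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "front-coding";        // 12 ASCII characters
        String cjk = "\u65e5\u672c\u8a9e";    // 3 characters, each 3 bytes in UTF-8

        System.out.println(utf8Bytes(ascii) + " vs " + charBytes(ascii)); // prints "12 vs 24"
        System.out.println(utf8Bytes(cjk) + " vs " + charBytes(cjk));     // prints "9 vs 6"
    }
}
```

ASCII text halves in size under UTF-8, whereas text dominated by characters that need three UTF-8 bytes can grow, so the saving depends on the data. The decodeUtf8 step is what the char-array representation avoids on each access.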

See Also:
Serialized Form