
class LetterSet extends AnyRef

LetterSet represents a set of characters.

It's logically equivalent to Set[Char] but represents its contents as an array of inclusive character ranges. In the worst case this uses twice the memory that a plain Array[Char] of the same characters would, because every isolated character is stored as a two-element range. In the best case it uses 4 bytes to represent 65,536 characters (the set containing all characters).

Some facts related to the internal representation:

  • array.length is always even
  • ranges are inclusive, non-overlapping, and stored in order
  • the order of ranges is lowest-to-highest (by start)
  • each range is [start, end] where start <= end
  • single elements are represented as [c, c]
  • all elements found in the array are members of the set
  • elements not found in the array might also be members (when they fall strictly inside a range)
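
The layout rules above can be written down as a small check. This is only a sketch: LetterSetInvariants and isValid are hypothetical names that are not part of LetterSet, and the check covers only the rules listed here.

```scala
object LetterSetInvariants {
  // Hypothetical helper (not part of LetterSet's API): returns true when the
  // array satisfies the layout rules listed above.
  def isValid(array: Array[Char]): Boolean =
    array.length % 2 == 0 && {
      // Split the flat array into [start, end] pairs.
      val pairs = array.grouped(2).toList
      // Every range must have start <= end.
      pairs.forall(p => p(0) <= p(1)) &&
        // Ranges must be ordered and non-overlapping: each range ends
        // strictly before the next one starts.
        pairs.zip(pairs.drop(1)).forall { case (a, b) => a(1) < b(0) }
    }
}
```

For instance, the array [2, 4, 6, 6, 10, 13] passes this check, while [4, 2] (start > end) and [2, 6, 5, 9] (overlapping ranges) do not.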

This layout means we can use binary search to quickly check for membership or determine where to add new elements. If binary search "finds" a value we know it is already contained in the set. If instead it returns a negative index, we can tell whether the element is contained in a range based on whether the calculated insertion index is even (absent) or odd (present).

For example:

    let array = [2, 4, 6, 6, 10, 13]
    searchFor(1) -> insertion index is 0 -> absent
    searchFor(2) -> actual index is 0 -> present
    searchFor(3) -> insertion index is 1 -> present
    searchFor(5) -> insertion index is 2 -> absent
    searchFor(6) -> actual index is 2 or 3 -> present
    searchFor(7) -> insertion index is 4 -> absent
    and so on...
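
The even/odd rule can be sketched on a raw range array using java.util.Arrays.binarySearch, which returns -(insertionPoint) - 1 when the key is not found. RangeSearch and its contains method are hypothetical names for illustration, and this is not necessarily how LetterSet implements its own contains.

```scala
import java.util.Arrays

object RangeSearch {
  // Hypothetical sketch: membership test on a sorted array of inclusive ranges.
  def contains(array: Array[Char], c: Char): Boolean = {
    val i = Arrays.binarySearch(array, c)
    if (i >= 0) {
      // Exact hit on a range start or end: definitely a member.
      true
    } else {
      // Recover the insertion index from the negative result.
      val insertion = -(i + 1)
      // Odd insertion index: c falls between a start and its end (present).
      // Even insertion index: c falls between two ranges (absent).
      insertion % 2 == 1
    }
  }
}
```

With the array from the example, contains returns false for 1, 5 and 7, and true for 2, 3 and 6.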

This class is most efficient when dealing with sets that are represented as a relatively small number of single characters plus some (potentially very large) ranges. These often come up with regular expressions, for example:

    /[0-9]/      -> 10 members represented in 4 bytes
    /[a-zA-Z]/   -> 52 members represented in 8 bytes
    /[^a-zA-Z]/  -> 65,484 members represented in 12 bytes
    /./          -> 65,536 members represented in 4 bytes

It is less optimal for non-contiguous sets of single characters:

    /[aeiou]/    -> 5 members represented in 20 bytes
    /[acegikm]/  -> 7 members represented in 28 bytes
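
The byte counts above follow from storing two 2-byte Chars per range. The sketch below reproduces them by encoding a collection of characters into ranges. RangeEncoding, encode and bytes are hypothetical names, and the count covers only the array payload, not JVM object overhead.

```scala
import scala.collection.mutable.ArrayBuffer

object RangeEncoding {
  // Hypothetical helper: encode characters as sorted inclusive ranges,
  // merging runs of adjacent characters into a single [start, end] pair.
  def encode(chars: Iterable[Char]): Array[Char] = {
    val sorted = chars.toArray.distinct.sorted
    val out = ArrayBuffer.empty[Char]
    for (c <- sorted) {
      if (out.nonEmpty && out.last + 1 == c) {
        // c directly follows the current range's end: extend that range.
        out(out.length - 1) = c
      } else {
        // Otherwise start a new single-element range [c, c].
        out += c
        out += c
      }
    }
    out.toArray
  }

  // Each Char occupies 2 bytes, so the payload is 2 bytes per array slot.
  def bytes(chars: Iterable[Char]): Int = encode(chars).length * 2
}
```

Under these assumptions, bytes('0' to '9') is 4 and bytes("aeiou".toSeq) is 20, matching the figures listed above.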

Similarly, checking for set membership is O(log(n)) where n is the number of contiguous ranges. This makes the implementation less efficient than Set[Char] (where it is O(1)) unless the caller expects n to be small relative to the number of elements contained. The other operations (e.g. union, intersection) are likely to always be competitive, since LetterSet's implementations are O(n) in the number of ranges, whereas Set[Char]'s are O(n) in the number of individual elements. Finally, LetterSet's set complement operator (~) has no direct competitor in Set[Char], since that operation would be very expensive there.
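
To illustrate why complement is cheap in this representation, here is a minimal sketch that walks the ranges once and emits the gaps between them, so the work is proportional to the number of ranges. RangeComplement and complement are hypothetical names, and this is not necessarily how LetterSet's ~ operator is implemented.

```scala
import scala.collection.mutable.ArrayBuffer

object RangeComplement {
  // Hypothetical sketch: complement of a sorted, non-overlapping range array
  // over the full Char domain [0, 65535].
  def complement(array: Array[Char]): Array[Char] = {
    val out = ArrayBuffer.empty[Char]
    // Lowest code unit not yet accounted for; kept as Int to avoid Char overflow.
    var next = 0
    var i = 0
    while (i < array.length) {
      val start = array(i).toInt
      val end = array(i + 1).toInt
      if (next < start) {
        // Emit the gap that lies before this range.
        out += next.toChar
        out += (start - 1).toChar
      }
      next = end + 1
      i += 2
    }
    if (next <= Char.MaxValue.toInt) {
      // Emit the gap after the last range, up to the end of the domain.
      out += next.toChar
      out += Char.MaxValue
    }
    out.toArray
  }
}
```

Applied to the two ranges for [a-zA-Z], this yields three ranges, which is consistent with the 12 bytes quoted for /[^a-zA-Z]/.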

You could imagine building a data structure that uses a Set[Char] for individual characters and only uses a LetterSet for contiguous ranges. This approach is not used here because converting back and forth between the two representations as characters are added and removed would add considerable complexity to the (already complex) implementation.

Self Type
LetterSet
Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new LetterSet(array: Array[Char])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. def &(rhs: LetterSet): LetterSet
  4. def +(c: Char): LetterSet
  5. def -(c: Char): LetterSet
  6. def --(rhs: LetterSet): LetterSet
  7. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def ^(rhs: LetterSet): LetterSet
  9. def apply(c: Char): Boolean
  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  12. def contains(c: Char): Boolean
  13. def containsRange(c1: Char, c2: Char): Boolean
  14. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. def equals(that: Any): Boolean
    Definition Classes
    LetterSet → AnyRef → Any
  16. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. def forall(f: (Char) ⇒ Boolean): Boolean
  18. def get(index: Int): Char
  19. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. lazy val hashCode: Int
    Definition Classes
    LetterSet → AnyRef → Any
  21. def intersects(rhs: LetterSet): Boolean
  22. def isEmpty: Boolean
  23. def isFull: Boolean
  24. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  25. def isSingleton: Boolean
  26. def iterator: Iterator[Char]
  27. def maxOption: Option[Char]
  28. def minOption: Option[Char]
  29. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  30. def nonEmpty: Boolean
  31. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  32. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  33. def partialCompare(rhs: LetterSet): Double
  34. def ranges: Iterator[(Char, Char)]
  35. def singleValue: Option[Char]
  36. lazy val size: Int
  37. def subsetOf(rhs: LetterSet): Boolean
  38. def supersetOf(rhs: LetterSet): Boolean
  39. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  40. def toString(): String
    Definition Classes
    LetterSet → AnyRef → Any
  41. def unary_~: LetterSet
  42. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  45. def |(rhs: LetterSet): LetterSet
