class LetterSet extends AnyRef
LetterSet represents a set of characters.
It is logically equivalent to Set[Char] but represents its contents as an array of inclusive character ranges. In the worst case it uses twice as much memory as an Array[Char] holding the same members would. In the best case it uses 4 bytes to represent 65,536 characters (the set containing all characters).
Some facts related to the internal representation:
- array.length is always even
- ranges are inclusive, non-overlapping, and stored in order
- the order of ranges is lowest-to-highest (by start)
- each range is [start, end] where start <= end
- single elements are represented as [c, c]
- all elements found in the array are members of the set
- elements not found in the array might also be members (when they fall strictly inside a range)
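The invariants listed above can be written down as a small check. This is only a sketch: RangeArrayInvariants and isValid are hypothetical names that are not part of LetterSet, and the check covers only the properties stated in the list (it does not assume that adjacent ranges such as [1, 2] and [3, 4] must be merged, since that is not stated here).

```scala
object RangeArrayInvariants {
  /** Hypothetical helper: true if `array` satisfies the invariants listed above. */
  def isValid(array: Array[Char]): Boolean = {
    // array.length is always even: the array is a flat list of (start, end) pairs
    if (array.length % 2 != 0) return false
    var i = 0
    while (i < array.length) {
      // each range is [start, end] where start <= end
      if (array(i) > array(i + 1)) return false
      // ranges are ordered and non-overlapping: previous end < this start
      if (i > 0 && array(i - 1) >= array(i)) return false
      i += 2
    }
    true
  }
}
```

For instance, an array holding the values 2, 4, 6, 6, 10, 13 passes this check, while 2, 4, 3, 6 does not because its two ranges overlap.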
This layout means we can use binary search to quickly check for membership or determine where to add new elements. If binary search "finds" a value we know it is already contained in the set. If instead it returns a negative index, we can tell whether the element is contained in a range based on whether the calculated insertion index is even (absent) or odd (present).
For example:
  let array = [2, 4, 6, 6, 10, 13]
  searchFor(1) -> insertion index is 0 -> absent
  searchFor(2) -> actual index is 0 -> present
  searchFor(3) -> insertion index is 1 -> present
  searchFor(5) -> insertion index is 2 -> absent
  searchFor(6) -> actual index is 2 or 3 -> present
  searchFor(7) -> insertion index is 4 -> absent
  and so on...
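The lookup traced in the example can be sketched with java.util.Arrays.binarySearch, which returns -(insertion point) - 1 when the key is not found. RangeMembership and its contains method are hypothetical names for illustration; LetterSet's own implementation may differ in detail.

```scala
import java.util.Arrays

object RangeMembership {
  /** Hypothetical helper: membership test on a flat array of inclusive ranges. */
  def contains(array: Array[Char], c: Char): Boolean = {
    val i = Arrays.binarySearch(array, c)
    if (i >= 0) {
      // c is the start or end of some range, so it is a member
      true
    } else {
      // recover the insertion index from the negative result
      val insertion = -(i + 1)
      // odd: c lies strictly between a start and its end (present)
      // even: c lies in a gap between ranges (absent)
      insertion % 2 == 1
    }
  }
}
```

An odd insertion index means the element would be inserted between a start and its matching end, which is exactly the case where it already lies inside a range.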
This class is most efficient when dealing with sets that are represented as a relatively small number of single characters plus some (potentially very large) ranges. These often come up with regular expressions, for example:
  /[0-9]/ -> 10 members represented in 4 bytes
  /[a-zA-Z]/ -> 52 members represented in 8 bytes
  /[^a-zA-Z]/ -> 65,484 members represented in 12 bytes
  /./ -> 65,536 members represented in 4 bytes
It is less optimal for non-contiguous sets of single characters:
  /[aeiou]/ -> 5 members represented in 20 bytes
  /[acegikm]/ -> 7 members represented in 28 bytes
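The byte counts above follow from storing two 2-byte Char values per range. The sketch below reproduces them by encoding a Set[Char] as a flat range array; RangeEncoding, encode, and bytes are hypothetical names and are not part of LetterSet's API.

```scala
object RangeEncoding {
  /** Hypothetical helper: encode a set of characters as a flat array of inclusive ranges. */
  def encode(chars: Set[Char]): Array[Char] = {
    val out = Array.newBuilder[Char]
    val sorted = chars.toArray.sorted
    var i = 0
    while (i < sorted.length) {
      val start = sorted(i)
      var end = start
      // extend the current range while the next character is consecutive
      while (i + 1 < sorted.length && sorted(i + 1) == end + 1) {
        i += 1
        end = sorted(i)
      }
      // a single character c is stored as the range [c, c]
      out += start
      out += end
      i += 1
    }
    out.result()
  }

  /** Size of the encoded array in bytes (a Char occupies 2 bytes on the JVM). */
  def bytes(chars: Set[Char]): Int = encode(chars).length * 2
}
```

Under these assumptions, RangeEncoding.bytes(('0' to '9').toSet) yields 4 and RangeEncoding.bytes("aeiou".toSet) yields 20, matching the figures listed.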
Similarly, checking for set membership is O(log(n)) where n is the number of contiguous ranges. This makes the implementation less efficient than Set[Char] (where it is O(1)) unless the caller expects n to be small relative to the number of elements contained. The other operations (e.g. union, intersection) are likely to be competitive, since LetterSet's implementations are O(n) in the number of ranges, whereas Set[Char]'s are O(n) in the number of individual elements. Finally, LetterSet's set complement operator (~) has no direct competitor in Set[Char], since that operation would be very expensive.
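To illustrate why complement is cheap on this representation, here is a minimal sketch that walks the range array once and emits the gaps between ranges. RangeComplement and complement are hypothetical names; this is not taken from LetterSet's source, which may implement ~ differently.

```scala
object RangeComplement {
  /** Hypothetical helper: complement of a flat range array, in one pass over the ranges. */
  def complement(array: Array[Char]): Array[Char] = {
    val out = Array.newBuilder[Char]
    // lowest value not yet accounted for; kept as Int so end + 1 cannot wrap past 0xFFFF
    var next = 0
    var i = 0
    while (i < array.length) {
      val start = array(i).toInt
      val end = array(i + 1).toInt
      // the gap before this range, if any, belongs to the complement
      if (next < start) {
        out += next.toChar
        out += (start - 1).toChar
      }
      next = end + 1
      i += 2
    }
    // everything after the last range, up to Char.MaxValue
    if (next <= Char.MaxValue.toInt) {
      out += next.toChar
      out += Char.MaxValue
    }
    out.result()
  }
}
```

The cost is proportional to the number of ranges, not the number of characters: complementing the two ranges of [a-zA-Z] produces three ranges without ever enumerating the 65,484 resulting members.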
You could imagine building a data structure that uses a Set[Char] for individual characters and only uses a LetterSet for contiguous ranges. This approach is not used here because converting between the two representations as characters are added and removed would add considerable complexity to the (already complex) implementation.
- Self Type
- LetterSet
Instance Constructors
- new LetterSet(array: Array[Char])
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- def &(rhs: LetterSet): LetterSet
- def +(c: Char): LetterSet
- def -(c: Char): LetterSet
- def --(rhs: LetterSet): LetterSet
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def ^(rhs: LetterSet): LetterSet
- def apply(c: Char): Boolean
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def contains(c: Char): Boolean
- def containsRange(c1: Char, c2: Char): Boolean
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(that: Any): Boolean
- Definition Classes
- LetterSet → AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def forall(f: (Char) ⇒ Boolean): Boolean
- def get(index: Int): Char
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- lazy val hashCode: Int
- Definition Classes
- LetterSet → AnyRef → Any
- def intersects(rhs: LetterSet): Boolean
- def isEmpty: Boolean
- def isFull: Boolean
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isSingleton: Boolean
- def iterator: Iterator[Char]
- def maxOption: Option[Char]
- def minOption: Option[Char]
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def nonEmpty: Boolean
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def partialCompare(rhs: LetterSet): Double
- def ranges: Iterator[(Char, Char)]
- def singleValue: Option[Char]
- lazy val size: Int
- def subsetOf(rhs: LetterSet): Boolean
- def supersetOf(rhs: LetterSet): Boolean
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- LetterSet → AnyRef → Any
- def unary_~: LetterSet
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- def |(rhs: LetterSet): LetterSet