scala.util.matching

Regex

class Regex extends Serializable

This class provides methods for creating and using regular expressions. It is based on the regular expressions of the JDK since 1.4.

Its main goal is to extract strings that match a pattern, or the subgroups that make it up. For that reason, it is usually used with for comprehensions and matching (see methods for examples).

A Regex is created from a java.lang.String representation of the regular expression pattern¹. That pattern is compiled during construction, so frequently used patterns should be declared outside loops if performance is a concern. They can, for example, be declared on a companion object, so that they need to be initialized only once.
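
As a minimal sketch of that advice, the pattern below is compiled once in a companion object and reused by every instance. The names LogLine and Timestamp are hypothetical and chosen only for illustration.

```scala
import scala.util.matching.Regex

// Hypothetical class, used only to illustrate where the pattern lives.
class LogLine(val text: String) {
  // Reuses the pattern held by the companion object; nothing is recompiled here.
  def timestamp: Option[String] = LogLine.Timestamp.findFirstIn(text)
}

object LogLine {
  // Compiled a single time, when the companion object is first initialized.
  val Timestamp: Regex = """\d\d:\d\d:\d\d""".r
}
```

Calling timestamp on many LogLine instances then shares the single compiled pattern.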

The canonical way of creating regex patterns is by using the method r, provided on java.lang.String through an implicit conversion into scala.collection.immutable.StringOps. Using triple quotes to write these strings avoids having to quote the backslash character (\).
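
The following sketch compares the two spellings. QuotingDemo and its member names are illustrative only. Both literals contain the same characters, so they should yield the same regex string.

```scala
object QuotingDemo {
  // Ordinary string literal: every backslash has to be doubled.
  val escaped = "\\d+\\.\\d+".r

  // Triple-quoted literal: backslashes are written as they appear in the pattern.
  val raw = """\d+\.\d+""".r

  // Both values were built from the same pattern text.
  val samePattern: Boolean = escaped.regex == raw.regex

  val found: Option[String] = raw.findFirstIn("pi is about 3.14")
}
```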

Using the constructor directly, on the other hand, makes it possible to declare names for subgroups in the pattern.

For example, both declarations below generate the same regex, but the second one associates names with the subgroups.

val dateP1 = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
val dateP2 = new scala.util.matching.Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""", "year", "month", "day")

There are two ways of using a Regex to find a pattern: calling methods on Regex, such as findFirstIn or findAllIn, or using it as an extractor in a pattern match.

Note, however, that when a Regex is used as an extractor in a pattern match, it only succeeds if the whole text can be matched. For this reason, one usually calls a method to find the matching substrings, and then uses the Regex as an extractor to break each match into subgroups.

As an example, the above patterns can be used like this:

val dateP1(year, month, day) = "2011-07-15"

// val dateP1(year, month, day) = "Date 2011-07-15" // throws an exception at runtime

val copyright: String = dateP1 findFirstIn "Date of this document: 2011-07-15" match {
  case Some(dateP1(year, month, day)) => "Copyright "+year
  case None                           => "No copyright"
}

val copyright: Option[String] = for {
  dateP1(year, month, day) <- dateP1 findFirstIn "Last modified 2011-07-15"
} yield year

def getYears(text: String): Iterator[String] = for (dateP1(year, _, _) <- dateP1 findAllIn text) yield year
def getFirstDay(text: String): Option[String] = for (m <- dateP2 findFirstMatchIn text) yield m group "day"
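
Because a val pattern definition throws at runtime when the text does not match, a match expression with a fallback case may be a safer choice for input that is not known to be well formed. A small sketch, in which ExtractorDemo and parseDate are hypothetical names:

```scala
object ExtractorDemo {
  val dateP = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Succeeds only if the whole text is a date; otherwise falls through to None.
  def parseDate(text: String): Option[(String, String, String)] = text match {
    case dateP(year, month, day) => Some((year, month, day))
    case _                       => None
  }
}
```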

Regex does not provide a method that returns a scala.Boolean. One can use the matches method of java.lang.String, or, if Regex is preferred, either ignore the return value or test the Option for emptiness. For example:

def hasDate(text: String): Boolean = (dateP1 findFirstIn text).nonEmpty
def printLinesWithDates(lines: Traversable[String]) {
  lines foreach { line =>
    dateP1 findFirstIn line foreach { _ => println(line) }
  }
}
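
The java.lang.String route mentioned above can be sketched as follows. The helper names are hypothetical. String's matches method tests the whole text, and the compiled pattern field gives the same test through java.util.regex.Matcher without recompiling.

```scala
object BooleanTestDemo {
  val dateP = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Whole-text test through java.lang.String.matches, using the regex string.
  def isDate(text: String): Boolean = text.matches(dateP.regex)

  // Same whole-text test, reusing the already compiled java.util.regex.Pattern.
  def isDateCompiled(text: String): Boolean = dateP.pattern.matcher(text).matches()
}
```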

There are also methods that can be used to replace matches of the pattern in a text. The substitutions can be simple replacement strings, or functions that compute the replacement from each match. For example:

val months = Map( 1 -> "Jan", 2 -> "Feb", 3 -> "Mar",
                  4 -> "Apr", 5 -> "May", 6 -> "Jun",
                  7 -> "Jul", 8 -> "Aug", 9 -> "Sep",
                  10 -> "Oct", 11 -> "Nov", 12 -> "Dec")

import scala.util.matching.Regex.Match
def reformatDate(text: String) = dateP2 replaceAllIn ( text, (m: Match) =>
  "%s %s, %s" format (months(m group "month" toInt), m group "day", m group "year")
)

You can use special pattern syntax constructs like (?idmsux-idmsux)¹ to switch various regex compilation options like CASE_INSENSITIVE or UNICODE_CASE.
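
For instance, the embedded flag (?i) turns on case-insensitive matching for the rest of the pattern. A minimal sketch, with illustrative names:

```scala
object FlagsDemo {
  val caseSensitive = """scala""".r
  // (?i) enables CASE_INSENSITIVE from this point in the pattern onward.
  val caseInsensitive = """(?i)scala""".r

  val miss: Option[String] = caseSensitive.findFirstIn("I like SCALA")
  val hit: Option[String] = caseInsensitive.findFirstIn("I like SCALA")
}
```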

Self Type
Regex
Annotations
@SerialVersionUID()
Source
Regex.scala
Version

1.1, 29/01/2008

Note

¹ A detailed description is available in java.util.regex.Pattern.

See also

java.util.regex.Pattern

Linear Supertypes
Serializable, java.io.Serializable, AnyRef, Any

Instance Constructors

  1. new Regex(regex: String, groupNames: String*)

    regex

    A string representing a regular expression

    groupNames

    Names for the capture groups, assigned by position: the first name refers to group 1, the second to group 2, and so on
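
    A short sketch of how the names are used, assuming the positional assignment of names to groups; GroupNamesDemo and the key=value pattern are made up for illustration.

    ```scala
    import scala.util.matching.Regex

    object GroupNamesDemo {
      // "key" names the first capture group, "value" the second.
      val pair = new Regex("""(\w+)=(\w+)""", "key", "value")

      // findFirstMatchIn returns an Option[Match]; the input is known to match here.
      val m = pair.findFirstMatchIn("color=red").get

      val key: String = m.group("key")
      val value: String = m.group("value")
    }
    ```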

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def +(other: String): String

    Implicit information
    This member is added by an implicit conversion from Regex to StringAdd[Regex] performed by method StringAdd in scala.Predef.
    Definition Classes
    StringAdd
  5. def ->[B](y: B): (Regex, B)

    Implicit information
    This member is added by an implicit conversion from Regex to ArrowAssoc[Regex] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  6. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  8. def anchored: Regex

  9. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  11. def ensuring(cond: (Regex) ⇒ Boolean, msg: ⇒ Any): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  12. def ensuring(cond: (Regex) ⇒ Boolean): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  13. def ensuring(cond: Boolean, msg: ⇒ Any): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  14. def ensuring(cond: Boolean): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws()
  18. def findAllIn(source: CharSequence): MatchIterator

    Return all matches of this regexp in given character sequence as a scala.util.matching.Regex.MatchIterator, which is a special scala.collection.Iterator that returns the matched strings, but can also be converted into a normal iterator that returns objects of type scala.util.matching.Regex.Match that can be queried for data such as the text that precedes the match, subgroups, etc.

    source

    The text to match against.

    returns

    A scala.util.matching.Regex.MatchIterator of all matches.

    Example:
    1. for (words <- """\w+""".r findAllIn "A simple example.") yield words
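
    The conversion mentioned above is available through the iterator's matchData method. A sketch with illustrative names, showing both views of the same matches:

    ```scala
    object FindAllDemo {
      val word = """\w+""".r

      // Iterating the MatchIterator directly yields the matched strings.
      val words: List[String] = word.findAllIn("A simple example.").toList

      // matchData yields Match objects, here queried for their start offsets.
      val starts: List[Int] =
        word.findAllIn("A simple example.").matchData.map(_.start).toList
    }
    ```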
  19. def findAllMatchIn(source: CharSequence): Iterator[Match]

    Return all matches of this regexp in given character sequence as a scala.collection.Iterator of scala.util.matching.Regex.Match.

    source

    The text to match against.

    returns

    A scala.collection.Iterator of scala.util.matching.Regex.Match for all matches.

    Example:
    1. for (words <- """\w+""".r findAllMatchIn "A simple example.") yield words.start
  20. def findFirstIn(source: CharSequence): Option[String]

    Return optionally first matching string of this regexp in given character sequence, or None if it does not exist.

    source

    The text to match against.

    returns

    A scala.Option of the first matching string in the text.

    Example:
    1. """\w+""".r findFirstIn "A simple example." foreach println // prints "A"
  21. def findFirstMatchIn(source: CharSequence): Option[Match]

    Return optionally first match of this regexp in given character sequence, or None if it does not exist.

    The main difference between this method and findFirstIn is that the (optional) return type for this is scala.util.matching.Regex.Match, through which more data can be obtained about the match, such as the strings that precede and follow it, or subgroups.

    source

    The text to match against.

    returns

    A scala.Option of scala.util.matching.Regex.Match of the first matching string in the text.

    Example:
    1. ("""[a-z]""".r findFirstMatchIn "A simple example.") map (_.start) // returns Some(2), the index of the first match in the text
  22. def findPrefixMatchOf(source: CharSequence): Option[Match]

    Return optionally match of this regexp at the beginning of the given character sequence, or None if regexp matches no prefix of the character sequence.

    The main difference between this method and findFirstMatchIn is that this method will not return any matches that do not begin at the start of the text being matched against.

    source

    The text to match against.

    returns

    A scala.Option of the scala.util.matching.Regex.Match of the matched string.

    Example:
    1. """\w+""".r findPrefixMatchOf "A simple example." map (_.after) // returns Some(" simple example.")
  23. def findPrefixOf(source: CharSequence): Option[String]

    Return optionally match of this regexp at the beginning of the given character sequence, or None if regexp matches no prefix of the character sequence.

    The main difference between this method and findFirstIn is that this method will not return any matches that do not begin at the start of the text being matched against.

    source

    The text to match against.

    returns

    A scala.Option of the matched prefix.

    Example:
    1. """[a-z]""".r findPrefixOf "A simple example." // returns None, since the text does not begin with a lowercase letter
  24. def formatted(fmtstr: String): String

    Returns string formatted according to given format string. Format strings are as for String.format (see java.lang.String.format).

    Implicit information
    This member is added by an implicit conversion from Regex to StringFormat[Regex] performed by method StringFormat in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @inline()
  25. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  26. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  27. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. val pattern: Pattern

    The compiled pattern

  32. def regex: String

  33. def replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String): String

    Replaces all matches using a replacer function. The replacer function takes a scala.util.matching.Regex.Match so that extra information can be obtained from the match. For example:

    import scala.util.matching.Regex
    val datePattern = new Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""", "year", "month", "day")
    val text = "From 2011-07-15 to 2011-07-17"
    val repl = datePattern replaceAllIn (text, m => m.group("month")+"/"+m.group("day"))

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character following the dollar sign is an error. The backslash (\) character will be interpreted as an escape character, and can be used to escape the dollar sign. One can use scala.util.matching.Regex's quoteReplacement to automatically escape these characters.

    target

    The string to match.

    replacer

    The function which maps a match to another string.

    returns

    The target string after replacements.

  34. def replaceAllIn(target: CharSequence, replacement: String): String

    Replaces all matches by a string.

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character following the dollar sign is an error. The backslash (\) character will be interpreted as an escape character, and can be used to escape the dollar sign. One can use scala.util.matching.Regex's quoteReplacement to automatically escape these characters.

    target

    The string to match

    replacement

    The string that will replace each match

    returns

    The resulting string

    Example:
    1. """\d+""".r replaceAllIn ("July 15", "") // returns "July "
  35. def replaceFirstIn(target: CharSequence, replacement: String): String

    Replaces the first match by a string.

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character following the dollar sign is an error. The backslash (\) character will be interpreted as an escape character, and can be used to escape the dollar sign. One can use scala.util.matching.Regex's quoteReplacement to automatically escape these characters.

    target

    The string to match

    replacement

    The string that will replace the match

    returns

    The resulting string
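
    A brief sketch of the three cases described above: a plain replacement, group references, and a replacement escaped with quoteReplacement. The object and value names are illustrative.

    ```scala
    import scala.util.matching.Regex

    object ReplaceFirstDemo {
      val number = """\d+""".r

      // Only the first run of digits is replaced; later ones are left alone.
      val first: String = number.replaceFirstIn("July 15, 2011", "XX")

      // $2 and $1 refer to the second and first capture groups of the match.
      val swapped: String =
        """(\w+) (\w+)""".r.replaceFirstIn("hello world again", "$2 $1")

      // quoteReplacement makes the dollar sign a literal character.
      val literal: String =
        number.replaceFirstIn("price: 15", Regex.quoteReplacement("$5"))
    }
    ```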

  36. def replaceSomeIn(target: CharSequence, replacer: (Match) ⇒ Option[String]): String

    Replaces some of the matches using a replacer function that returns a scala.Option. The replacer function takes a scala.util.matching.Regex.Match so that extra information can be obtained from the match. For example:

    import scala.util.matching.Regex._
    
    val map = Map("x" -> "a var", "y" -> """some $ and \ signs""")
    val text = "A text with variables %x, %y and %z."
    val varPattern = """%(\w+)""".r
    val mapper = (m: Match) => map get (m group 1) map (quoteReplacement(_))
    val repl = varPattern replaceSomeIn (text, mapper)

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character following the dollar sign is an error. The backslash (\) character will be interpreted as an escape character, and can be used to escape the dollar sign. One can use scala.util.matching.Regex's quoteReplacement to automatically escape these characters.

    target

    The string to match.

    replacer

    The function which optionally maps a match to another string.

    returns

    The target string after replacements.

  37. def runMatcher(m: Matcher): Boolean

    Attributes
    protected
  38. def split(toSplit: CharSequence): Array[String]

    Splits the provided character sequence around matches of this regexp.

    toSplit

    The character sequence to split

    returns

    The array of strings computed by splitting the input around matches of this regexp
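
    As an illustration, the sketch below splits a comma-separated line while discarding whitespace around the separators. SplitDemo and its values are hypothetical.

    ```scala
    object SplitDemo {
      // A comma together with any whitespace on either side of it.
      val separator = """\s*,\s*""".r

      // The Array is converted to a List only to make the result easy to compare.
      val fields: List[String] = separator.split("a , b,c ,d").toList
    }
    ```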

  39. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  40. def toString(): String

    The string defining the regular expression

    Definition Classes
    Regex → AnyRef → Any
  41. def unanchored: UnanchoredRegex

    Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns. For instance, the strings shown below lead to successful matches, where they would not otherwise.

    val dateP1 = """(\d\d\d\d)-(\d\d)-(\d\d)""".r.unanchored
    
    val dateP1(year, month, day) = "Date 2011-07-15"
    
    val copyright: String = "Date of this document: 2011-07-15" match {
      case dateP1(year, month, day) => "Copyright "+year
      case _                        => "No copyright"
    }

    returns

    The new unanchored regex
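
    The contrast with the default behaviour can be sketched side by side. The helper names are illustrative, and the last helper uses the anchored member listed on this page to get whole-text matching back.

    ```scala
    object AnchoringDemo {
      val strict = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
      val loose = strict.unanchored
      // anchored restores the whole-text requirement.
      val strictAgain = loose.anchored

      // The whole text has to be a date.
      def yearStrict(text: String): Option[String] = text match {
        case strict(year, _, _) => Some(year)
        case _                  => None
      }

      // A date anywhere in the text is enough.
      def yearLoose(text: String): Option[String] = text match {
        case loose(year, _, _) => Some(year)
        case _                 => None
      }

      // Behaves like yearStrict again.
      def yearStrictAgain(text: String): Option[String] = text match {
        case strictAgain(year, _, _) => Some(year)
        case _                       => None
      }
    }
    ```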

  42. def unapplySeq(m: Match): Option[Seq[String]]

    Tries to match on a scala.util.matching.Regex.Match. A previously failed match results in None. If a successful match was made against the current pattern, then that result is used. Otherwise, this Regex is applied to the previously matched input, and the result of that match is used.
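
    A minimal sketch of this overload in use: the Match returned by findFirstMatchIn is fed straight back to the same Regex as an extractor, so the groups come from the match that was already made. MatchExtractorDemo and year are hypothetical names.

    ```scala
    object MatchExtractorDemo {
      val dateP = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

      // The scrutinee is an Option[Match], so dateP extracts from a Match here,
      // not from a CharSequence.
      def year(text: String): Option[String] =
        dateP.findFirstMatchIn(text) match {
          case Some(dateP(y, _, _)) => Some(y)
          case _                    => None
        }
    }
    ```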

  43. def unapplySeq(s: CharSequence): Option[Seq[String]]

    Tries to match a java.lang.CharSequence. If the match succeeds, the result is a list of the matching groups (or a null element if a group did not match any input). If the pattern specifies no groups, then the result will be an empty list on a successful match.

    This method attempts to match the entire input by default; to find the next matching subsequence, use an unanchored Regex.

    For example:

    val p1 = "ab*c".r
    val p1Matches = "abbbc" match {
      case p1() => true
      case _    => false
    }
    val p2 = "a(b*)c".r
    val numberOfB = "abbbc" match {
      case p2(b) => Some(b.length)
      case _     => None
    }
    val p3 = "b*".r.unanchored
    val p3Matches = "abbbc" match {
      case p3() => true
      case _    => false
    }

    s

    The string to match

    returns

    The matches

  44. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  45. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  46. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws()
  47. def →[B](y: B): (Regex, B)

    Implicit information
    This member is added by an implicit conversion from Regex to ArrowAssoc[Regex] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
