Wednesday, June 17, 2009

Regex – Phone Numbers

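Here is a useful .NET regular expression for matching U.S. phone numbers. It is written across several lines, so it must be used with RegexOptions.IgnorePatternWhitespace: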

^(?:(?:[\+]?(?<CountryCode>[\d]{1,3}(?:[ ]+|[\-.])))?[(]?(?<AreaCode>[\d]{3})[\-/)]?(?:[ ]+)?)?
(?<Number>(?<Number1>[a-zA-Z0-9]{3})
(?:[\-. ]?)
(?<Number2>[a-zA-Z0-9]{4,}))
(?:[ \-](?:(?:[xX]|ext)[ \-](?<extn>\d{2,5})))?
$

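It matches the following number formats:

313-625-6860
(301) 621-6862
1234567890
111 222 3333
1-800-333-4444
327-6116
1-800-REGEXLIB
(610) 310-5555 x 55
(610) 310-5555 ext 55
1 610 310 5555 ext-555
+1 610 310 5555 ext-555

It does not match (111)-222-3333: the pattern allows only one separator character (-, / or a closing parenthesis) after the area code, so the ")-" pair is rejected.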

The regex breaks down the phone number into the following groups:

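CountryCode: optional; 1 to 3 digits plus the separator that follows them
AreaCode: optional; 3 digits
Number: the local number as written (Number1, the optional separator, and Number2)
Number1: the first 3 letters or digits
Number2: the remaining 4 or more letters or digits
extn: optional; 2 to 5 digits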
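To see the groups for a concrete number, the pattern can be run against one of the sample numbers. The following is a minimal sketch, not part of the original post: the class name PhoneGroupsDemo and the helper GetGroup are made up for illustration, and the three RegexOptions are the ones used in the code further down.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneGroupsDemo
{
    // Same pattern as in the post; IgnorePatternWhitespace allows the line breaks.
    private const string Pattern = @"
        ^(?:(?:[\+]?(?<CountryCode>[\d]{1,3}(?:[ ]+|[\-.])))?[(]?(?<AreaCode>[\d]{3})[\-/)]?(?:[ ]+)?)?
        (?<Number>(?<Number1>[a-zA-Z0-9]{3})
        (?:[\-. ]?)
        (?<Number2>[a-zA-Z0-9]{4,}))
        (?:[ \-](?:(?:[xX]|ext)[ \-](?<extn>\d{2,5})))?
        $";

    private static readonly Regex PhoneRegex = new Regex(
        Pattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Hypothetical helper: value of one named group, or "" when the input does not match.
    public static string GetGroup(string input, string groupName)
    {
        Match m = PhoneRegex.Match(input);
        return m.Success ? m.Groups[groupName].Value : "";
    }

    public static void Main()
    {
        string input = "+1 610 310 5555 ext-555";
        string[] names = { "CountryCode", "AreaCode", "Number", "Number1", "Number2", "extn" };
        foreach (string name in names)
        {
            // Brackets make any captured separator characters visible.
            Console.WriteLine(name + " = [" + GetGroup(input, name) + "]");
        }
    }
}
```

The brackets make it easy to spot that CountryCode also captures the separator after the digits, so trimming the value before using it is probably a good idea.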

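Here is how the Regulator’s RegEx analyzer explains it. Note that the analyzer labels the end of every group, capturing or not, as "End Capture", and that its output does not list the outer <Number> group: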

^ (anchor to start of string)
Non-capturing Group
  Non-capturing Group
    Any character in "\+"
    ? (zero or one time)
    Capture to <CountryCode>
      Any character in "\d"
      At least 1, but not more than 3 times
      Non-capturing Group
        Any character in " "
        + (one or more times)
        or
        Any character in "\-."
      End Capture
    End Capture
  End Capture
  ? (zero or one time)
  Any character in "("
  ? (zero or one time)
  Capture to <AreaCode>
    Any character in "\d"
    Exactly 3 times
  End Capture
  Any character in "\-/)"
  ? (zero or one time)
  Non-capturing Group
    Any character in " "
    + (one or more times)
  End Capture
  ? (zero or one time)
End Capture
? (zero or one time)


Capture to <Number1>
  Any character in "a-zA-Z0-9"
  Exactly 3 times
End Capture


Non-capturing Group
  Any character in "\-. "
  ? (zero or one time)
End Capture


Capture to <Number2>
  Any character in "a-zA-Z0-9"
  At least 4 times
End Capture


Non-capturing Group
  Any character in " \-"
  Non-capturing Group
    Non-capturing Group
      Any character in "xX"
      or
      ext
    End Capture
    Any character in " \-"
    Capture to <extn>
      Any digit 
      At least 2, but not more than 5 times
    End Capture
  End Capture
End Capture
? (zero or one time)

$ (anchor to end of string)

And here is the code to use it:

// Requires: using System; using System.Text.RegularExpressions;
private void Test()
{
    // The pattern spans several lines, so IgnorePatternWhitespace is required.
    string regex = @"
        ^(?:(?:[\+]?(?<CountryCode>[\d]{1,3}(?:[ ]+|[\-.])))?[(]?(?<AreaCode>[\d]{3})[\-/)]?(?:[ ]+)?)?
        (?<Number>(?<Number1>[a-zA-Z0-9]{3})
        (?:[\-. ]?)
        (?<Number2>[a-zA-Z0-9]{4,}))
        (?:[ \-](?:(?:[xX]|ext)[ \-](?<extn>\d{2,5})))?
        $";
    RegexOptions options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase;
    Regex reg = new Regex(regex, options);

    // Try the pattern against one of the sample numbers.
    Match match = reg.Match("(610) 310-5555 x 55");
    Console.WriteLine(match.Success ? "Matched: " + match.Value : "No match");
}
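
Because the options include RegexOptions.Multiline, ^ and $ anchor at every line rather than only at the two ends of the input, so the same regex can pull phone numbers out of text that holds one candidate per line. A rough sketch of that use follows; PhoneLineFinder and FindPhoneLines are hypothetical names, and the sample text is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneLineFinder
{
    // Same pattern and options as in the post.
    private const string Pattern = @"
        ^(?:(?:[\+]?(?<CountryCode>[\d]{1,3}(?:[ ]+|[\-.])))?[(]?(?<AreaCode>[\d]{3})[\-/)]?(?:[ ]+)?)?
        (?<Number>(?<Number1>[a-zA-Z0-9]{3})
        (?:[\-. ]?)
        (?<Number2>[a-zA-Z0-9]{4,}))
        (?:[ \-](?:(?:[xX]|ext)[ \-](?<extn>\d{2,5})))?
        $";

    private static readonly Regex PhoneRegex = new Regex(
        Pattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every line that is, in full, matched by the pattern.
    public static List<string> FindPhoneLines(string text)
    {
        var found = new List<string>();
        foreach (Match m in PhoneRegex.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Lines are separated by "\n" only. With Multiline, $ matches just before "\n",
        // so a "\r" left at the end of a line would keep that line from matching.
        string text = "313-625-6860\nnot a phone number\n1-800-REGEXLIB\n(610) 310-5555 ext 55";
        foreach (string line in FindPhoneLines(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

Text with Windows line endings would need its "\r" characters removed first, for example with text.Replace("\r", "").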

2 comments:

jdauie said...

That's a pretty good one, although it could be improved with some more validation. For instance, from Wikipedia, there are several limitations on valid North American area codes and exchanges.

Raj Rao said...

RegEx for RegularExpressionValidator in ASP.Net which allows for an optional extension
^[01]?[- .]?(\(\d{3}\)|\d{3})[- .]?(\d{3})[- .]?(\d{4})([- .](([xX]|[eE]|([Ee](xt)(n)?))?[- .])?\d{1,5})?$