Meaning of regular expressions like \\d, \\D, ^, $, etc.


From ?regexp, in the Extended Regular Expressions section:

The caret ‘^’ and the dollar sign ‘$’ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols ‘\<’ and ‘\>’ match the empty string at the beginning and end of a word. The symbol ‘\b’ matches the empty string at either edge of a word, and ‘\B’ matches the empty string provided it is not at an edge of a word. (The interpretation of ‘word’ depends on the locale and implementation: these are all extensions.)
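To make the anchors concrete, here is a minimal base-R sketch. The sample vector `x` and the result names are made up for illustration; `grepl()` uses R's default (extended) regular expressions, and every backslash is doubled because the patterns are written as R strings.

```r
# Hypothetical sample data, chosen only for illustration
x <- c("cat", "concatenate", "my cat", "cats")

# ^ matches at the start of the string, $ at the end
starts_with_cat <- grepl("^cat", x)    # TRUE FALSE FALSE TRUE
ends_with_cat   <- grepl("cat$", x)    # TRUE FALSE TRUE FALSE
exactly_cat     <- grepl("^cat$", x)   # TRUE FALSE FALSE FALSE

# \b matches at either edge of a word, so this finds "cat" as a whole word
whole_word <- grepl("\\bcat\\b", x)    # TRUE FALSE TRUE FALSE

# \< matches only at the beginning of a word, \> only at the end
word_start <- grepl("\\<cat", x)       # TRUE FALSE TRUE TRUE
word_end   <- grepl("cat\\>", x)       # TRUE FALSE TRUE FALSE

# \B matches where there is no word edge: "cat" buried inside a longer word
inside_word <- grepl("\\Bcat\\B", x)   # FALSE TRUE FALSE FALSE
```

With stringr, which the question mentions, `str_detect(x, "^cat")` should play the same role as `grepl("^cat", x)`. stringr uses a different regex engine, so extensions such as `\\<` and `\\>` may not be available there.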

From Perl-like Regular Expressions:

The escape sequences ‘\d’, ‘\s’ and ‘\w’ represent any decimal digit, space character and ‘word’ character (letter, digit or underscore in the current locale: in UTF-8 mode only ASCII letters and digits are considered) respectively, and their upper-case versions represent their negation. Vertical tab was not regarded as a space character in a ‘C’ locale before PCRE 8.34 (included in R 3.0.3). Sequences ‘\h’, ‘\v’, ‘\H’ and ‘\V’ match horizontal and vertical space or the negation. (In UTF-8 mode, these do match non-ASCII Unicode code points.)

Note that backslashes usually need to be doubled/protected in R input, e.g. you would use "\\h" to match horizontal space.
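A short sketch of these character classes in base R, using `perl = TRUE` to match the Perl-like section quoted above. The sample string and variable names are invented for the example.

```r
# Hypothetical sample string; "\t" inside it is a real tab character
x <- "Room 101, floor_2\tEast wing"

# The R string "\\d" holds two characters: a backslash and the letter d
pattern_length <- nchar("\\d")                                      # 2

# \d: any decimal digit; + means "one or more"
digits <- regmatches(x, gregexpr("\\d+", x, perl = TRUE))[[1]]      # "101" "2"

# \D: anything that is not a digit (here: delete every non-digit)
only_digits <- gsub("\\D", "", x, perl = TRUE)                      # "1012"

# \s: whitespace (space, tab, newline, ...); split on runs of it
pieces <- strsplit(x, "\\s+", perl = TRUE)[[1]]
# "Room" "101," "floor_2" "East" "wing"

# \w: word characters (letters, digits, underscore)
words <- regmatches(x, gregexpr("\\w+", x, perl = TRUE))[[1]]
# "Room" "101" "floor_2" "East" "wing"

# \W: anything that is not a word character
squeezed <- gsub("\\W", "", x, perl = TRUE)      # "Room101floor_2Eastwing"

# \h: horizontal space (both the space and the tab qualify)
marked <- gsub("\\h", "_", "a b\tc", perl = TRUE)                   # "a_b_c"
```

In stringr, `str_extract_all(x, "\\d+")` and `str_replace_all(x, "\\W", "")` are rough equivalents of the `regmatches()` and `gsub()` calls above. The backslashes are doubled in exactly the same way.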

From ?Quotes:

Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.
\n newline
\r carriage return
\t tab
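The difference between a string escape such as `\t` and a doubled backslash is easiest to see by counting characters. The following is a small illustrative sketch in base R; the variable names and the `C:\temp` path are made up.

```r
# "\t" in an R string is ONE character (a tab); "\\t" is TWO (backslash, t)
tab_string <- "a\tb"
n_with_tab     <- nchar(tab_string)    # 3: a, tab, b
n_with_escaped <- nchar("a\\tb")       # 4: a, backslash, t, b

# As a regex, both forms match a tab: "\t" is a literal tab in the pattern,
# while "\\t" reaches the regex engine as \t, which it reads as "tab"
literal_tab_matches <- grepl("\t", tab_string)     # TRUE
regex_tab_matches   <- grepl("\\t", tab_string)    # TRUE

# "\n" is a newline, so it can be used to split text into lines
lines <- strsplit("one\ntwo", "\n")[[1]]           # "one" "two"

# A literal backslash is "\\" in an R string, and the regex needs it escaped
# again, so the pattern is written with four backslashes
path <- "C:\\temp"                                 # 7 characters: C:\temp
has_backslash <- grepl("\\\\", path)               # TRUE
```

Printing a string with `cat()` rather than `print()` shows the characters it actually contains, which helps when checking how many backslashes a pattern really has.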

As others have commented, you may need a little more help if you're getting started with regular expressions for the first time. This is a little bit off-topic for StackOverflow (links to off-site resources), but there are some links to regular expression resources at the bottom of the gsubfn package overview. Or Google "regular expression tutorial" ...

Author: Pankaj Kaundal

Updated on May 03, 2020

Comments

  • Pankaj Kaundal (about 4 years)

    What do these expressions mean? Where can I learn about their usage?

    \\d 
    \\D 
    \\s 
    \\S 
    \\w 
    \\W
    \\t 
    \\n 
    ^   
    $   
    \   
    |  etc..
    

    I need to use the stringr package and I have absolutely no idea how to use these.

  • Richie Cotton (about 8 years)
    \n and \t are described in the "Character constants" section of the ?Quotes help page.
  • Ben Bolker (about 8 years)
    @RichieCotton, feel free to edit if you like. (Should this answer be made community wiki?)