Regular expression "empty range in char class error"


I can replicate this error on Ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux], installed on Ubuntu 12.04.1 LTS using rvm 1.13.4. However, this should not be a version-specific error. In fact, I'm surprised it worked on the other machines at all.

A simpler demonstration that fails just as well:

"abcd" =~ /[\w- ]/

This is because [\w- ] is interpreted as "a range beginning with any word character and running up to a space (or blank)", rather than as a character class containing a word character, a hyphen, or a space, which is what you intended.
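The error message itself can be reproduced with a range whose endpoints are unambiguously reversed. The sketch below uses `[z-a]` as a stand-in, because how a particular build treats `[\w- ]` evidently varies (the question reports that it compiled on other machines). `compile_status` is a hypothetical helper name, not part of Ruby; it builds the pattern with `Regexp.new` so that a bad pattern raises `RegexpError` at runtime instead of stopping the script at parse time.

```ruby
# Hypothetical helper: try to compile a pattern given as a string and
# report the outcome instead of letting the exception propagate.
def compile_status(source)
  Regexp.new(source)
  :ok
rescue RegexpError => e
  e.message
end

# 'z' sorts after 'a', so the range contains no characters at all.
puts compile_status('[z-a]')

# A well-formed range compiles without complaint.
puts compile_status('[a-z]')
```

The first call should report the "empty range in char class" message; the second returns `:ok`.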

Per Ruby's regular expression documentation:

Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is irrelevant.

As you saw, prepending a backslash escaped the hyphen, so it was read as a literal character instead of a range operator, which removed the error. However, escaping the hyphen in the middle of a character class is not recommended, since it's easy to confuse the intended meaning of the hyphen in such cases. As m.buettner pointed out, always place hyphens either at the beginning or the end of a character class:

"abcd" =~ /[-\w ]/
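As a quick check, the three unambiguous spellings (hyphen first, hyphen last, hyphen escaped) can be compared side by side. This is only a sketch: `hyphen_class_variants` and the sample strings are made up for illustration.

```ruby
# Three spellings of "word character, hyphen, or space" that leave no room
# for a range interpretation. The method name is hypothetical.
def hyphen_class_variants
  {
    hyphen_first:   /[-\w ]/,
    hyphen_last:    /[\w -]/,
    hyphen_escaped: /[\w\- ]/
  }
end

samples = ["a", "7", "_", "-", " "]

hyphen_class_variants.each do |name, pattern|
  # Every sample is a single character that the class should accept.
  matched = samples.all? { |s| pattern =~ s }
  puts "#{name}: #{matched}"
end
```

All three variants accept the same characters, so the choice between them is about readability rather than behavior.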

Author: Steve

Updated on September 15, 2022

Comments

  • Steve
    Steve over 1 year

    I have a regex in my code that is meant to match URLs, and it threw an error:

    /^(http|https):\/\/([\w-]+\.)+[\w-]+([\w- .\/?%&=]*)?$/
    

    The error was "empty range in char class". I found that the cause is the ([\w- .\/?%&=]*)? part. Ruby seems to treat the - in \w- . as a range operator instead of a literal -. After escaping the dash, the problem was solved.

    But the original regular expression ran fine on my co-workers' machines. We use the same versions of OS X, Rails, and Ruby: Ruby is 1.9.3p194, Rails is 3.1.6, and OS X is 10.7.5. After we deployed the code to our Heroku server, everything worked fine there too. Why did only my environment raise an error for this regex? How does Ruby interpret regular expressions?
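For reference, the escaped form described in the question might look like the sketch below. Only the hyphen in the last character class is changed; `url_pattern`, `url_like?`, and the sample URLs are hypothetical names and inputs chosen for illustration.

```ruby
# The pattern from the question with the ambiguous hyphen escaped.
# The two earlier classes, [\w-], already end with the hyphen and are left alone.
def url_pattern
  /^(http|https):\/\/([\w-]+\.)+[\w-]+([\w\- .\/?%&=]*)?$/
end

# Hypothetical convenience wrapper that returns true or false.
def url_like?(text)
  !(url_pattern =~ text).nil?
end

puts url_like?("http://example.com/some-path?x=1")
puts url_like?("ftp://example.com")
```

Moving the hyphen to the end of the class, as in `[\w .\/?%&=-]`, would be an equivalent alternative to escaping it.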

    • Martin Ender
      Martin Ender over 11 years
      I don't know why it worked on one machine and not on another, but hyphens in character classes should always be either escaped or at the beginning or end of the character class. Otherwise the engine might decide to make it a range. Hyphens are also allowed directly after other ranges (like [A-Z-_]) but this is rather discouraged, too, I'd say.
    • Dave Newton
      Dave Newton over 11 years
      What version of Ruby? Is it an earlier version with the optional regex support compiled in? Without any details about at least the versions, and possibly the OS, it's impossible to help.
    • Mark Thomas
      Mark Thomas over 11 years
      It's standard regex practice to place the dash at the end of the character class.