How to match accented characters with a regex?

32,510

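Instead of \w, use the POSIX bracket expression [:alpha:]. In Ruby 1.9, \w matches only the ASCII characters [a-zA-Z0-9_], while [[:alpha:]] also matches non-ASCII letters: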

"blåbær dèjá vu".scan(/[[:alpha:]]+/)  # => ["blåbær", "dèjá", "vu"]

"blåbær dèjá vu".scan(/\w+/)  # => ["bl", "b", "r", "d", "j", "vu"]
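One side effect worth checking before swapping the classes: \w also accepts digits and the underscore, while [[:alpha:]] accepts letters only. The sketch below compares the two with [[:alnum:]] on a made-up sample string; the variable names are illustrative and not part of the original question.

```ruby
# encoding: utf-8
# (The magic comment above is needed on Ruby 1.9 for non-ASCII literals.)

# Hypothetical sample mixing accented letters, digits and an underscore.
sample = "Zoë_2 à9"

# \w: ASCII letters, digits and underscore only, so accented letters split the words.
ascii_words = sample.scan(/\w+/)

# [[:alpha:]]: letters, including accented ones; no digits, no underscore.
letters = sample.scan(/[[:alpha:]]+/)

# [[:alnum:]]: letters and digits, including accented letters.
letters_and_digits = sample.scan(/[[:alnum:]]+/)

p ascii_words         # => ["Zo", "_2", "9"]
p letters             # => ["Zoë", "à"]
p letters_and_digits  # => ["Zoë", "2", "à9"]
```

If names containing digits were previously accepted through \w and should stay valid, [[:alnum:]] is probably the closer replacement.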

In your particular case, change the regex to this:

NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u

This matches much more than just accented characters, which is a good thing. Make sure you read this blog entry about common misconceptions regarding names in software applications.
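A related point for validation: in Ruby, ^ and $ match at the start and end of each line, not of the whole string, so a multi-line value can pass if any single line matches. The sketch below checks the names from the question and contrasts the two anchoring styles. STRICT_NAME_REGEX and LINE_NAME_REGEX are hypothetical names, and switching to \A and \z is a suggestion rather than part of the original answer.

```ruby
# encoding: utf-8
# (The magic comment above is needed on Ruby 1.9 for non-ASCII literals.)

# The regex from the answer, anchored per line with ^ and $.
LINE_NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u

# Same character class (minus the redundant trailing hyphen),
# anchored to the whole string with \A and \z.
STRICT_NAME_REGEX = /\A[[:alpha:]\s'"\-_&@!?()\[\]]*\z/

# The names from the question now pass.
names = ["Oilalà", "Pì", "Rùby"]
all_valid = names.all? { |name| name =~ STRICT_NAME_REGEX }

# A value with an unwanted second line.
multiline = "Rùby\n<script>"
line_match   = (multiline =~ LINE_NAME_REGEX)    # matches the first line only
strict_match = (multiline =~ STRICT_NAME_REGEX)  # whole string must match

p all_valid     # => true
p line_match    # => 0
p strict_match  # => nil
```

To try it in the model, pass the \A...\z constant as the :with option of the existing validates call.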

Author: user502052

Updated on July 10, 2022

Comments

  • user502052 almost 2 years

    I am running Ruby on Rails 3.0.10 and Ruby 1.9.2. I am using the following Regex in order to match names:

    NAME_REGEX = /^[\w\s'"\-_&@!?()\[\]-]*$/u
    
    validates :name,
      :presence   => true,
      :format     => {
        :with     => NAME_REGEX,
        :message  => "format is invalid"
      }
    

    However, if I try to save words like the following:

    Oilalà
    Pì
    Rùby
    ...
    
    # In short, those with accented characters
    

    I get the validation error "Name format is invalid".

    How can I change the above regex so that it also matches accented characters like à, è, é, ì, ò, ù, ...?