How to escape a string for use in Boost Regex


Solution 1

These characters are special in a regex and need to be escaped:

. ^ $ | ( ) [ ] { } * + ? \

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

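// Character class matching every character from the list above.
const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
// In sed format, "\\" emits one literal backslash and "&" emits the matched text.
const std::string rep("\\\\&");
std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                          boost::match_default | boost::format_sed);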

(The flag boost::format_sed tells regex_replace to use sed's replacement string format. In sed, an unescaped & outputs whatever the whole expression matched.)

Or, if you are not comfortable with sed's replacement string format, change the flag to boost::format_perl and use the familiar $& to refer to whatever the whole expression matched.

const std::string rep("\\\\$&");
std::string result = regex_replace(url_to_escape, esc, rep,
                                   boost::match_default | boost::format_perl);
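
The escaping step does not strictly need a regex. As a minimal sketch, assuming the same list of special characters as above, a plain loop can do it. The names escape_for_regex and contains_url are hypothetical, and std::regex stands in for Boost here only so that the example needs nothing beyond the standard library:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: put a backslash in front of every regex metacharacter.
std::string escape_for_regex(const std::string& text) {
    static const std::string special = ".^$|()[]{}*+?\\";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character is escaped
    for (char c : text) {
        if (special.find(c) != std::string::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical helper: true if `haystack` contains `url` as literal text.
// Assumption: std::regex (ECMAScript syntax) is used instead of boost::regex.
bool contains_url(const std::string& haystack, const std::string& url) {
    const std::regex re(escape_for_regex(url));
    return std::regex_search(haystack, re);
}
```

Because the URL is escaped first, the . and ? in it match only themselves, so a string containing aXcom is not reported as a match for a.com.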

Solution 2

Using code from Dav (plus a fix from the comments), I created a wide-character function regex_escape():

std::wstring regex_escape(const std::wstring& string_to_escape) {
    // L"..." is always wide; _T("...") is only wide when _UNICODE is defined.
    static const boost::wregex re_boostRegexEscape( L"[.^$|()\\[\\]{}*+?\\\\]" );
    const std::wstring rep( L"\\\\&" );
    return boost::regex_replace(string_to_escape, re_boostRegexEscape, rep,
                                boost::match_default | boost::format_sed);
}

For the narrow (ASCII) version, use std::string/boost::regex instead of std::wstring/boost::wregex and drop the L prefix from the literals.
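
Rather than maintaining two copies, the narrow and wide versions could share one template. This is only a sketch under assumptions: widen_ascii and regex_escape_t are hypothetical names, and std::basic_regex is swapped in for Boost's regex types so the example is self-contained. It also uses the standard library's default replacement format, where $& is the whole match and a backslash is an ordinary character:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: convert an ASCII-only literal to the requested character type.
template <typename CharT>
std::basic_string<CharT> widen_ascii(const char* s) {
    std::basic_string<CharT> out;
    for (; *s != '\0'; ++s) {
        out.push_back(static_cast<CharT>(*s));
    }
    return out;
}

// Hypothetical helper: escape regex metacharacters in a narrow or wide string.
// Assumption: std::basic_regex is used here instead of boost::regex/boost::wregex.
template <typename CharT>
std::basic_string<CharT> regex_escape_t(const std::basic_string<CharT>& input) {
    static const std::basic_regex<CharT> special(
        widen_ascii<CharT>("[.^$|()\\[\\]{}*+?\\\\]"));
    // Default (ECMAScript) format: "\" is copied as-is, "$&" is the matched text.
    return std::regex_replace(input, special, widen_ascii<CharT>("\\$&"));
}
```

Calling regex_escape_t(std::string("a.b")) or regex_escape_t(std::wstring(L"a.b")) picks the matching instantiation from the argument type.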

Solution 3

The same approach with boost::xpressive:

// Escapes the same characters as Solution 1 (including { and }), plus '/'.
const boost::xpressive::sregex re_escape_text = boost::xpressive::sregex::compile(
    "([\\^\\.\\$\\|\\(\\)\\[\\]\\{\\}\\*\\+\\?\\/\\\\])");

std::string regex_escape(std::string text){
    text = boost::xpressive::regex_replace( text, re_escape_text, std::string("\\$1") );
    return text;
}

Solution 4

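In C++11, you can use raw string literals so that backslashes in the pattern do not have to be doubled for the C++ compiler. Regex metacharacters still need their own backslash, like the \. here: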

std::string myRegex = R"(something\.com)";

See http://en.cppreference.com/w/cpp/language/string_literal, item (6).
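
To make the difference concrete, here is a small sketch showing that the raw literal and the conventionally escaped literal hold the same characters. The names raw_pattern, escaped_pattern and is_something_com are hypothetical, and std::regex is used in place of Boost only to keep the example self-contained:

```cpp
#include <regex>
#include <string>

// Both literals contain the 14 characters: something\.com
const std::string raw_pattern = R"(something\.com)";
const std::string escaped_pattern = "something\\.com";

// Hypothetical helper: true only if `host` is exactly the text "something.com".
// Assumption: std::regex is used here instead of boost::regex.
bool is_something_com(const std::string& host) {
    static const std::regex re(raw_pattern);
    return std::regex_match(host, re);
}
```

The raw literal only removes the C++ level of escaping. The backslash before the dot is still needed for the regex engine, which is why a runtime string such as a URL still has to go through one of the escaping functions above.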

Author: Gerald

Updated on June 11, 2022

Original question, asked by Gerald:

    I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

    I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.

    Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

    Alternatively, is there a list of all characters that would need to be escaped?