How do I match any character across multiple lines in a regular expression?

Solution 1

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
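
For comparison, here is a minimal sketch of the same idea in Python, where the re.DOTALL flag plays the role of PHP's /s. The sample string and the variable names are illustrative assumptions, not part of the original answer.

```python
import re

text = "abcde\nfghij<FooBar>"

# Without DOTALL, "." stops at the newline, so only the last line is captured.
default_match = re.search(r"(.*)<FooBar>", text)
print(repr(default_match.group(1)))

# With DOTALL, "." also matches "\n", so the capture spans both lines.
dotall_match = re.search(r"(.*)<FooBar>", text, flags=re.DOTALL)
print(repr(dotall_match.group(1)))
```

The first call still finds a match, but only on the line that contains <FooBar>; the flag is what lets the capture reach back across the line break.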

Solution 2

Try this:

((.|\n)*)<FooBar>

It basically says "any character or a newline" repeated zero or more times.
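
As a rough check, the sketch below (Python, with an assumed sample string) compares this alternation with the flag-based form on a small input. The inner group is written as non-capturing here, which is a small deviation from the pattern above.

```python
import re

text = "abcde\r\nfghij<FooBar>"

# Alternation form from this answer; the inner group is non-capturing here.
alternation = re.search(r"((?:.|\n)*)<FooBar>", text)

# Flag-based form: a single "." token that matches every character.
dotall = re.search(r"(?s)(.*)<FooBar>", text)

same_capture = alternation.group(1) == dotall.group(1)
print(same_capture)
```

Both forms capture the same text on this input. The difference is in how the engine gets there: the alternation tries two branches for every character, which is the inefficiency the comments further down warn about.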

Solution 3

The question is, can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or non-POSIX regex library.

A special note about Lua patterns: they are not considered regular expressions, but . matches any character there, the same as in POSIX-based engines.

Another note on MATLAB and Octave: the . matches any character by default: str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>*'; [tokens,matches] = regexp(str,expression,'tokens','match'); (tokens contains an abcde\n fghij item).

Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_s, which is equivalent to prefixing the pattern with (?-s).

As for Oracle (it is POSIX based), use the n option: select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual

POSIX-based engines:

A mere . already matches line breaks, so there is no need for any modifiers; this is the case in Bash, for example.

Tcl, PostgreSQL, and R (TRE, the base R default engine when perl=TRUE is not set; for base R with perl=TRUE or for stringr/stringi patterns, use the (?s) inline modifier) also treat . the same way.

However, most POSIX-based tools process input line by line. Hence, . does not match line breaks simply because they are never in scope. Here are some examples of how to override this:

  • sed - There are multiple workarounds. The most precise, but not very safe, is sed 'H;1h;$!d;x; s/\(.*\)<Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be included, sed '/start_pattern/,/end_pattern/d' file (removes everything from the start line through the end line, matched lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (matched lines excluded) can be considered.
  • perl - perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 sets the record separator to the NUL character, which in practice slurps a text file into memory; -p prints the file after applying the script given by -e). Note that -000pe activates 'paragraph mode' instead, where Perl reads one paragraph at a time, using consecutive newlines (\n\n) as the record separator.
  • grep - grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, z enables file slurping, (?s) enables the DOTALL mode for the . pattern, (?i) enables case insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, and (?=<Foobar>) matches the location before <Foobar>.
  • pcregrep - pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (-M enables multiline matching here). Note that pcregrep is a good solution for macOS grep users.
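
To illustrate why the line-by-line tools above need a slurping option, here is a small Python sketch. It only emulates the two processing styles; the sample input and names are assumptions, and it does not reproduce any of the tools' actual engines.

```python
import re

text = "abcde\nfghij<Foobar>\n"
pattern = re.compile(r"(?s)abc(.*)<Foobar>")

# Line-by-line processing (the default for sed and grep): each line is matched
# on its own, and no single line contains both "abc" and "<Foobar>".
per_line = [pattern.search(line) for line in text.splitlines()]
print(any(per_line))

# Slurped processing (roughly what sed 'H;1h;$!d;x', perl -0 and grep -z give):
# the whole input is one string, so the match can cross the line break.
slurped = pattern.search(text)
print(repr(slurped.group(1)))
```

Even with DOTALL switched on, the per-line loop finds nothing, because the newline is never part of the string being searched. That is the "not in scope" problem described above.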

Non-POSIX-based engines:

  • PHP - Use the s (PCRE_DOTALL) modifier: preg_match('~(.*)<Foobar>~s', $s, $m)

  • C# - Use the RegexOptions.Singleline flag, or its inline (?s) equivalent:
    - var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
    - var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;

  • PowerShell - Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]

  • Perl - Use the s modifier (or the (?s) inline version at the start): /(.*)<FooBar>/s

  • Python - Use the re.DOTALL (or re.S) flag or the (?s) inline modifier: m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m:, print(m.group(1)))

  • Java - Use the Pattern.DOTALL modifier (or the inline (?s) flag): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)

  • Kotlin - Use RegexOption.DOT_MATCHES_ALL: "(.*)<FooBar>".toRegex(RegexOption.DOT_MATCHES_ALL)

  • Groovy - Use the (?s) in-pattern modifier: regex = /(?s)(.*)<FooBar>/

  • Scala - Use the (?s) modifier: "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }

  • JavaScript - Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S]: s.match(/([\s\S]*)<FooBar>/)[1]. In ES2018 and later, the s (dotAll) flag is also available: /(.*)<FooBar>/s

  • C++ (std::regex) - Use [\s\S] or the JavaScript workarounds: regex rex(R"(([\s\S]*)<FooBar>)");

  • VBA / VBScript - Use the same approach as in JavaScript, ([\s\S]*)<Foobar>. (NOTE: The MultiLine property of the RegExp object is sometimes erroneously thought to be the option that lets . match across line breaks, while in fact it only changes the ^ and $ behavior to match at the start/end of lines rather than of the whole string, the same as in JavaScript regex.)

  • Ruby - Use the /m MULTILINE modifier: s[/(.*)<Foobar>/m, 1]

  • R (base R, PCRE) - Use (?s) together with perl=TRUE: regmatches(x, regexec("(?s)(.*)<FooBar>",x, perl=TRUE))[[1]][2]

  • R (stringr/stringi) - These regex functions are powered by the ICU regex engine; also use (?s): stringr::str_match(x, "(?s)(.*)<FooBar>")[,2]

  • Go - Use the inline modifier (?s) at the start: re := regexp.MustCompile(`(?s)(.*)<FooBar>`)

  • Swift - Use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"

  • Objective-C - The same as Swift. (?s) is the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];

  • RE2, Google Apps Script - Use the (?s) modifier: "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))

NOTES ON (?s):

In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to force . to match line breaks.

If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the .s located to the right of it are affected, unless the pattern is passed to Python's re. In Python re, the whole pattern is affected regardless of where (?s) is located (and since Python 3.11, a global inline flag that is not at the start of the pattern is an error). The (?s) effect is stopped using (?-s). A modifier group can be used to affect only a specified range of a regex pattern (e.g., Delim1(?s:.*?)\nDelim2.* makes the first .*? match across newlines, while the second .* only matches the rest of the line).
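
The modifier-group example can be tried out with a short Python sketch. It assumes Python 3.6 or later, where (?s:...) is supported, and the sample text is made up for illustration.

```python
import re

text = "Delim1 first\nsecond\nDelim2 rest of line\nnext line"

# (?s:...) turns DOTALL on only inside the group; the trailing ".*" keeps
# the default behavior and stops at the end of its line.
scoped = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
print(repr(scoped.group(0)))
```

The match runs from Delim1 across the line break to the end of the Delim2 line and leaves "next line" out, because the last .* is outside the group.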

POSIX note:

In non-POSIX regex engines, to match any character, [\s\S] / [\d\D] / [\w\W] constructs can be used.

In POSIX, [\s\S] does not match any character (as it does in JavaScript or any other non-POSIX engine), because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single character: \, s, or S.
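
Both readings of [\s\S] can be sketched in Python. The first part uses Python's own (non-POSIX) interpretation. The second part only emulates the POSIX parse by escaping the backslash, since Python does not read bracket expressions the POSIX way. The sample strings are assumptions.

```python
import re

text = "abcde\r\nfghij<FooBar>"

# No flags: [\s\S] is "whitespace or non-whitespace", i.e. any character,
# so it crosses the line break where a plain "." would not.
any_char = re.search(r"([\s\S]*)<FooBar>", text)
print(repr(any_char.group(1)))

# Emulated POSIX reading: the backslash is a literal, so the set contains
# just three characters: backslash, "s" and "S".
posix_like = re.compile(r"[\\sS]")
accepted = [ch for ch in "\\sSa\n" if posix_like.fullmatch(ch)]
print(accepted)
```

Under the emulated POSIX reading, neither "a" nor the newline is accepted, which is why the trick cannot stand in for "any character" there.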

Solution 4

If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:

(?s).*<FooBar>

Solution 5

In many regex dialects, /[\S\s]*<Foobar>/ will do just what you want.

Author by

Doug Blank

Updated on July 08, 2022

Comments

  • Doug Blank
    Doug Blank almost 2 years

    For example, this regex

    (.*)<FooBar>
    

    will match:

    abcde<FooBar>
    

    But how do I get it to match across multiple lines?

    abcde
    fghij<FooBar>
    
  • Ben Doom
    Ben Doom over 15 years
    This is dependent on the language and/or tool you are using. Please let us know what you are using, eg Perl, PHP, CF, C#, sed, awk, etc.
  • Alan Moore
    Alan Moore about 15 years
    No, don't do that. If you need to match anything including line separators, use the DOTALL (a.k.a. /s or SingleLine) modifier. Not only does the (.|\n) hack make the regex less efficient, it's not even correct. At the very least, it should match \r (carriage return) as well as \n (linefeed). There are other line separator characters, too, albeit rarely used. But if you use the DOTALL flag, you don't have to worry about them.
  • opyate
    opyate over 14 years
    \R is the platform-independent match for newlines in Eclipse.
  • Grace
    Grace about 13 years
    and what if i wanted just a new line and not all characters ?
  • Paige Ruten
    Paige Ruten about 13 years
    @Grace: use \n to match a newline
  • Potherca
    Potherca about 12 years
    Depending on your line endings you might need ((.|\n|\r)*)<FooBar>
  • Danubian Sailor
    Danubian Sailor about 12 years
    He said he is using Eclipse. This is correct solution in my opinion. I have same problem and this solved it.
  • acme
    acme almost 12 years
    Right - the question is about eclipse and so are the tags. But the accepted solution is a PHP solution. Yours should be the accepted solution...
  • J. Costa
    J. Costa over 11 years
    This solves the problem if you are using the Objective-C [text rangeOfString:regEx options:NSRegularExpressionSearch]. Thanks!
  • jeckhart
    jeckhart over 11 years
    @opyate You should post this as an answer as this little gem is incredibly useful.
  • Allen
    Allen almost 11 years
    Seems like this is invalid (Chrome): text.match(/a/s) SyntaxError: Invalid flags supplied to RegExp constructor 's'
  • Allen
    Allen almost 11 years
    From that link: "JavaScript and VBScript do not have an option to make the dot match line break characters. In those languages, you can use a character class such as [\s\S] to match any character." Instead of the ., use [\s\S] (match spaces and non-spaces).
  • Allen
    Allen almost 11 years
    The s flag is (now?) invalid, at least in Chrome/V8. Instead, use the /([\s\S]*)<FooBar>/ character class (match space and non-space) in place of the period matcher. See other answers for more info.
  • Paul Draper
    Paul Draper over 10 years
    Shouldn't those be backslashes?
  • RandomInsano
    RandomInsano over 10 years
    They go at the end of the regular expression, not within it. Example: /blah/s
  • Derek 朕會功夫
    Derek 朕會功夫 almost 9 years
    @Allen - JavaScript doesn't support the s modifier. Instead, do [^]* for the same effect.
  • Ryan Buckley
    Ryan Buckley almost 9 years
    In Ruby, use the m modifier
  • barclay
    barclay over 8 years
    This works in intelliJ's find&replace regex, thanks.
  • frIT
    frIT over 8 years
    \R matches line endings in a platform-independent manner. In eclipse, at least, and some other tools.
  • Manolis Agkopian
    Manolis Agkopian over 8 years
    Very funny, I tried this on gedit and I got a segmentation fault. Murphy's law at its finest.
  • Morgan Touverey Quilling
    Morgan Touverey Quilling about 8 years
    Because it is unsupported in JavaScript RegEx engines. The s flag exists in PCRE, the most complete engine (available in Perl and PHP). PCRE has 10 flags (and a lot of other features) while JavaScript has only 3 flags (gmi).
  • Wiktor Stribiżew
    Wiktor Stribiżew almost 8 years
    This is the worst regex for matching multiple line input. Please never use it unless you are using ElasticSearch. Use [\s\S]* or (?s).*.
  • Wiktor Stribiżew
    Wiktor Stribiżew almost 8 years
    Not anywhere, only in regex flavors supporting inline modifiers, and certainly not in Ruby where (?s) => (?m)
  • ssc-hrep3
    ssc-hrep3 over 7 years
    You could try this instead. It won't capture the inner group and also handles the optional \r: ((?:.|\r?\n)*)<foobar>
  • Ozkan
    Ozkan over 6 years
    This works. But it needs to be the first occurrence of <FooBar>
  • Jan
    Jan over 6 years
    You should link to this excellent overview from your profile page or something (+1).
  • Mohamad Hamouday
    Mohamad Hamouday about 6 years
    If there are multiple values of <FooBar>, it will ignore all the values in the middle and only match the last <FooBar>
  • Admin
    Admin about 6 years
    You may want to add this to the boost item: In the regex_constants namespace, flag_type_'s : perl = ECMAScript = JavaScript = JScript = ::boost::regbase::normal = 0 which defaults to Perl. Programmers will set a base flag definition #define MOD regex_constants::perl | boost::regex::no_mod_s | boost::regex::no_mod_m for their regex flags to reflect that. And the arbiter is always the inline modifiers. Where (?-sm)(?s).* resets.
  • Vamshi Krishna
    Vamshi Krishna almost 6 years
    I am using .NET too and (\s|\S) seems to do the trick for me!
  • NealWalters
    NealWalters over 5 years
    What to use for Powershell?
  • Wiktor Stribiżew
    Wiktor Stribiżew over 5 years
    @VamshiKrishna In .NET, use (?s) to make . match any chars. Do not use (\s|\S) that will slow down performance.
  • 3limin4t0r
    3limin4t0r over 5 years
    I guess you mean JavaScript, not Java? Since you can just add the s flag to the pattern in Java and JavaScript doesn't have the s flag.
  • Pasupathi Rajamanickam
    Pasupathi Rajamanickam over 5 years
    Anything for bash?
  • Pasupathi Rajamanickam
    Pasupathi Rajamanickam over 5 years
    Can you also add for bash please?
  • Wiktor Stribiżew
    Wiktor Stribiżew over 5 years
    @PasupathiRajamanickam Bash uses a POSIX regex engine, the . matches any char there (including line breaks). See this online Bash demo.
  • Snow
    Snow about 5 years
    Such needless alternation can result in catastrophic backtracking in some situations. This isn't a good general pattern.
  • Adam Wenger
    Adam Wenger about 4 years
    Thanks for the ?s modifier for PowerShell; cleaned up my fragile regex to something a bit sturdier.
  • Sebastián Espinosa
    Sebastián Espinosa about 4 years
    you are a legend
  • Gwyneth Llewelyn
    Gwyneth Llewelyn almost 4 years
    You rock — this is the most exhaustive mini-tutorial on (relatively) complex regexp's that I've ever seen. You deserve that your answer becomes the accepted one! Kudos and extra votes for including Go in the answer!
  • Wiktor Stribiżew
    Wiktor Stribiżew almost 4 years
    Never use (.*?|\n)*? unless you want to end up with a catastrophic backtracking.
  • lkahtz
    lkahtz over 3 years
    I like this. This is more general.
  • Nolwennig
    Nolwennig about 3 years
    In xml files, I use : ((.|\n|\r|\t)*)<FooBar> pattern
  • Peter Mortensen
    Peter Mortensen over 2 years
    @Wiktor Stribiżew: Why is it the worst? Will the other match newlines without modifiers?
  • Peter Mortensen
    Peter Mortensen over 2 years
    This is specific to a particular platform. What programming language and platform is it? C# / .NET?
  • Peter Mortensen
    Peter Mortensen over 2 years
    What is the underlying regular expression engine for Eclipse? Something in Java/JDK?
  • Peter Mortensen
    Peter Mortensen over 2 years
    The first link somehow redirects to www.facebook.com (which I have blocked in the hosts file). Is that link broken or not?
  • Peter Mortensen
    Peter Mortensen over 2 years
    What programming language? Java?
  • Wiktor Stribiżew
    Wiktor Stribiżew over 2 years
    @PeterMortensen Too many people have already reported performance issues and even stack overflow errors when using this pattern, and I have even recorded a YT video explaining why it is that bad.
  • Peter Mortensen
    Peter Mortensen over 2 years
    It doesn't look right. Why two times ".*"? This may work for the sample input in the question, but what if "<FooBar>" is on line 42?
  • Sian Lerk Lau
    Sian Lerk Lau over 2 years
    I guess the owner decided to redirect it to the facebook page. I will remove it.
  • Just Me
    Just Me about 2 years
    (\r\n)* - super answer. thanks