Matching any digit, word character, or a space 46 or more times before a less than sign

Solution 1

It doesn't match because the input contains commas and a hyphen, and none of the three shorthand classes you include (\d, \w, \s) matches those characters.

This would match:

^.*<Elem1>[\d\w\s,-]{46,}?

Additionally, it only makes sense to anchor at the start of the input and then say "ignore any characters you find before an <Elem1>" if the regex runs in multiline mode, where ^ matches at the start of every line. Otherwise, the same effect can be achieved with just

<Elem1>[\d\w\s,-]{46,}?
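
As a rough check of the explanation above, here is a minimal sketch in Python. Python is an assumption here, since the question targets Notepad++ and C#, but these particular constructs are written the same way in those engines. The names SAMPLES, ORIGINAL, FIXED and matched_text are made up for the illustration; the sample lines are copied from the question.

```python
import re

# Sample lines copied from the question.
SAMPLES = [
    "<Elem1>123 ABC Street</Elem1>",
    "<Elem1>123637 ABC Street Suite 1, Kalamzoo, FL 15264-8574</Elem1>",
]

# The asker's character class (without the leading ^.*).
ORIGINAL = re.compile(r"<Elem1>[\d\w\s]{46,}?")
# The same class with comma and hyphen added; the trailing - is a literal.
FIXED = re.compile(r"<Elem1>[\d\w\s,-]{46,}?")


def matched_text(pattern, line):
    """Return the text the pattern matched in the line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None


original_results = [matched_text(ORIGINAL, s) for s in SAMPLES]
fixed_results = [matched_text(FIXED, s) for s in SAMPLES]

print(original_results)  # neither line matches
print(fixed_results)     # only the long address matches
```

Note that the lazy quantifier {46,}? stops as soon as 46 characters have been consumed, so the match on the long line ends partway through the address rather than at the closing tag.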

Solution 2

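Use this regex (\d can be dropped, because \w already includes digits): <Elem1>[\w\s]{46,}

Note that this class still excludes the commas and the hyphen in your sample address, so add them to it (<Elem1>[\w\s,-]{46,}) if that line should match.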

Solution 3

The shorthand classes (\d, \w, \s) only cover digits, word characters and whitespace, so commas and dashes (as in your example) are not included. Also, if you really want to match everything between the tags, you should drop the ? to make the quantifier greedy, and maybe add the start of the closing tag too. You can then use a capturing group to get the inner content:

^.*<Elem1>([\d\w\s,-]{46,})</

Alternatively, if you want to make sure you catch other characters as well, you could just accept any character other than the < symbol inside the tag:

^.*<Elem1>([^<]{46,})</
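
The capturing-group approach can be sketched as follows, again in Python as an assumption (the question itself targets Notepad++ and C#). INNER, LONG_LINE, SHORT_LINE and inner_text are hypothetical names for the illustration; the sample lines come from the question.

```python
import re

# Sample lines copied from the question.
LONG_LINE = "<Elem1>123637 ABC Street Suite 1, Kalamzoo, FL 15264-8574</Elem1>"
SHORT_LINE = "<Elem1>123 ABC Street</Elem1>"

# Greedy negated class: 46 or more characters that are not "<",
# followed by the start of the closing tag.
INNER = re.compile(r"<Elem1>([^<]{46,})</")


def inner_text(line):
    """Return the captured element content, or None if it is too short."""
    m = INNER.search(line)
    return m.group(1) if m else None


print(inner_text(LONG_LINE))   # the full address text
print(inner_text(SHORT_LINE))  # None: fewer than 46 characters
```

Because the class is negated rather than enumerated, it does not need updating when the content contains other punctuation, at the cost of accepting any character except <.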
Author: Mike Perrenoud

Mike Perrenoud is a 19-year veteran developer, polyglot, mentor and all around nice guy. He enjoys helping people and making a difference in their lives.

Updated on June 05, 2022

Comments

  • Mike Perrenoud, almost 2 years

    Objective

    I want to match any digit, word character, or space 46 or more times before a < sign.

    One note is that I'm trying to use this RegEx in Notepad++ before plugging it into the C# code.

    Data

    <Elem1>123 ABC Street</Elem1> // should NOT match
    <Elem1>123637 ABC Street Suite 1, Kalamzoo, FL 15264-8574</Elem1>
    

    RegEx

    I currently have the following RegEx:

    ^.*<Elem1>[\d\w\s]{46,}?
    

    and I can't figure out why this [\d\w\s]{46,}? won't match the inner portion of the element.

    I look forward to your answers!