How to do regular expressions in vb.net


Solution 1

The first pattern is the correct one to use. The second pattern returns True if any single character in the string matches. The third pattern returns True if zero or more characters at the beginning of the string match, which is always the case.

I don't know what you did to make it not work, but using it like this works:

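Imports System.Text.RegularExpressions

' Letters, digits and spaces only, from the start (^) to the end ($) of the string.
Dim _fontJIAdieuxRegEx As String = "^[a-zA-Z0-9 ]*$"

Dim r As New Regex(_fontJIAdieuxRegEx)

Console.WriteLine(r.IsMatch("darren"))
Console.WriteLine(r.IsMatch("da-rren"))
Console.WriteLine(r.IsMatch("da rren"))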

Output:

True
False
True
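
To see why only the first pattern behaves as intended, it may help to run all three patterns from the question against the same inputs. This is a minimal sketch; the module name PatternComparison and the loop layout are illustrative assumptions, not part of the original answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternComparison
    Sub Main()
        ' The three patterns from the question, in the order they were tried.
        Dim patterns As String() = {"^[a-zA-Z0-9 ]*$", "[a-zA-Z0-9\s]", "^[A-Za-z0-9 ]*"}
        Dim candidates As String() = {"darren", "da-rren", "da rren"}

        For Each pattern As String In patterns
            Console.WriteLine(pattern)
            For Each candidate As String In candidates
                ' Regex.IsMatch returns True if the pattern matches anywhere in the input.
                Console.WriteLine("  {0}: {1}", candidate, Regex.IsMatch(candidate, pattern))
            Next
        Next
    End Sub
End Module
```

Only the first pattern reports False for da-rren. The other two report True for every input, because neither of them requires the whole string to consist of allowed characters.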

Solution 2

The regex classes are located in the namespace System.Text.RegularExpressions. To make them available, place Imports System.Text.RegularExpressions at the start of your source code.

Regex.IsMatch("subject", "regex") 

checks if the regular expression matches the subject string.

Regex.Replace("subject", "regex", "replacement") 

performs a search-and-replace.

Regex.Split("subject", "regex") 

splits the subject string into an array of strings, using each regex match as a delimiter. All three methods accept an optional additional parameter of type RegexOptions, as does the Regex constructor.
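
The three static methods above can be sketched in one short program. The sample value "Comic  Sans-MS" and the module name StaticRegexDemo are made up for illustration; the calls themselves are the standard Regex.IsMatch, Regex.Replace and Regex.Split overloads.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StaticRegexDemo
    Sub Main()
        ' Hypothetical input: two spaces after "Comic" and a hyphen before "MS".
        Dim fontName As String = "Comic  Sans-MS"

        ' IsMatch with an options argument: IgnoreCase lets [a-z] cover upper case too.
        ' The hyphen is not in the class, so the anchored pattern does not match.
        Console.WriteLine(Regex.IsMatch(fontName, "^[a-z0-9 ]*$", RegexOptions.IgnoreCase))

        ' Replace: remove every character that is not a letter, digit or space.
        Console.WriteLine(Regex.Replace(fontName, "[^A-Za-z0-9 ]", ""))

        ' Split: break the string on runs of whitespace or hyphens.
        Dim parts As String() = Regex.Split(fontName, "[\s\-]+")
        Console.WriteLine(String.Join("|", parts))
    End Sub
End Module
```

Under these assumptions the program prints False, then Comic  SansMS, then Comic|Sans|MS.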

Source / more information: http://www.regular-expressions.info/dotnet.html

Solution 3

Your third expression matches anything. The * character after the character class tells the regular expression engine to match that character class zero or more times. Since there are no other conditions in the expression (no $ anchor at the end), any string is valid. The second expression matches anything that has at least one valid character.

The first expression should work, but I'm not a fan of the start and end anchors (^ and $) if you don't need them. What I would do here instead is invert the expression... look for characters that are not valid. The expression would look like this:

[^A-Za-z0-9 ]

In this case, the ^ character used as part of the character class means to negate the class: this will match any character that is not in that class, and since we don't have any anchors it will match if such a character occurs anywhere in the string. Now, of course, I must also invert the result in the VB.Net code:

Dim r As New Regex("[^A-Za-z0-9 ]")
Dim supported = Not r.IsMatch(fontName)
Author by

Darren Wainwright

Updated on June 04, 2022

Comments

  • Darren Wainwright
    Darren Wainwright almost 2 years

    So there are many questions and answers here around the subject of regular expressions. The downside is that the vast majority of answers are simply the regular expression...

    I have also googled and found hundreds of sites. Wading through everything to find an answer that is quick to understand and implement isn't easy. The answers are often in a different language, which perhaps shouldn't make any difference, but you escape differently in C# than in VB, and that leads to confusion as to what is an escape character and what is a regex switch.

    The part I am struggling with is understanding them so I can implement some, apparently, simple expressions.

    My scenario:

    I have to check every character in a given string, and if the regular expression doesn't allow any of the characters then it should return false.

    Example:

    I have tried the following expressions (copy/pasted from various answers here....)

    Dim r As New Regex("^[a-zA-Z0-9 ]*$")
    

    also tried

    Dim r As New Regex("[a-zA-Z0-9\s]")
    

    also tried

    Dim r As New Regex("^[A-Za-z0-9 ]*")
    

    I have been implementing this like:

    Dim r As New Regex(_fontJIAdieuxRegEx) '' where _fontJIAdieuxRegEx is one of the above regex strings.
    Dim supported = r.IsMatch(fontName)
    

    I have been trying to validate something like the following:

    darren should return True

    da-rren should return False due to the - hyphen

    da rren should return True

    Now, simply put, each of these expressions returns either True for all of the strings or False for all of the strings, so I am clearly doing something wrong.

    What I would really appreciate is someone pointing out where I am going wrong and also explaining a little about the make-up of the regular expression.

    Once I understand them a little more I need to be able to have different expressions to allow other characters, such as ! @ " ' . etc. So please don't just paste an expression to solve the simple example above.

  • Darren Wainwright
    Darren Wainwright over 11 years
    I now get my first pattern to match, thanks :) - seems I had a typo in my source code that wasn't on the question. Would you mind explaining, roughly, what the ^,* and $ do? and how would I escape for things like ' and " ?
  • Darren Wainwright
    Darren Wainwright over 11 years
    Thanks Daan - and welcome to SO. I'm more trying to understand how the regex is put together rather than how to call the code; I've got that part working. it's my patterns that suck :)
  • Joel Coehoorn
    Joel Coehoorn over 11 years
    The ^ character is an anchor to the beginning of the string. Be careful, because this character can have another meaning when used in a different place. The * character here means to match the preceding part of the expression zero or more times, and the $ character is another anchor, this time to the end of the string. Taken all together, they mean to match this set at the start of the string zero or more times, all the way to the end of the string.
  • Darren Wainwright
    Darren Wainwright over 11 years
    Ah, gotcha. It's starting to click a little more now, I think. I just tried adding a bunch of characters to the character class and ended up with an ArgumentException - parsing "^[a-zA-Z0-9 '_-()&#!+:;=]*$" - [x-y] range in reverse order. I figure it's the ' character causing the issue. How do I escape in VB for characters like ' and " and other reserved characters? Thanks again for your help
  • Darren Wainwright
    Darren Wainwright over 11 years
    Thanks Joel, certainly looks a lot tidier!
  • Guffa
    Guffa over 11 years
    @Darren: The apostrophe is not a problem, but the dash is. You have the range _-( which means all characters from _ to (, but you get that parsing error as _ comes after (. As you don't want that to be a range at all, you should escape the dash using \-.
  • Simon Halsey
    Simon Halsey over 11 years
    The other thing you have to remember is that * is greedy by default, meaning it matches as much as possible. If you add a ? after it, it becomes non-greedy, matching as little as possible.
  • Darren Wainwright
    Darren Wainwright over 11 years
    @SimonHalsey - so in this case, where I need to make sure everything in the string is valid, I would want to use *, wouldn't I?
  • Darren Wainwright
    Darren Wainwright over 11 years
    @Guffa - excellent, thanks, added that in and it worked :D gaining a much better understanding of it now, thanks again :) - marking this as answered as you have provided just what I was after; an easy explanation for the fluff-and-nonsense that is RegEx :)
  • Simon Halsey
    Simon Halsey over 11 years
    @Darren, yes. Greediness is something you need to take into account when you're only checking part of a string
  • Guffa
    Guffa over 11 years
    You only need to worry about greediness if you are capturing something that you match. When you are just checking for a match, it doesn't come into play.
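
The two points from the comments, escaping inside a character class and greediness, can be sketched as follows. This is an illustration under assumptions: the module name CharacterClassDemo, the IsSupported helper and the sample strings are hypothetical. The pattern is the one from the ArgumentException comment with the dash escaped as \- and a double quote added. Inside a VB string literal a double quote is written as "", while the apostrophe needs no escaping either in VB or in the character class.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CharacterClassDemo
    ' Hypothetical extended whitelist: letters, digits, space and ' _ - ( ) & # ! + : ; = "
    ' The dash is escaped as \- so that _-( is not read as a range.
    ' The double quote is doubled ("") only because this is a VB string literal.
    Private Const ExtendedPattern As String = "^[a-zA-Z0-9 '_\-()&#!+:;=""]*$"

    Function IsSupported(fontName As String) As Boolean
        Return Regex.IsMatch(fontName, ExtendedPattern)
    End Function

    Sub Main()
        Console.WriteLine(IsSupported("it's ""fine""-ish"))  ' True: every character is in the class
        Console.WriteLine(IsSupported("100%"))               ' False: % is not in the class

        ' Greedy (*) versus lazy (*?) changes what is captured...
        Console.WriteLine(Regex.Match("darren", "^[a-z]*").Value)          ' darren
        Console.WriteLine(Regex.Match("darren", "^[a-z]*?").Value.Length)  ' 0 (empty match)

        ' ...but not whether a fully anchored pattern matches.
        Console.WriteLine(Regex.IsMatch("da-rren", "^[a-z]*$"))   ' False
        Console.WriteLine(Regex.IsMatch("da-rren", "^[a-z]*?$"))  ' False
    End Sub
End Module
```

For a whole-string validation like the one in the question, either quantifier gives the same True or False answer, which is why plain * is enough there.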