Optional Whitespace Regex


Add a \s? if a space can be allowed.

\s matches a single whitespace character (a space, tab, newline, etc.).

? means the preceding token may occur once or not at all.
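A minimal PHP sketch of that behavior, using preg_match on made-up a=b test strings (the strings are illustrative only, not from the question). The pattern is anchored with ^ and $ so that an extra space cannot be skipped:

```php
<?php
// \s? : the whitespace before "=" may appear once or not at all.
$pattern = '/^a\s?=b$/';

var_dump(preg_match($pattern, 'a=b'));   // int(1): zero spaces
var_dump(preg_match($pattern, 'a =b'));  // int(1): one space
var_dump(preg_match($pattern, 'a  =b')); // int(0): two spaces is one too many
```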

If more than one space is allowed and the spaces are optional, use \s*.

* means the preceding token can occur zero or more times.
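The same kind of anchored sketch with \s* (again with illustrative strings): every subject matches, however many whitespace characters sit before the =, including none:

```php
<?php
// \s* : whitespace before "=" may appear any number of times, including zero.
$pattern = '/^a\s*=b$/';

var_dump(preg_match($pattern, 'a=b'));     // int(1): zero whitespace characters
var_dump(preg_match($pattern, 'a =b'));    // int(1): one space
var_dump(preg_match($pattern, "a \t =b")); // int(1): space, tab, space
```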

'#<a href\s?="(.*?)" title\s?="(.*?)"><img alt\s?="(.*?)" src\s?="(.*?)"\s*width\s?="150"\s*height\s?="(.*?)"></a>#'

This allows an optional space between each attribute name and the =.

If you also want to allow an optional space after the =, add a \s? after it as well.
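As a sketch of a single attribute with \s? on both sides of the = (the width="150" attribute is taken from the question's sample tags; the spacing variants are made up for illustration). The last variant shows the limit of \s?: two spaces before the = no longer match:

```php
<?php
// Optional whitespace on both sides of "=" for one attribute.
$pattern = '/width\s?=\s?"(\d+)"/';

$variants = ['width="150"', 'width ="150"', 'width= "150"', 'width = "150"', 'width  = "150"'];

foreach ($variants as $attr) {
    if (preg_match($pattern, $attr, $m)) {
        echo $attr, ' -> ', $m[1], "\n"; // captured value: 150
    } else {
        echo $attr, " -> no match\n";    // only the two-space variant ends up here
    }
}
```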

Likewise, wherever a character is optional, follow it with ? if it can occur at most once, or with * if it can occur any number of times.

Your actual problem was [\s*]. Characters enclosed in [ and ] form a character class, and a character class matches exactly one character that is any one of its members. So [\s*] matches one whitespace character or one literal *, which means it fails when there is no space at all (as in src="..."width="150") and when there are two or more spaces. Remove the * from inside the class; if you want the class itself to repeat, put a quantifier (?, +, *, etc.) after the closing ], for example [\s]* or simply \s*.
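A minimal PHP sketch of both points. Part 1 probes [\s*] on tiny made-up strings; Part 2 runs the pattern from this answer, with every [\s*] written as \s*, against the question's two sample tags and collects capture group 4 (the src value, as in the question's $imagematch[4]). The $images array is a name introduced here for illustration:

```php
<?php
// Part 1: [\s*] is a character class. It matches exactly ONE character,
// which must be either a whitespace character or a literal "*".
var_dump(preg_match('/"[\s*]w/', '" w'));  // int(1): one space
var_dump(preg_match('/"[\s*]w/', '"*w'));  // int(1): a literal asterisk also matches
var_dump(preg_match('/"[\s*]w/', '"w'));   // int(0): no character between " and w
var_dump(preg_match('/"[\s*]w/', '"  w')); // int(0): two spaces, but the class consumes only one

// Part 2: the pattern with every [\s*] written as \s* instead.
$pattern = '#<a href\s?="(.*?)" title\s?="(.*?)"><img alt\s?="(.*?)" src\s?="(.*?)"\s*width\s?="150"\s*height\s?="(.*?)"></a>#';

// The question's two sample tags, with their missing spaces left as-is.
$samples = [
    '<a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>',
    '<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>',
];

$images = [];
foreach ($samples as $data) {
    if (preg_match($pattern, $data, $imagematch)) {
        $images[] = $imagematch[4]; // group 4 is the src value
    }
}
print_r($images); // both src URLs are extracted
```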

Author: jameslfc19

Updated on August 17, 2020

Comments

  • jameslfc19, almost 4 years

    I'm having a problem trying to ignore whitespace between certain characters. I've been Googling around for a few days and can't seem to find the right solution.

    Here's my code:

    // Get Image data
    preg_match('#<a href="(.*?)" title="(.*?)"><img alt="(.*?)" src="(.*?)"[\s*]width="150"[\s*]height="(.*?)"></a>#', $data, $imagematch);
    $image = $imagematch[4];
    

    Basically these are some of the scenarios I have:

    <a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>
    

    (Notice the lack of a space between width="" and src="")

    And

    <a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>
    

    (Notice the lack of a space in between width="" and height="".)

    Is there any way to ignore the whitespace between those characters? I'm not a regex expert.

  • jameslfc19, over 11 years
    Thanks! I changed [\s*] to \s? and it works now! :) Thank you!
  • cryptic ツ, over 11 years
    @jameslfc19 \s? means 0 or 1 whitespace characters. However, what if there is more than one whitespace character? You want \s* so it will match 0 or more. Btw, you do not want to use regex to parse HTML. You want to use one of these methods.
  • HenonoaH, over 3 years
    @naveed-s I'm having an issue with a trailing space in a named capturing group and couldn't get it working. Can you please tell me what I'm missing? Link to RegExp. The word "contact" must be included in the match searchTerm; that's what I'm trying to achieve.
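cryptic's comment above advises against parsing HTML with regex. One common alternative in PHP, assumed here rather than taken from the linked methods, is the bundled DOMDocument class (it requires the DOM extension, which most builds enable). A minimal sketch that pulls the src out of the question's two sample tags; $sources is a name introduced for illustration:

```php
<?php
// The question's two sample tags, missing spaces included.
$samples = [
    '<a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>',
    '<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>',
];

// Collect libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$sources = [];
foreach ($samples as $data) {
    $doc = new DOMDocument();
    $doc->loadHTML($data);
    // Attribute spacing is handled by the HTML parser, not by a pattern.
    foreach ($doc->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
}
libxml_clear_errors();

print_r($sources);
```

libxml's HTML parser is lenient about missing whitespace between attributes, so the spacing problem from the question does not arise; how it recovers from more badly broken markup may vary between libxml versions.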