Regex to find integers and decimals in string

Solution 1

If you only want to grab the data, a loose regex is enough:

([\d.]+)\s+(\S+)
  • ([\d.]+): [\d.]+ matches a run of one or more digits and . characters. That means malformed input such as 4.5.6 or .... also matches, but those cases are uncommon and this pattern is only meant for grabbing data. The parentheses capture the matched text. Because the . is inside a character class [], it does not need escaping.

  • Followed by one or more whitespace characters (\s+) and then the longest possible run of non-whitespace characters (\S+, which is greedy). Non-whitespace really means non-whitespace: \S matches almost everything in Unicode except whitespace such as space, tab, newline, and carriage return.

You can get the number in the first capturing group, and the unit in the 2nd capturing group.
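
As a rough sketch of how the two capturing groups could be read in PHP (the language used in the question), assuming PCRE via preg_match. The helper name parseAmount and the array keys are hypothetical, chosen only for this illustration:

```php
<?php
// Loose pattern from above: group 1 is the amount, group 2 is the unit.
$pattern = '/([\d.]+)\s+(\S+)/';

// Hypothetical helper, not part of the original answer.
function parseAmount(string $pattern, string $input): ?array
{
    if (preg_match($pattern, $input, $m) !== 1) {
        return null; // no match (or a regex error)
    }
    return ['amount' => $m[1], 'unit' => $m[2]];
}

foreach (['12 ounces', '1.5 ounces chopped'] as $str) {
    $parsed = parseAmount($pattern, $str);
    echo $parsed['amount'], ' ', $parsed['unit'], PHP_EOL;
}
```

For the second string the unit group stops at "ounces", because \S+ cannot cross the space before "chopped".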

You can be a bit stricter on the number:

(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
  • The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. It is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It matches an integer such as 34 and a number with a decimal part such as 3.40000, and it also lets .5 and 34. pass. It rejects numbers with more than one . and input that consists only of a . character. The | acts as OR and separates two alternative patterns: \d+(?:\.\d*)? and \.\d+.
  • \d+(?:\.\d*)?: This requires at least one digit in the integer part, followed by an optional . (escaped as \. because a bare . means any character) and a fractional part of zero or more digits. The ? at the end makes the whole group optional. () is used for both grouping and capturing; when capturing is not needed, (?:) groups without capturing, which saves a little memory.
  • \.\d+: This handles cases such as .78. It matches a . followed by at least one digit (the + means one or more).
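
To see which number shapes the stricter sub-pattern accepts, here is a small PHP sketch. The ^ and $ anchors and the non-capturing wrapper are additions for this demonstration only, so that each whole token has to be a number; they are not part of the pattern above.

```php
<?php
// Number sub-pattern from above, wrapped in ^...$ (an addition for this
// demo) so the whole token must be a number.
$numberOnly = '/^(?:\d+(?:\.\d*)?|\.\d+)$/';

$samples = ['34', '3.40000', '.5', '34.', '.', '4.5.6', '....'];
foreach ($samples as $token) {
    $ok = preg_match($numberOnly, $token) === 1;
    echo str_pad($token, 8), $ok ? 'accepted' : 'rejected', PHP_EOL;
}
```

Without the anchors, the full pattern can still find a match inside malformed input: on "4.5.6 ounces" it captures 5.6 as the number, since the search simply starts later in the string.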

This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
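
One way to restrict matches to known units is to build the unit part of the pattern from a list. This is only a sketch: the unit list and the helper name parseKnownUnit are hypothetical, and a real list would depend on your data.

```php
<?php
// Hypothetical unit list for illustration; replace it with the units your data uses.
$units = ['ounces', 'ounce', 'cups', 'cup', 'grams', 'gram'];

// Quote each unit so any regex metacharacters are treated literally.
$quoted = array_map(function ($unit) {
    return preg_quote($unit, '/');
}, $units);

// Stricter number pattern from above, followed by one of the known units.
$pattern = '/(\d+(?:\.\d*)?|\.\d+)\s+(' . implode('|', $quoted) . ')\b/';

// Hypothetical helper: returns the amount and unit, or null when no known unit follows.
function parseKnownUnit(string $pattern, string $input): ?array
{
    return preg_match($pattern, $input, $m) === 1
        ? ['amount' => $m[1], 'unit' => $m[2]]
        : null;
}
```

With this pattern, "1.5 ounces chopped" yields 1.5 and ounces, while "3 apples" yields null because apples is not in the list.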

Solution 2

Use this regular expression: \b\d+([\.,]\d+)?
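
A minimal PHP sketch of this pattern in use, assuming preg_match. The helper name firstNumber is hypothetical. The pattern only extracts the number, so the unit would have to be captured separately.

```php
<?php
// Pattern from Solution 2: digits, optionally followed by "." or "," and more digits.
$pattern = '/\b\d+([\.,]\d+)?/';

// Hypothetical helper: returns the first number found, or null if there is none.
function firstNumber(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

foreach (['12 ounces', '1.5 ounces chopped', '1,5 ounces'] as $str) {
    echo firstNumber($pattern, $str), PHP_EOL;
}
```

Because the pattern requires a leading digit, an input such as ".5 ounces" yields 5 rather than .5.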

Author: HWD

Updated on June 23, 2022

Comments

  • HWD
    HWD almost 2 years

    I have a string like:

    $str1 = "12 ounces";
    $str2 = "1.5 ounces chopped";
    

    I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the measurement that immediately follows it (ounces).

    I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.

    Thanks for your help!

  • HWD
    HWD almost 12 years
    Thank you, this is exactly what I was looking for. If you have the chance, a breakdown of what exactly is happening in your regex would be helpful to me. Thanks again!
  • HWD
    HWD almost 12 years
    Great explanation. A useful answer!