How to match a long with Java regex?

Solution 1

The minimum value of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807, so a long has at most 19 digits. \d{1,19} should get you there, perhaps with an optional leading -, and with ^ and $ to anchor the match to both ends of the string.

So roughly:

Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

...or something along those lines, and assuming you don't allow commas (or have already removed them).

As gexicide points out in the comments, the above still accepts a comparatively small range of out-of-range values, such as 9,999,999,999,999,999,999. You can make the regex more complex, or just accept that it weeds out the vast majority of invalid numbers and so reduces the number of parsing exceptions you get.
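A minimal sketch of the "just accept it" option: the coarse pattern rejects most bad input cheaply, and a catch block handles the few 19-digit values that slip through. The class name CoarseLongParser and the method tryParse are hypothetical names chosen for illustration, not part of the JDK.

```java
import java.util.OptionalLong;
import java.util.regex.Pattern;

// Hypothetical helper: coarse regex pre-filter plus a rarely hit catch block.
final class CoarseLongParser {
    // Same coarse pattern as above: optional minus sign, then 1 to 19 digits.
    private static final Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");

    private CoarseLongParser() {}

    // Returns an empty OptionalLong instead of throwing on bad input.
    static OptionalLong tryParse(String s) {
        if (s == null || !LONG_PATTERN.matcher(s).matches()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Only reachable for 19-digit values just outside the long range,
            // such as 9999999999999999999.
            return OptionalLong.empty();
        }
    }
}
```

With this shape, a 39-digit string such as the one in the question is rejected by the pattern alone, so no exception is created for it.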

Solution 2

This regular expression should do what you need:

^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$

Note that this regex does not accept additional symbols such as +, L, or _. If you need to validate every form a long value can take, you will have to extend it.
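As a sketch of how the expression above could be used from Java, the backslashes have to be doubled inside the string literal and the pattern is best compiled once and reused. The class name StrictLongPattern and the method isLong are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the range-exact regex shown above.
final class StrictLongPattern {
    // The same expression, split across lines for readability.
    // Each alternative covers 19-digit numbers that fall below the maximum
    // at one particular digit position.
    private static final Pattern STRICT_LONG = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}"
        + "|9[0-1]\\d{17}|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}"
        + "|92233[0-6]\\d{13}|922337[0-1]\\d{12}|92233720[0-2]\\d{10}"
        + "|922337203[0-5]\\d{9}|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}"
        + "|922337203685[0-3]\\d{6}|9223372036854[0-6]\\d{5}|92233720368547[0-6]\\d{4}"
        + "|922337203685477[0-4]\\d{3}|9223372036854775[0-7]\\d{2}"
        + "|922337203685477580[0-7]))$");

    private StrictLongPattern() {}

    // True only if s is a decimal number inside the long range.
    static boolean isLong(String s) {
        return s != null && STRICT_LONG.matcher(s).matches();
    }
}
```

The pattern is stricter than Long.parseLong in one respect: it rejects leading zeros and "-0", which Long.parseLong would accept.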

Solution 3

Simply catch the NumberFormatException, unless this case happens very often.

Another way would be to use a pattern that only accepts values within the long range. Such a pattern might be quite complex.

A third way would be to parse the number as a BigInteger first. Then you can compare it to Long.MAX_VALUE and Long.MIN_VALUE to check whether it is within the bounds of long. However, this might be costly as well.
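The BigInteger idea might look roughly like the sketch below. It assumes a simple digit check first, because the BigInteger constructor also throws NumberFormatException on non-numeric input. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.OptionalLong;
import java.util.regex.Pattern;

// Hypothetical helper: range check through BigInteger, no exception on bad input.
final class BigIntegerLongParser {
    // Optional minus sign followed by at least one digit, of any length.
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    private BigIntegerLongParser() {}

    static OptionalLong tryParse(String s) {
        if (s == null || !INTEGER.matcher(s).matches()) {
            return OptionalLong.empty();
        }
        // Safe: the pattern guarantees a well-formed decimal integer.
        BigInteger value = new BigInteger(s);
        if (value.compareTo(MIN) < 0 || value.compareTo(MAX) > 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(value.longValue());
    }
}
```

This never throws for out-of-range input, but it allocates a BigInteger for every candidate, which is the cost mentioned above.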

Also note: parsing the long is quite fast; it is a very optimized method (that, for example, tries to parse two digits in one step). Applying pattern matching might be even more costly than performing the parsing. The only slow part of the parsing is throwing the NumberFormatException. Thus, simply catching the exception is the best way to go if the exceptional case does not happen too often.
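For completeness, a minimal sketch of the plain catch approach, wrapped so that callers get an empty result instead of an exception. CatchingLongParser and tryParse are hypothetical names.

```java
import java.util.OptionalLong;

// Hypothetical helper: rely on Long.parseLong and translate failure into "empty".
final class CatchingLongParser {
    private CatchingLongParser() {}

    static OptionalLong tryParse(String s) {
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Thrown for null, empty, non-numeric, and out-of-range input.
            return OptionalLong.empty();
        }
    }
}
```

The exception is still created and thrown internally, so this only pays off when bad input is rare.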

Author: Sebastien Lorber

Updated on June 15, 2022

Comments

  • Sebastien Lorber
    Sebastien Lorber almost 2 years

    I know I can match numbers with Pattern.compile("\\d*");

    But it doesn't handle the long min/max values.

    For performance reasons related to exceptions, I do not want to try to parse the long unless it really is a long.

    if ( LONG_PATTERN.matcher(timestampStr).matches() ) {
        long timeStamp = Long.parseLong(timestampStr);
        return new Date(timeStamp);
    } else {
        LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp! -> ");
        return null;
    }
    

    I mean I do not want any try/catch block, and I do not want exceptions raised for a value like "564654954654464654654567879865132154778", which is outside the range of a regular Java long.

    Does anyone have a pattern to handle this kind of need for the primitive Java types? Does the JDK provide something to handle it automatically? Is there a fail-safe parsing mechanism in Java?

    Thanks


    Edit: Please assume that the "bad long string" is not an exceptional case. I'm not asking for a benchmark; I'm here for a regex representing a long and nothing more. I'm aware of the additional time required by the regex check, but at least my long parsing cost will stay constant and will not depend on the percentage of "bad long strings".

    I can't find the link again, but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the same compiled regex is really fast, a LOT faster than throwing an exception, so even a small proportion of exceptions would make the system slower than it would be with the additional regex check.