How to parse a string to an int in C++?

Solution 1

In the new C++11 there are functions for that: stoi, stol, stoll, stoul and so on.

int myNr = std::stoi(myString);

It throws std::invalid_argument if no conversion can be performed, and std::out_of_range if the converted value does not fit in the result type.

Even these new functions still have the same issue as noted by Dan: they will happily convert the string "11x" to integer "11".
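
The "11x" problem can be caught with the optional second argument of std::stoi, which receives the index of the first unconverted character. The following is a minimal sketch, not a definitive implementation; parse_int_strict is a hypothetical helper name, not a standard function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library): succeeds only if
// std::stoi consumed the entire string and the value fits in an int.
bool parse_int_strict(const std::string& text, int& out)
{
    try {
        std::size_t pos = 0;                      // index of first unconverted char
        const int value = std::stoi(text, &pos);  // base 10 by default
        if (pos != text.size()) {
            return false;                         // trailing characters, e.g. "11x"
        }
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;                             // nothing to convert, e.g. "abc" or ""
    } catch (const std::out_of_range&) {
        return false;                             // value does not fit in int
    }
}
```

Note that std::stoi skips leading whitespace, so " 11" is still accepted by this sketch; callers that want to reject it need an extra check.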

See more: http://en.cppreference.com/w/cpp/string/basic_string/stol

Solution 2

What not to do

Here is my first piece of advice: do not use stringstream for this. While at first it may seem simple to use, you'll find that you have to do a lot of extra work if you want robustness and good error handling.

Here is an approach that intuitively seems like it should work:

bool str2int (int &i, char const *s)
{
    std::stringstream ss(s);
    ss >> i;
    if (ss.fail()) {
        // not an integer
        return false;
    }
    return true;
}

This has a major problem: str2int(i, "1337h4x0r") will happily return true and i will get the value 1337. We can work around this problem by ensuring there are no more characters in the stringstream after the conversion:

bool str2int (int &i, char const *s)
{
    char              c;
    std::stringstream ss(s);
    ss >> i;
    if (ss.fail() || ss.get(c)) {
        // not an integer
        return false;
    }
    return true;
}

We fixed one problem, but there are still a couple of other problems.

What if the number in the string is not base 10? We can try to accommodate other bases by setting the stream to the correct mode (e.g. ss << std::hex) before trying the conversion. But this means the caller must know a priori what base the number is -- and how can the caller possibly know that? The caller doesn't know what the number is yet. They don't even know that it is a number! How can they be expected to know what base it is? We could just mandate that all numbers input to our programs must be base 10 and reject hexadecimal or octal input as invalid. But that is not very flexible or robust. There is no simple solution to this problem. You can't simply try the conversion once for each base, because the decimal conversion will always succeed for octal numbers (with a leading zero) and the octal conversion may succeed for some decimal numbers. So now you have to check for a leading zero. But wait! Hexadecimal numbers can start with a leading zero too (0x...). Sigh.
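
To make the ambiguity concrete, here is a small sketch that applies a base manipulator before extracting. extract_with_base is a hypothetical helper introduced only for illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extract an int from `text` after applying a base
// manipulator such as std::dec, std::oct, or std::hex.
// Returns false if the extraction itself fails.
bool extract_with_base(const std::string& text,
                       std::ios_base& (*base)(std::ios_base&),
                       int& out)
{
    std::istringstream in(text);
    in >> base >> out;   // set the base flag, then read the number
    return !in.fail();
}
```

With this helper, "010" succeeds as decimal (giving 10) and as octal (giving 8), and "17" succeeds as octal (giving 15), so trying each base in turn cannot tell the caller which one was intended.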

Even if you succeed in dealing with the above problems, there is still another bigger problem: what if the caller needs to distinguish between bad input (e.g. "123foo") and a number that is out of the range of int (e.g. "4000000000" for 32-bit int)? With stringstream, there is no way to make this distinction. We only know whether the conversion succeeded or failed. If it fails, we have no way of knowing why it failed. As you can see, stringstream leaves much to be desired if you want robustness and clear error handling.

This leads me to my second piece of advice: do not use Boost's lexical_cast for this. Consider what the lexical_cast documentation has to say:

Where a higher degree of control is required over conversions, std::stringstream and std::wstringstream offer a more appropriate path. Where non-stream-based conversions are required, lexical_cast is the wrong tool for the job and is not special-cased for such scenarios.

What?? We've already seen that stringstream has a poor level of control, and yet it says stringstream should be used instead of lexical_cast if you need "a higher degree of control". Also, because lexical_cast is just a wrapper around stringstream, it suffers from the same problems that stringstream does: poor support for multiple number bases and poor error handling.

The best solution

Fortunately, somebody has already solved all of the above problems. The C standard library contains strtol and family which have none of these problems.

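#include <cerrno>   // errno, ERANGE
#include <climits>  // INT_MAX, INT_MIN, LONG_MAX, LONG_MIN
#include <cstdlib>  // strtol

// Note: some platforms define OVERFLOW and UNDERFLOW as macros (see the
// comments below); rename the enumerators if they collide.
enum STR2INT_ERROR { SUCCESS, OVERFLOW, UNDERFLOW, INCONVERTIBLE };

STR2INT_ERROR str2int (int &i, char const *s, int base = 0)
{
    char *end;
    long  l;
    errno = 0;  // strtol never resets errno, so clear it before the call
    l = strtol(s, &end, base);
    if ((errno == ERANGE && l == LONG_MAX) || l > INT_MAX) {
        return OVERFLOW;
    }
    if ((errno == ERANGE && l == LONG_MIN) || l < INT_MIN) {
        return UNDERFLOW;
    }
    // Reject empty input and anything left over after the number.
    if (*s == '\0' || *end != '\0') {
        return INCONVERTIBLE;
    }
    i = static_cast<int>(l);  // safe: the range was checked above
    return SUCCESS;
}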

Pretty simple for something that handles all the error cases and also supports any number base from 2 to 36. If base is zero (the default), strtol infers the base from the prefix: a leading 0x or 0X means hexadecimal, a leading 0 means octal, and anything else is decimal. Or the caller can supply the third argument and specify that the conversion should only be attempted for a particular base. It is robust and handles all errors with a minimal amount of effort.
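
As a quick illustration of the base handling described above, the sketch below calls strtol with base 0. parse_auto_base is a hypothetical name, and error checking is deliberately left out to keep the focus on base detection.

```cpp
#include <cstdlib>

// Hypothetical helper: with base 0, strtol infers the base from the prefix
// ("0x"/"0X" means hexadecimal, a leading "0" means octal, otherwise decimal).
// Error checking is omitted here; a real caller should check it.
long parse_auto_base(const char* text)
{
    return std::strtol(text, nullptr, 0);
}
```

Under these rules "42", "052", and "0x2A" all yield 42. Passing an explicit base instead, as in std::strtol("101010", nullptr, 2), restricts the conversion to that base.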

Other reasons to prefer strtol (and family):

  • It exhibits much better runtime performance
  • It introduces less compile-time overhead (the others pull in nearly 20 times more SLOC from headers)
  • It results in the smallest code size

There is absolutely no good reason to use any other method.

Solution 3

This is a safer C way than atoi()

const char* str = "123";
int i;
if (sscanf(str, "%d", &i) != 1)
{
   /* error: no integer was converted */
}

C++ with standard library stringstream (thanks CMS):

#include <sstream>
#include <string>

int str2int (const std::string &str) {
  std::stringstream ss(str);
  int num;
  if ((ss >> num).fail())
  {
      // ERROR
  }
  return num;
}

With boost library: (thanks jk)

#include <boost/lexical_cast.hpp>
#include <string>
try
{
    std::string str = "123";
    int number = boost::lexical_cast< int >( str );
}
catch( const boost::bad_lexical_cast & )
{
    // Error
}

Edit: Fixed the stringstream version so that it handles errors. (thanks to CMS's and jk's comment on original post)

Solution 4

The good old C way still works. I recommend strtol or strtoul. Between the return status and the 'endPtr', you can give good diagnostic output. It also handles multiple bases nicely.
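
A rough sketch of the kind of diagnostic output the end pointer makes possible is shown here. describe_parse and its message strings are hypothetical, chosen only for illustration.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: describe the outcome of parsing `text` with strtol.
std::string describe_parse(const char* text, int base = 10)
{
    char* end = nullptr;
    errno = 0;                                   // clear stale error state
    const long value = std::strtol(text, &end, base);
    if (end == text) {
        return "no digits found";                // nothing was converted
    }
    if (errno == ERANGE) {
        return "out of range for long";          // value was clamped
    }
    if (*end != '\0') {
        // end points at the first character strtol could not consume
        return "unexpected character at offset " + std::to_string(end - text);
    }
    return "ok: " + std::to_string(value);
}
```

Because the end pointer marks exactly where parsing stopped, the caller can report the offending position rather than a bare failure.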

Solution 5

You can use Boost's lexical_cast, which wraps this in a more generic interface. lexical_cast<Target>(Source) throws bad_lexical_cast on failure.


Eugene Yokota
Author by

Eugene Yokota

Hi, I'm Eugene (eed3si9n). I am a software engineer and an open source contributor mostly around Scala. As the core maintainer of sbt, a build tool used in Scala community, I like helping debug and explain sbt. Other projects I contribute to: scalaxb, an XML databinding tool for Scala (author) treehugger.scala (author) scopt/scopt (maintainer) Twitter: @eed3si9n Github: @eed3si9n

Updated on March 17, 2022

Comments

  • Eugene Yokota
    Eugene Yokota about 1 year

    What's the C++ way of parsing a string (given as char *) into an int? Robust and clear error handling is a plus (instead of returning zero).

  • jk. over 14 years
    But this does not handle any errors. You have to check the stream for failures.
  • Christian C. Salvadó
    Christian C. Salvadó over 14 years
    Right you have to check the stream if((ss >> num).fail()){ //ERROR }
  • jk. over 14 years
    Oh please don't use this old C stuff when programming C++. There are better/easier/cleaner/more modern/safer ways to do this in C++!
  • jk. over 14 years
    please update your stringstream version to include a check for stringstream::fail() (as requested by the questioner "Robust and clear error handling")
  • fostiguy over 14 years
    It's funny when people are concerned about "more modern" ways to solve a problem.
  • Eugene Yokota
    Eugene Yokota over 14 years
    @Jason, IMO stronger type safety and error handling is more modern idea compared to that of C.
  • Efren Narvaez over 14 years
    I've looked at the other answers, and so far nothing is obviously better/easier/cleaner or safer. The poster said he had a char *. That limits the amount of safety you are going to get :)
  • Johannes Schaub - litb
    Johannes Schaub - litb over 14 years
    Your stringstream version will accept stuff like "10haha" without complaining
  • Johannes Schaub - litb
    Johannes Schaub - litb over 14 years
    change it to (!(ss >> num).fail() && (ss >> ws).eof()) from ((ss >> num).fail()) if you want the same handling like lexical_cast
  • captonssj over 13 years
    The C++ with standard library stringstream method doesn't work for strings such as "12-SomeString" even with the .fail() check.
  • captonssj over 13 years
    The C++ stringstream method doesn't work for strings such as "12-SomeString" even with the 'stream state' check.
  • Admin
    Admin over 12 years
    Boost lexical_cast is extremely slow and painfully inefficient.
  • James Dunne
    James Dunne about 11 years
    strtol is not thread-safe because of the use of global variables for error state.
  • Dan Moulding
    Dan Moulding about 11 years
    @JamesDunne: POSIX requires strtol to be thread-safe. POSIX also requires errno to use thread-local storage. Even on non-POSIX systems, nearly all implementations of errno on multithreaded systems use thread-local storage. The latest C++ standard requires errno to be POSIX compliant. The latest C standard also requires errno to have thread-local storage. Even on Windows, which is definitely not POSIX compliant, errno is thread-safe and, by extension, so is strtol.
  • fhd about 11 years
    I can't really follow your reasoning against using boost::lexical_cast. As they say, std::stringstream does indeed offer a lot of control - you do everything from error checking to determining base yourself. The current documentation puts it like this: "For more involved conversions, such as where precision or formatting need tighter control than is offered by the default behavior of lexical_cast, the conventional std::stringstream approach is recommended."
  • flies
    flies about 11 years
    @Matthieu Updates to Boost have improved performance quite a bit: boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/… (see also stackoverflow.com/questions/1250795/… )
  • Grault
    Grault over 10 years
    Note, the identifiers OVERFLOW and UNDERFLOW are used for macros by gcc (and therefore g++) for compatibility with System V. To disable extensions to the standard such as this one, pass -ansi to g++ at the command-line or makefile. sourceware.org/bugzilla/show_bug.cgi?id=5407 gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/C-Dialect-Options.html
  • Grault
    Grault over 10 years
    Follow-up: The bug report at the first link indicates that -D_ISOC99_SOURCE will disable the extension, but the second link says -ansi will disable all. I posted too soon and have not yet gotten either to work.
  • Grault
    Grault over 10 years
    I'm running Mac OS X 10.6 (llvm-gcc 4.2) and had to define _POSIX_C_SOURCE to disable the extension. This is likely not necessary on other systems.
  • Zharf over 10 years
    But they accept more arguments than that, one of them being a pointer to size_t that, if not null, is set to the index of the first unconverted character
  • CC. over 10 years
    Yes, using the second parameter of std::stoi you can detect invalid input. You still have to roll your own conversion function though...
  • Zharf over 10 years
    Just like the accepted answer did, but with these standard functions that would be much cleaner, imo
  • Vinnie Falco
    Vinnie Falco almost 10 years
    WARNING This implementation looks nice, but doesn't handle overflows as far as I can tell.
  • fuzzyTew
    fuzzyTew almost 10 years
    This is inappropriate C coding within C++. The standard library contains std::stol for this, which will appropriately throw exceptions rather than returning constants.
  • fuzzyTew
    fuzzyTew almost 10 years
    C++11 includes standard fast functions for this now
  • Eugene Yokota
    Eugene Yokota almost 10 years
    As I linked to atoi in the question, I'm aware of it. The question is clearly not about C, but about C++. -1
  • Dan Moulding
    Dan Moulding over 9 years
    @fuzzyTew I wrote this answer before std::stol was even added to the C++ language. That said, I don't think it's fair to say that this is "C coding within C++". It's silly to say that std::strtol is C coding when it is explicitly part of the C++ language. My answer applied perfectly to C++ when it was written and it still does apply even with the new std::stol. Calling functions that may throw exceptions isn't always the best for every programming situation.
  • fuzzyTew
    fuzzyTew over 9 years
    @DanMoulding , true, I didn't realize when writing that stol was c++11 only. However, exceptions are the standard way of handling exceptional conditions in C++. Returning error constants is a C paradigm that is generally frowned upon and unexpected in C++, adding barriers to debugging and sharing of code, and spreading visually bloating conditional checks. It's needed in C because C has no exceptions. Return values do often compile to faster code, so may be the best if the code is profiled to have a bottleneck around many exceptional conditions.
  • Eugene Yokota
    Eugene Yokota over 9 years
    Not sure how prevalent C++11 is, but as the idiomatic C++ way of parsing int, I'm going to switch my accepted answer to std::stol.
  • Eugene Yokota
    Eugene Yokota over 8 years
    idk. Writing a define macro around atoi doesn't seem like "the C++ way," in light of other answers like the accepted std::stoi().
  • Boris over 8 years
    I find it more fun using pre-defined methods :P
  • Ben Voigt
    Ben Voigt over 8 years
    @fuzzyTew: Running out of disk space is an exceptional condition. Malformatted data files which are computer produced are an exceptional condition. But typos in user input are not exceptional. It's good to have a parsing approach which is able to handle normal, non-exceptional parsing failures.
  • cp.engr
    cp.engr about 7 years
    Your strtol() wrapper is very similar to BSD's strtonum().
  • rtmh
    rtmh almost 7 years
    @Grault and @DanMoulding, is there a reason not to place a qualifying 'tag' before each of the STR2INT_ERROR enumerators (i.e. enum STR2INT_ERROR { S2I_SUCCESS, S2I_OVERFLOW, S2I_UNDERFLOW, S2I_INCONVERTIBLE };? Would this not be a simpler solution to the conflict?
  • Grault
    Grault almost 7 years
    @rtmh Yes. If I recall my state of mind correctly, I was upset at Apple for polluting the macro space by not doing something similar in their extension. Namespacing the macros without fuss, as you suggested, would probably be most practical.
  • rtmh
    rtmh almost 7 years
    @Grault It's funny you mention apple because I ran into it with MSVS as well. Irksome indeed, haha.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 6 years
    Should calling code use errno for error discrimination, changing code to if ((errno == ERANGE && l == LONG_MAX) || l > INT_MAX) { errno = ERANGE; return OVERFLOW; } would be a small amount of extra code to accommodate that.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 6 years
    Code does not handle overflow. v = (10 * v) + digit; overflows needlessly with string input with the text value of INT_MIN. Table is of questionable value vs simply digit >= '0' && digit <= '9'
  • chux - Reinstate Monica
    chux - Reinstate Monica over 6 years
    "use largest type"--> why long long instead of intmax_t?
  • chux - Reinstate Monica
    chux - Reinstate Monica over 6 years
    Confident you want if (ePtr != str). Further, use isspace((unsigned char) *ePtr) to properly handle negative values of *ePtr.
  • pellucide
    pellucide over 6 years
    @chux added code to take care of the concerns you mentioned.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 6 years
    1) Still fails to detect error with input like " ". strtol() is not specified to set errno when no conversion occurs. Better to use if (s == end) return INCONVERTIBLE; to detect no conversion. And then if (*s == '\0' || *end != '\0') can simplify to if (*end) 2) || l > LONG_MAX and || l < LONG_MIN serve no purpose - they are never true.
  • pellucide
    pellucide over 6 years
    @chux On a mac, the errno is set for parsing errors, but on linux the errno is not set. Changed code to depend on the "end" pointer to detect that.
  • Yankee almost 3 years
    You assign 0 to errno, but I don't see any other place where it gets an assignment.
  • Zoe stands with Ukraine
    Zoe stands with Ukraine almost 3 years
    Keep in mind that the second argument in these functions can be used to tell whether the entire string was converted or not. If the resulting size_t isn't equal to the length of the string, then it stopped early. It'll still return 11 in that case, but pos will be 2 instead of the string length 3. coliru.stacked-crooked.com/a/cabe25d64d2ffa29
  • Dan Moulding
    Dan Moulding almost 3 years
    @Yankee Most C library functions return a special value, such as -1, to indicate that an error occurred. Once you know an error has occurred, all you need to do is check errno to find out what the error code is. But no C library function may set errno to 0. And there are no return values from strtol which, by themselves, indicate that strtol returned an error. So errno alone must be used to determine if an error has occurred. Since strtol cannot set errno to 0 on success, errno must be explicitly set to 0 before calling strtol, then checked to see if it is still 0 afterward, to detect errors.
  • Steven Lu
    Steven Lu about 2 years
    It sounds like this answer is recommending the use of strtol. I'm just confused by the str2int code sample given. Is this supposed to show the implementation of strtol? If so, then why is it named str2int?
  • mtraceur
    mtraceur over 1 year
    Careful! Missed a case. strtol will allow arbitrary amounts of leading whitespace, but no trailing whitespace. I think we should probably disallow leading whitespace (add isspace(*s) to the inconvertible case) unless we consciously see a good reason for that not being handled at a higher level, and I think if we allow leading whitespace we should also allow trailing whitespace (do for(; isspace(*end); end += 1); after the call to strtol) unless we see a good reason for that inconsistency.
  • mtraceur
    mtraceur over 1 year
    One other thing I dislike about strtol is that it treats leading zero as signalling octal when base is zero. By now humanity has enough experience with it to know that zero implicitly switching interpretation from decimal to octal causes far more bad than good. Some things are naturally octal (umask for example) but notably those things you never want to interpret as decimal. So I'd also do if(base == 0) for(; s[0] == '0' && s[1] != '\0'; s += 1); before the call to strtol.
  • mtraceur
    mtraceur over 1 year
    One other final polishing touch I would personally probably do is int saved_errno = errno; before calling strtol and errno = saved_errno; in the success return path. This would make the behavior more consistent with the standard library functions, which might clobber a previous errno but never clear it back to zero.
  • mtraceur
    mtraceur over 1 year
    Anyway, this answer is extremely good and should still be the accepted answer to this day.
  • Vladimir Gamalyan
    Vladimir Gamalyan over 1 year
    What this (good one) solution lacks is the ability to work with non-zero-terminated strings (such as string_view).
  • Pharap
    Pharap over 1 year
    @JMiller In fairness, in the case of C and C++ that's because the old fashioned ways usually have distinct flaws, ranging from the awkward, to the inefficient, to the downright unsafe. In std::strtol's case, you have no way of knowing if you've successfully parsed a 0 or if the function failed unless you manually check if the string resolves to 0, and by the time you've done that you're unnecessarily repeating work. The more modern approach (std::from_chars) not only tells you when the function fails, but why it failed as well, which helps provide feedback to the end user.
  • Efren Narvaez over 1 year
    13 years later I am looking up this question, just to find my own answer. And I'll agree with @Pharap, std::from_chars, which didn't exist in 2008, is probably the better answer in C++ now.
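
The last comments above point to std::from_chars, available since C++17 in <charconv>. The following is a minimal sketch of how it might be wrapped, under the assumption that the whole input must be a number; parse_int and ParseStatus are hypothetical names. It works on std::string_view, so the input does not need to be zero-terminated.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical status type for this sketch.
enum class ParseStatus { Ok, OutOfRange, Invalid };

// Hypothetical wrapper around std::from_chars that requires the entire
// input to be consumed. `out` is written only on success.
ParseStatus parse_int(std::string_view text, int& out, int base = 10)
{
    const char* first = text.data();
    const char* last = first + text.size();
    int value = 0;
    const auto result = std::from_chars(first, last, value, base);
    if (result.ec == std::errc::result_out_of_range) {
        return ParseStatus::OutOfRange;   // digits were valid but too large
    }
    if (result.ec != std::errc() || result.ptr != last) {
        return ParseStatus::Invalid;      // no digits, or trailing characters
    }
    out = value;
    return ParseStatus::Ok;
}
```

Unlike strtol, std::from_chars does not skip leading whitespace, does not accept a leading '+', does not touch errno, and does not recognize a "0x" prefix, so this sketch rejects inputs such as " 42" and "+42".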