Convert Pandas column containing NaNs to dtype `int`
Solution 1
The lack of NaN rep in integer columns is a pandas "gotcha".
The usual workaround is to simply use floats.
Solution 2
In version 0.24.+ pandas has gained the ability to hold integer dtypes with missing values.
Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension types implemented within pandas. It is not the default dtype for integers, and will not be inferred; you must explicitly pass the dtype into array() or Series:
arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)
0 1
1 2
2 NaN
dtype: Int64
For convert column to nullable integers use:
df['myCol'] = df['myCol'].astype('Int64')
Solution 3
My use case is munging data prior to loading into a DB table:
df[col] = df[col].fillna(-1)
df[col] = df[col].astype(int)
df[col] = df[col].astype(str)
df[col] = df[col].replace('-1', np.nan)
Remove NaNs, convert to int, convert to str and then reinsert NANs.
It's not pretty but it gets the job done!
Solution 4
It is now possible to create a pandas column containing NaNs as dtype int, since it is now officially added on pandas 0.24.0
pandas 0.24.x release notes Quote: "Pandas has gained the ability to hold integer dtypes with missing values
Solution 5
Whether your pandas series is object datatype or simply float datatype the below method will work
df = pd.read_csv("data.csv")
df['id'] = df['id'].astype(float).astype('Int64')
Related videos on Youtube
Zhubarb
When you stare long into the thesys, the thesys stares back into you.
Updated on July 08, 2022Comments
-
Zhubarb 12 monthsI read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely
id, I want to specify the column type asint. The problem is theidseries has missing/empty values.When I try to cast the
idcolumn to integer while reading the .csv, I get:df= pd.read_csv("data.csv", dtype={'id': int}) error: Integer column has NA valuesAlternatively, I tried to convert the column type after reading as below, but this time I get:
df= pd.read_csv("data.csv") df[['id']] = df[['id']].astype(int) error: Cannot convert NA to integerHow can I tackle this?
-
Alvaro Fuentes over 9 yearsCould you post the content of your file? -
Zhubarb over 9 years@xndrme, the file itself is too large. I will see if I can create a small test case. But essentially the situation is that theidcolumn has many integer values and some empty/missing cells. -
EdChum over 9 yearsI think that integer values cannot be converted or stored in a series/dataframe if there are missing/NaN values. This I think is to do with numpy compatibility (I'm guessing here), if you want missing value compatibility then I would store the values as floats -
Jeff over 9 yearssee here: pandas.pydata.org/pandas-docs/dev/…; you must have a float dtype when u have missing values (or technically object dtype but that is inefficient); what is your goal of using int type? -
Jeff over 9 yearsFYI, if you don't specify a dtype, then pandas will infer float for the column, no conversion needed. -
ely over 9 yearsI believe this is a NumPy issue, not specific to Pandas. It's a shame since there are so many cases when having an int type that allows for the possibility of null values is much more efficient than a large column of floats. -
dermen almost 8 yearsI have a problem with this too. I have multiple dataframes which I want to merge based on a string representation of several "integer" columns. However, when one of those integer columns has a np.nan, the string casting produces a ".0", which throws off the merge. Just makes things slightly more complicated, would be nice if there was simple work-around. -
mork over 4 years@Rhubarb, Optional Nullable Integer Support is now officially added on pandas 0.24.0 - finally :) - please find an updated answer bellow. pandas 0.24.x release notes
-
-
NumenorForLife about 8 yearsAre there any other workarounds besides treating them like floats? -
Andy Hayden about 8 years@jsc123 you can use the object dtype. This comes with a small health warning but for the most part works well. -
MikeyE almost 7 yearsCan you provide an example of how to use object dtype? I've been looking through the pandas docs and googling, and I've read it's the recommended method. But, I haven't found an example of how to use the object dtype. -
Chris Decker over 4 yearsI have been pulling my hair out trying to load serial numbers where some are null and the rest are floats, this saved me. -
Rishab Gupta over 4 yearsThe OP wants a column of integers. Converting it to string does not meet the condition. -
cs95 about 4 yearsIn v0.24, you can now dodf = df.astype(pd.Int32Dtype())(to convert the entire dataFrame, or)df['col'] = df['col'].astype(pd.Int32Dtype()). Other accepted nullable integer types arepd.Int16Dtypeandpd.Int64Dtype. Pick your poison. -
Winston almost 4 yearsIt is NaN value but isnan checking doesn't work at all :( -
Viacheslav Zhukov over 3 yearsNote that dtype must be"Int64"and not"int64"(first 'i' must be capitalized) -
Sharvari Gc over 3 yearsWorks only if col doesn't already have -1. Otherwise, it will mess with the data -
LoMaPh over 3 yearsdf.myCol = df.myCol.astype('Int64')ordf['myCol'] = df['myCol'].astype('Int64') -
abdoulsn over 3 yearsthen how to get back to int..?? -
Jeremy Caney about 3 yearsIs there a reason you prefer this formulation over that proposed in the accepted answer? If so, it'd be useful to edit your answer to provide that explanation—and especially since there are ten additional answers that are competing for attention. -
SherylHohman about 3 yearsWhile this code may resolve the OP's issue, it is best to include an explanation as to how/why your code addresses it. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform differentiates it from forums. You caneditto add additional info &/or to supplement your explanations with source documentation. -
Bradon almost 3 yearsIf there are NaN values in the column, pd.to_numeric will convert the dtype to float not int because NaN is considered a float. -
wfgeo almost 3 yearsIt may be obvious to some but it I think it is still worth noting that you can use any Int (e.g.Int16,Int32) and indeed probably should if the dataframe is very large to save memory. -
Newbielp over 2 years@jezrael, in which cases this does not work...? It does not work for me and I have failed to find a generic solution. -
jezrael over 2 years@Newbielp - First idea is test ifnanare missing values or strings'nan' -
Newbielp over 2 years@jezrael, it gives me type float for specific element. Whole column is type object. -
zelusp over 2 yearsthis approach can add a lot of memory overhead, especially on larger dataframes -
BERA almost 2 yearsI'm gettingTypeError: cannot safely cast non-equivalent float64 to int64 -
PatrickT over 1 year -
PatrickT over 1 yearThis produces a column of strings!! For a solution with current versions ofpandas, see stackoverflow.com/questions/58029359/… -
Henrique Brisola over 1 year@cs95 I am getting erroobject cannot be converted to an IntegerDtype -
robertspierre over 1 year@Zhubarb please consider changing your accepted answer in light of this development -
buhtz over 1 yearPlease explain your solution. -
Jane Kathambi over 1 yearThank you @Abhishek Bhatia this worked for me. -
Jane Kathambi over 1 yearWorks but I think replacing NaN with 0 changes the meaning of the data. -
bsauce over 1 yearcaution with this approach... if any of your data really is -1, it will be overwritten.