Leave dates as strings using read_excel function from pandas in python
Solution 1
I ran into an identical problem, except pandas was oddly converting only some cells into datetimes. I ended up manually converting each cell into a string like so:
def undate(x):
if pd.isnull(x):
return x
try:
return x.strftime('%d/%m/%Y')
except AttributeError:
return x
except Exception:
raise
for i in list_of_possible_date_columns:
df[i] = df[i].apply(undate)
Solution 2
- Using converters{'Date': str} option inside the pandas.read_excel which helps.
pandas.read_excel(xlsx, sheet, converters={'Date': str})
- you can try convert your timestamp back to the original format
df['Date'][0].strftime('%Y/%m/%d')
Solution 3
I tried saving the file in a CSV UTF-8 format
(manually) and used pd.read_csv()
and worked fine.
I tried a bunch of things to figure the same thing with read_excel
. Did not work anything for me. So, I am guessing read_excel
is probably updating your string in a datetime object which you can not control.
Related videos on Youtube
Comments
-
MattB over 1 year
Python 2.7.10
Tried pandas 0.17.1 -- function read_excel
Tried pyexcel 0.1.7 + pyexcel-xlsx 0.0.7 -- function get_records()When using pandas in Python is it possible to read excel files (formats: xls, xlsx) and leave columns containing date or date + time values as strings rather than auto-converting to
datetime.datetime
ortimestamp
types?If this is not possible using pandas can someone suggest an alternate method/library to read xls, xlsx files and leave date column values as strings?
For the pandas solution attempts the
df.info()
and resulting date column types are shown below:>>> df.info() <class 'pandas.core.frame.DataFrame'> Int64Index: 117 entries, 0 to 116 Columns: 176 entries, Mine to Index dtypes: datetime64[ns](2), float64(145), int64(26), object(3) memory usage: 161.8+ KB >>> type(df['Start Date'][0]) Out[6]: pandas.tslib.Timestamp >>> type(df['End Date'][0]) Out[7]: pandas.tslib.Timestamp
Attempt/Approach 1:
def read_as_dataframe(filename, ext): import pandas as pd if ext in ('xls', 'xlsx'): # problem: date columns auto converted to datetime.datetime or timestamp! df = pd.read_excel(filename) # unwanted - date columns converted! return df, name, ext
Attempt/Approach 2:
import pandas as pd # import datetime as datetime # parse_date = lambda x: datetime.strptime(x, '%Y%m%d %H') parse_date = lambda x: x elif ext in ('xls', 'xlsx', ): df = pd.read_excel(filename, parse_dates=False) date_cols = [df.columns.get_loc(c) for c in df.columns if c in ('Start Date', 'End Date')] # problem: date columns auto converted to datetime.datetime or timestamp! df = pd.read_excel(filename, parse_dates=date_cols, date_parser=parse_date)
And have also tried pyexcel library but it does the same auto-magic convert behavior:
Attempt/Approach 3:
import pyexcel as pe import pyexcel.ext.xls import pyexcel.ext.xlsx t0 = time.time() if ext == 'xlsx': records = pe.get_records(file_name=filename) for record in records: print("start date = %s (type=%s), end date = %s (type=%s)" % (record['Start Date'], str(type(record['Start Date'])), record['End Date'], str(type(record['End Date']))) )
-
MattB over 8 yearsAs you can see from my code snippets I am not passing date_cols or date_parser for first style approach above and pandas is auto-magically (and silently) converting the columns to datetime values. I need to find a way to stop that and leave any column containing a date value as a string.
-
MattB over 8 yearsThanks @EdChum will explore csv approach further. The spreadsheets themselves are out of my control (format + content) so I need to work with formats provided to me and I am trying to develop a robust custom method for trying different date formats (time zones) and presenting user with best fit.
-
joris over 8 yearsI think @EdChum is correct. The type you get depends on how the data is typed in the excel file itself. If the column is formatted as datetime data there, you will get datetime data from
read_excel
.
-
-
Amon about 4 yearsI get why he didn't accept this as the answer but this works for me and is really easy. I don't think there's another method that pandas allows which is weird to me
-
FObersteiner about 2 yearsto make this a useful answer, could you please add some more details? for which specific input (excel file) does this give what desired output?