Leave dates as strings using read_excel function from pandas in python

python excel datetime pandas

16,464

Solution 1

I ran into an identical problem, except pandas was oddly converting only some cells into datetimes. I ended up manually converting each cell into a string like so:

def undate(x):
    if pd.isnull(x):
        return x
    try:
        return x.strftime('%d/%m/%Y')
    except AttributeError:
        return x
    except Exception:
        raise

for i in list_of_possible_date_columns:
    df[i] = df[i].apply(undate)

Solution 2

Using converters{'Date': str} option inside the pandas.read_excel which helps. pandas.read_excel(xlsx, sheet, converters={'Date': str})
you can try convert your timestamp back to the original format
df['Date'][0].strftime('%Y/%m/%d')

Solution 3

I tried saving the file in a CSV UTF-8 format (manually) and used pd.read_csv() and worked fine.

I tried a bunch of things to figure the same thing with read_excel. Did not work anything for me. So, I am guessing read_excel is probably updating your string in a datetime object which you can not control.

16,464

MattB

Web Analytics Java / Javascript coder.

Updated on September 09, 2022

Comments

MattB over 1 year
Python 2.7.10
Tried pandas 0.17.1 -- function read_excel
Tried pyexcel 0.1.7 + pyexcel-xlsx 0.0.7 -- function get_records()

When using pandas in Python is it possible to read excel files (formats: xls, xlsx) and leave columns containing date or date + time values as strings rather than auto-converting to datetime.datetime or timestamp types?

If this is not possible using pandas can someone suggest an alternate method/library to read xls, xlsx files and leave date column values as strings?

For the pandas solution attempts the df.info() and resulting date column types are shown below:
```
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 117 entries, 0 to 116
Columns: 176 entries, Mine to Index
dtypes: datetime64[ns](2), float64(145), int64(26), object(3)
memory usage: 161.8+ KB
>>> type(df['Start Date'][0])
Out[6]: pandas.tslib.Timestamp
>>> type(df['End Date'][0])
Out[7]: pandas.tslib.Timestamp
```
Attempt/Approach 1:
```
def read_as_dataframe(filename, ext):
   import pandas as pd
   if ext in ('xls', 'xlsx'):
      # problem: date columns auto converted to datetime.datetime or timestamp!
      df = pd.read_excel(filename) # unwanted - date columns converted!

   return df, name, ext
```
Attempt/Approach 2:
```
import pandas as pd
# import datetime as datetime
# parse_date = lambda x: datetime.strptime(x, '%Y%m%d %H')
parse_date = lambda x: x
elif ext in ('xls', 'xlsx', ):
    df = pd.read_excel(filename, parse_dates=False)
    date_cols = [df.columns.get_loc(c) for c in df.columns if c in ('Start Date', 'End Date')]
    # problem: date columns auto converted to datetime.datetime or timestamp!
    df = pd.read_excel(filename, parse_dates=date_cols, date_parser=parse_date)
```
And have also tried pyexcel library but it does the same auto-magic convert behavior:

Attempt/Approach 3:
```
import pyexcel as pe
import pyexcel.ext.xls
import pyexcel.ext.xlsx

t0 = time.time()
if ext == 'xlsx':
    records = pe.get_records(file_name=filename)
    for record in records:
        print("start date = %s (type=%s), end date = %s (type=%s)" %
              (record['Start Date'],
               str(type(record['Start Date'])),
               record['End Date'],
               str(type(record['End Date'])))
              )
```
- MattB over 8 years
  
  As you can see from my code snippets I am not passing date_cols or date_parser for first style approach above and pandas is auto-magically (and silently) converting the columns to datetime values. I need to find a way to stop that and leave any column containing a date value as a string.
- MattB over 8 years
  
  Thanks @EdChum will explore csv approach further. The spreadsheets themselves are out of my control (format + content) so I need to work with formats provided to me and I am trying to develop a robust custom method for trying different date formats (time zones) and presenting user with best fit.
- joris over 8 years
  
  I think @EdChum is correct. The type you get depends on how the data is typed in the excel file itself. If the column is formatted as datetime data there, you will get datetime data from read_excel.
Amon about 4 years

I get why he didn't accept this as the answer but this works for me and is really easy. I don't think there's another method that pandas allows which is weird to me
FObersteiner about 2 years

to make this a useful answer, could you please add some more details? for which specific input (excel file) does this give what desired output?