How to remove timezone from a Timestamp column in a pandas dataframe

75,369

Solution 1

The column must be a datetime dtype, for example after using pd.to_datetime. Then, you can use tz_localize to change the time zone, a naive timestamp corresponds to time zone None:

testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.

Solution 2

When your data contains datetimes spanning different timezones or prior and after application of daylight saving time e.g. obtained from postges database with psycopg2, depending on pandas version you might end up in some of the scenarios where best method of conversion is:

testdata['time'].apply(lambda x: x.replace(tzinfo=None))

Scenarios when this works (note the usage of FixedOffsetTimezone with different offset) while usage of .dt.tz_localize(None) does not:

df = pd.DataFrame([
    datetime.datetime(2018, 5, 17, 21, 40, 20, 775854, 
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)),
    datetime.datetime(2021, 3, 17, 14, 36, 13, 902741, 
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=60, name=None))
])

pd.__version__
'0.24.2'


df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone
pd.__version__
'1.1.2'


df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 5132, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/accessors.py", line 480, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values

Solution 3

I know that you mentioned that your timestamps are already in UTC, but just to be defensive, you might as well make your code impervious to the case where timestamps (some or all of them) were in a different timezone. This doesn't cost anything, and will be more robust:

newcol = testdata['time'].dt.tz_convert(None)

As per the docs:

A tz of None will convert to UTC and remove the timezone information.

This is safer than just dropping any timezone the timestamps may contain.

Share:
75,369
Dave X
Author by

Dave X

merge keep

Updated on July 17, 2022

Comments

  • Dave X
    Dave X almost 2 years

    I read Pandas change timezone for forex DataFrame but I'd like to make the time column of my dataframe timezone naive for interoperability with an sqlite3 database.

    The data in my pandas dataframe is already converted to UTC data, but I do not want to have to maintain this UTC timezone information in the database.

    Given a sample of the data derived from other sources, it looks like this:

    print(type(testdata))
    print(testdata)
    print(testdata.applymap(type))
    

    gives:

    <class 'pandas.core.frame.DataFrame'>
                            time  navd88_ft  station_id  new
    0  2018-03-07 01:31:02+00:00  -0.030332          13    5
    1  2018-03-07 01:21:02+00:00  -0.121653          13    5
    2  2018-03-07 01:26:02+00:00  -0.072945          13    5
    3  2018-03-07 01:16:02+00:00  -0.139917          13    5
    4  2018-03-07 01:11:02+00:00  -0.152085          13    5
                                         time        navd88_ft     station_id  \
    0  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
    1  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
    2  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
    3  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
    4  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
    
                 new  
    0  <class 'int'>  
    1  <class 'int'>  
    2  <class 'int'>  
    3  <class 'int'>  
    4  <class 'int'>  
    

    but

    newstamp = testdata['time'].tz_convert(None)
    

    gives an eventual error:

    TypeError: index is not a valid DatetimeIndex or PeriodIndex
    

    What do I do to replace the column with a timezone naive timestamp?