Change timezone of date-time column in pandas and add as hierarchical index

56,160

Solution 1

By now this has been fixed. For example, you can now call:

dataframe.tz_localize('UTC', level=0)

You'll have to call it twice for the given example, though. (I.e., once for each level.)

Solution 2

If you set it as the index, it's automatically converted to an Index:

In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)

In [12]: dat
Out[12]:
                    label  value
datetime
2011-07-19 07:00:00     a      0
2011-07-19 08:00:00     a      1
2011-07-19 09:00:00     a      2
2011-07-19 07:00:00     b      3
2011-07-19 08:00:00     b      4
2011-07-19 09:00:00     b      5

Then do the tz_localize:

In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')

In [13]: dat
Out[13]:
                          label  value
datetime
2011-07-19 00:00:00-07:00     a      0
2011-07-19 01:00:00-07:00     a      1
2011-07-19 02:00:00-07:00     a      2
2011-07-19 00:00:00-07:00     b      3
2011-07-19 01:00:00-07:00     b      4
2011-07-19 02:00:00-07:00     b      5

And then you can append the label column to the index:

Hmmm this is definitely a bug!

In [14]: dat.set_index('label', append=True).swaplevel(0, 1)
Out[14]:
                           value
label datetime
a     2011-07-19 07:00:00      0
      2011-07-19 08:00:00      1
      2011-07-19 09:00:00      2
b     2011-07-19 07:00:00      3
      2011-07-19 08:00:00      4
      2011-07-19 09:00:00      5

A hacky workaround is to convert the (datetime) level directly (when it's already a MultiIndex):

In [15]: dat.index.levels[1] = dat.index.get_level_values(1).tz_localize('UTC').tz_convert('US/Pacific')

In [16]: dat1
Out[16]:
                                 value
label datetime
a     2011-07-19 00:00:00-07:00      0
      2011-07-19 01:00:00-07:00      1
      2011-07-19 02:00:00-07:00      2
b     2011-07-19 00:00:00-07:00      3
      2011-07-19 01:00:00-07:00      4
      2011-07-19 02:00:00-07:00      5

Solution 3

An other workaround which works in pandas 0.13.1, and solves the FrozenList can not be assigned problem:

index.levels = pandas.core.base.FrozenList([
    index.levels[0].tz_localize('UTC').tz_convert(tz),
    index.levels[1].tz_localize('UTC').tz_convert(tz)
])

Struggling a lot with this issue, MultiIndex loses tz in many other conditions too.

Share:
56,160
Erik Shilts
Author by

Erik Shilts

Director of Data Science and Analytics Experience: Microsoft Opower Freddie Mac

Updated on July 09, 2022

Comments

  • Erik Shilts
    Erik Shilts almost 2 years

    I have data with a time-stamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses the timezone formatting when I try to add it back into the DataFrame, either as a column or as an index.

    >>> import pandas as pd
    >>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})
    >>> dat.dtypes
    #datetime    object
    #label       object
    #value        int64
    #dtype: object
    

    Now if I try to convert the Series directly I run into an error.

    >>> times = pd.to_datetime(dat['datetime'])
    >>> times.tz_localize('UTC')
    #Traceback (most recent call last):
    #  File "<stdin>", line 1, in <module>
    #  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize
    #    raise Exception('Cannot tz-localize non-time series')
    #Exception: Cannot tz-localize non-time series
    

    If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.

    >>> times_index = pd.Index(times)
    >>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')
    >>> times_index_pacific
    #<class 'pandas.tseries.index.DatetimeIndex'>
    #[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]
    #Length: 6, Freq: None, Timezone: US/Pacific
    

    However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:

    >>> dat_index = dat.set_index([dat['label'], times_index_pacific])
    >>> dat_index
    #                                      datetime label  value
    #label                                                      
    #a     2011-07-19 07:00:00  2011-07-19 07:00:00     a      0
    #      2011-07-19 08:00:00  2011-07-19 08:00:00     a      1
    #      2011-07-19 09:00:00  2011-07-19 09:00:00     a      2
    #b     2011-07-19 07:00:00  2011-07-19 07:00:00     b      3
    #      2011-07-19 08:00:00  2011-07-19 08:00:00     b      4
    #      2011-07-19 09:00:00  2011-07-19 09:00:00     b      5
    

    You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.

    How can I change the timezone and add it as an index to a DataFrame?

  • Erik Shilts
    Erik Shilts almost 11 years
    There are two problems that I'm encountering with this: 1) I can't call tz_localize or tz_convert on a MultiIndex; 2) Accessing the hour field from a single Index still gives me the array [7,8,9,7,8,9] when I'd like the Pacific values (i.e. [0, 1, 2, 0, 1, 2]).
  • Andy Hayden
    Andy Hayden almost 11 years
    Sorry about that, this is definitely a bug (thanks for finding it)! I've added a workaround (which is to convert the datetime level once it's a MultiIndex)...