Python, Pandas: ValueError('window must be an integer',)


Solution 1

This error is raised by Pandas, not Bokeh. You are passing a string to df.rolling, and when the DataFrame has no datetime-like index the window must be an integer. If you want a fixed number of rows per window, pass int(new) instead.

Edit: as noted below, the Pandas documentation is evidently incomplete, and the real problem in this case is probably the lack of a time index. Creating a naive DataFrame and passing a value like "10d" raises exactly the error shown:

In [2]: df = pd.DataFrame({'B': [0, 1, 2, 10, 4]})

In [3]: df.rolling('10d')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-2a9875316cd7> in <module>
----> 1 df.rolling('10d')

~/anaconda/lib/python3.7/site-packages/pandas/core/generic.py in rolling(self, window, min_periods, center, win_type, on, axis, closed)
   8906                                    min_periods=min_periods,
   8907                                    center=center, win_type=win_type,
-> 8908                                    on=on, axis=axis, closed=closed)
   8909
   8910         cls.rolling = rolling

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in rolling(obj, win_type, **kwds)
   2467         return Window(obj, win_type=win_type, **kwds)
   2468
-> 2469     return Rolling(obj, **kwds)
   2470
   2471

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in __init__(self, obj, window, min_periods, center, win_type, axis, on, closed, **kwargs)
     78         self.win_freq = None
     79         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 80         self.validate()
     81
     82     @property

~/anaconda/lib/python3.7/site-packages/pandas/core/window.py in validate(self)
   1476
   1477         elif not is_integer(self.window):
-> 1478             raise ValueError("window must be an integer")
   1479         elif self.window < 0:
   1480             raise ValueError("window must be non-negative")

ValueError: window must be an integer
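For contrast, the same kind of call succeeds once the frame has a datetime-like index. The following is a minimal sketch, not taken from the question: the daily timestamps and the names `idx`, `rolled`, and `naive` are made up for illustration.

```python
import pandas as pd

# Same values as above, but indexed by illustrative daily timestamps.
idx = pd.date_range("2016-03-17", periods=5, freq="D")
df = pd.DataFrame({"B": [0, 1, 2, 10, 4]}, index=idx)

# With a DatetimeIndex, an offset string is accepted as the window.
rolled = df.rolling("2D").mean()
print(rolled)

# Without a datetime-like index, the same window raises ValueError.
naive = pd.DataFrame({"B": [0, 1, 2, 10, 4]})
try:
    naive.rolling("2D").mean()
    raised = False
except ValueError:
    raised = True
print(raised)
```

Each "2D" window here covers the current row and the one from the previous day, so the window size is defined by time rather than by a row count.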

Solution 2

As of today, the documentation for the window parameter states:

window : int, or offset

Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. This is new in 0.19.0

It is not clear to me whether the time information is a column in your dataframe or part of a MultiIndex. In the first case, you can use .set_index('time').

For a MultiIndex, you currently cannot use offsets. See the related issue. If it suits your data, you can use .reset_index() to transform it into a single-index dataframe (see here).

Update: you can also pass datetime columns for offset-based rolling metrics with the on parameter (and, therefore, you do not have to have an index).
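A minimal sketch of the `on` approach, assuming a frame shaped like the one in the question (a `time` column plus a `temperature` column). The 60-minute window and the name `smoothed` are chosen only for illustration.

```python
import pandas as pd

# Timestamps live in an ordinary column, not in the index.
df = pd.DataFrame({
    "time": pd.date_range("2016-03-17 11:00", periods=6, freq="30min"),
    "temperature": [4.676, 4.633, 4.639, 4.603, 4.615, 4.650],
})

# `on` names the datetime column that defines the windows. That column
# must be sorted in time order for offset-based rolling to work.
smoothed = df.rolling("60min", on="time").mean()
print(smoothed)
```

This keeps the default integer index in place, which can be convenient when the frame is handed on to code that expects `time` as a regular column.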

Solution 3

df.rolling can also handle time periods. Make sure the date/time column has a pandas datetime dtype. If it does not, convert it like this:

data['col'] = pd.to_datetime(data['col'])
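As a fuller sketch of that conversion followed by a time-based window, assuming the timestamps arrive as plain strings: the sample rows, the `temperature` column, and the "1D" window are illustrative, not from the original post.

```python
import pandas as pd

# Illustrative data: the timestamps start out as strings.
data = pd.DataFrame({
    "col": ["2016-03-17 11:00:00", "2016-03-17 11:30:00", "2016-03-17 12:00:00"],
    "temperature": [4.676, 4.633, 4.639],
})

data["col"] = pd.to_datetime(data["col"])   # strings -> pandas datetimes
data = data.set_index("col").sort_index()   # rolling needs a sorted time index

smoothed = data["temperature"].rolling("1D").mean()
print(smoothed)
```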
Author: Christophe Foyer

Updated on July 17, 2022

Comments

  • Christophe Foyer
    Christophe Foyer almost 2 years

    I seem to be having this issue with Pandas code inside a Bokeh callback.

    Here's part of the output before the error. My dataframe seems normal and I'm not sure why it's upset

                         time  temperature
    0 2016-03-17 11:00:00        4.676
    1 2016-03-17 11:30:00        4.633
    2 2016-03-17 12:00:00        4.639
    3 2016-03-17 12:30:00        4.603
    4 2016-03-17 13:00:00        4.615
    5 2016-03-17 13:30:00        4.650
    6 2016-03-17 14:00:00        4.678
    7 2016-03-17 14:30:00        4.698
    8 2016-03-17 15:00:00        4.753
    9 2016-03-17 15:30:00        4.847
    ERROR:bokeh.server.protocol_handler:error handling message Message 'PATCH-DOC' (
    revision 1): ValueError('window must be an integer',)
    

    And here's the code I changed from the flask embed example (link here):

    def callback(attr, old, new):
        df = pd.DataFrame.from_dict(source.data.copy())
        print df[:10]
        if new == 0:
            data = df
        else:
            data = df.rolling('{0}D'.format(new)).mean()
        source.data = ColumnDataSource(data=data).data

    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change('value', callback)
    

    I can also include the full code if that helps, but the main change I made is just a doc.add_periodic_callback() that fetches new data periodically.

  • Christophe Foyer
    Christophe Foyer almost 6 years
    Thanks for the reply. I'm still not sure why pandas isn't happy, since I didn't change anything from the source code, but I'll try to figure it out. I didn't realize Bokeh was just passing the error message along.
  • tillmo
    tillmo over 5 years
    The answer is just wrong. df.rolling can also handle time periods like '10D' (which will be the case here if new is 10). So the error must have a different cause.
  • tillmo
    tillmo over 5 years
    Maybe the problem is that the newly fetched data does not have a time index?
  • bigreddot
    bigreddot over 5 years
    Perhaps you should make a helpful PR to update the Pandas documentation, since it currently clearly states that an integer is expected for window.
  • pbaranski
    pbaranski about 5 years
    .set_index('time') fixed my dataframe, and the problem with mean() was gone.
  • Oer
    Oer about 3 years
    Make sure you convert to datetime type and then run "set_index" on that column. That fixed it for me.
  • Harald Thomson
    Harald Thomson about 3 years
    This happens if the index is not a pandas time index. See @madhurs answer