Error when trying to apply log method to pandas data frame column in Python

This happens when the dtype of the column is not numeric. Try casting to float before taking the log:

arr['retlog'] = log(arr['close'].astype('float64')/arr['close'].astype('float64').shift(1))

I suspect that the numbers are stored as generic 'object' types, which I know causes log to throw that error. Here is a simple illustration of the problem:

In [15]: np.log(Series([1,2,3,4], dtype='object'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-25deca6462b7> in <module>()
----> 1 np.log(Series([1,2,3,4], dtype='object'))

AttributeError: log

In [16]: np.log(Series([1,2,3,4], dtype='float64'))
Out[16]: 
0    0.000000
1    0.693147
2    1.098612
3    1.386294
dtype: float64
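
If some values cannot be cast cleanly, `astype('float64')` will raise instead of converting. As a rough sketch of an alternative, `pd.to_numeric` with `errors="coerce"` turns anything unparseable into NaN. The sample frame below is a made-up stand-in for the asker's data, not the real `spydata`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a 'close' column that arrived as generic objects.
arr = pd.DataFrame({"close": pd.Series(["10.0", "20.0", "40.0"], dtype="object")})

# Convert to a numeric dtype first; errors="coerce" maps unparseable values to NaN.
close = pd.to_numeric(arr["close"], errors="coerce")

# Log return: the first row is NaN because there is no previous close.
arr["retlog"] = np.log(close / close.shift(1))
```

Checking `arr['close'].dtype` before and after the conversion is a quick way to confirm whether this was the cause.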

Your attempt with math.log did not work because that function is designed for single numbers (scalars) only, not lists or arrays.
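
A small sketch of the difference, using a made-up `prices` Series: `math.log` tries to turn its argument into one Python float, which fails for a Series with more than one element, while `np.log` works element by element.

```python
import math

import numpy as np
import pandas as pd

prices = pd.Series([1.0, 2.0, 4.0])  # hypothetical sample data

# math.log expects a single number, so passing the whole Series raises TypeError.
try:
    math.log(prices)
    scalar_call_failed = False
except TypeError:
    scalar_call_failed = True

# np.log is a ufunc and is applied to every element of the Series.
elementwise = np.log(prices)

# math.log is fine once a single value has been pulled out.
single = math.log(prices.iloc[1])
```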

For what it's worth, I think this is a confusing error message; it once stumped me for a while, anyway. I wonder if it can be improved.

Author: user2460677

Updated on September 15, 2022

Comments

  • user2460677
    user2460677 over 1 year

    So, I am very new to Python and Pandas (and programming in general), but am having trouble with a seemingly simple function. So I created the following dataframe using data pulled with a SQL query (if you need to see the SQL query, let me know and I'll paste it)

    spydata = pd.DataFrame(row,columns=['date','ticker','close', 'iv1m', 'iv3m'])
    tickerlist = unique(spydata[spydata['date'] == '2013-05-31'])
    

    After that, I have written a function to create some new columns in the dataframe using the data already held in it:

    def demean(arr):
        arr['retlog'] = log(arr['close']/arr['close'].shift(1))
    
        arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
        arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
        arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
        arr['1060rat'] = arr['10dvol']/arr['60dvol']
        arr['1090rat'] = arr['10dvol']/arr['90dvol']
        arr['60dis'] = (arr['1060rat'] - arr['1060rat'].mean())/arr['1060rat'].std()
        arr['90dis'] = (arr['1090rat'] - arr['1090rat'].mean())/arr['1090rat'].std()
        return arr
    

    The only part that I'm having a problem with is the first line of the function:

    arr['retlog'] = log(arr['close']/arr['close'].shift(1))
    

    Which, when I run, with this command, I get an error:

    result = spydata.groupby(['ticker']).apply(demean)
    

    Error:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-196-4a66225e12ea> in <module>()
    ----> 1 result = spydata.groupby(['ticker']).apply(demean)
          2 results2 = result[result.date == result.date.max()]
          3 
    
    C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
        323         func = _intercept_function(func)
        324         f = lambda g: func(g, *args, **kwargs)
    --> 325         return self._python_apply_general(f)
        326 
        327     def _python_apply_general(self, f):
    
    C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in _python_apply_general(self, f)
        326 
        327     def _python_apply_general(self, f):
    --> 328         keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
        329 
        330         return self._wrap_applied_output(keys, values,
    
    C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, f, data, axis, keep_internal)
        632             # group might be modified
        633             group_axes = _get_axes(group)
    --> 634             res = f(group)
        635             if not _is_indexed_like(res, group_axes):
        636                 mutated = True
    
    C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in <lambda>(g)
        322         """
        323         func = _intercept_function(func)
    --> 324         f = lambda g: func(g, *args, **kwargs)
        325         return self._python_apply_general(f)
        326 
    
    <ipython-input-195-47b6faa3f43c> in demean(arr)
          1 def demean(arr):
    ----> 2     arr['retlog'] = log(arr['close']/arr['close'].shift(1))
          3     arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
          4     arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
          5     arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
    
    AttributeError: log
    

    I have tried changing the function to np.log as well as math.log, in which case I get the error

    TypeError: only length-1 arrays can be converted to Python scalars
    

    I've tried looking this up, but haven't found anything directly applicable. Any clues?

  • Jeff
    Jeff almost 11 years
    @Dan why don't you open an issue on seeing if there are situations where this error can be trapped / improved
  • Andy Hayden
    Andy Hayden almost 11 years
    @Jeff looks like wes posted this on numpy over four years ago... github.com/numpy/numpy/issues/1611 (!)
  • patricksurry
    patricksurry over 4 years
    I run into this fairly regularly when I create a column from a list containing a mixture of float and None values. As long as there's at least one number, s = pd.Series([... values ...]) has a numeric dtype, so something like np.log(s) works, but s = pd.Series([None, None, ...]) has dtype object and np.log(s) fails. A recent example of mine boiled down to: s = pd.Series([None, None]); np.log(s.where(s, 1)), which fails with AttributeError: 'int' object has no attribute 'log'. Even though s.where(s, 1) is a column of 1s, it keeps dtype object :(
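
The dtype inference described in that last comment can be sketched as follows. The Series names are made up for illustration, and declaring `dtype="float64"` up front is just one possible workaround, not something the commenter proposed:

```python
import numpy as np
import pandas as pd

mixed = pd.Series([1.0, None])      # one number is enough: inferred as float64
all_none = pd.Series([None, None])  # nothing numeric to infer from: object

# Declaring the dtype explicitly stores the Nones as NaN in a float column.
typed = pd.Series([None, None], dtype="float64")

# NaN in, NaN out; np.log no longer has to deal with an object column.
logged = np.log(typed)
```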