Checking whether data frame is copy or view in Pandas


Solution 1

Answers from HYRY and Marius in comments!

One can check either by:

  • testing the identity of the values.base attribute rather than of the values attribute itself, as in:

    df.values.base is df2.values.base instead of df.values is df2.values.

  • or using the (admittedly internal) _is_view attribute (df2._is_view is True).
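
As a rough sketch of the same idea without touching internals, one could compare the underlying NumPy buffers column by column. Note that np.shares_memory is a swapped-in technique, not something the answer above mentions, and shares_data is a hypothetical helper name. It assumes plain NumPy-backed columns, as in the integer example from the question; other dtypes may behave differently.

```python
import numpy as np
import pandas as pd


def shares_data(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any common column shares memory."""
    common = left.columns.intersection(right.columns)
    return any(
        np.shares_memory(left[col].to_numpy(), right[col].to_numpy())
        for col in common
    )


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=['row1', 'row2'], columns=['a', 'b', 'c', 'd'])
row_slice = df.iloc[0:2, :]  # positional row slice, as in the question
deep_copy = df.copy()        # explicit deep copy for comparison

slice_shares = shares_data(df, row_slice)
copy_shares = shares_data(df, deep_copy)
print(slice_shares, copy_shares)
```

Checking per column avoids the consolidated array that a mixed-dtype frame builds on every .values access, which would hide any sharing.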

Thanks everyone!

Solution 2

I've elaborated on this example with pandas 1.0.1. There's not only a boolean _is_view attribute, but also _is_copy which can be None or a reference to the original DataFrame:

import pandas as pd

df = pd.DataFrame([[1,2,3,4],[5,6,7,8]], index = ['row1','row2'],
        columns = ['a','b','c','d'])
df2 = df.iloc[0:2, :]
df3 = df.loc[df['a'] == 1, :]

# df is neither copy nor view
df._is_view, df._is_copy
Out[1]: (False, None)

# df2 is a view AND a copy
df2._is_view, df2._is_copy
Out[2]: (True, <weakref at 0x00000236635C2228; to 'DataFrame' at 0x00000236635DAA58>)

# df3 is not a view, but a copy
df3._is_view, df3._is_copy
Out[3]: (False, <weakref at 0x00000236635C2228; to 'DataFrame' at 0x00000236635DAA58>)

So checking these two attributes should tell you not only if you're dealing with a view or not, but also if you have a copy or an "original" DataFrame.
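
A minimal sketch of how the two checks might be wrapped up, assuming only the attribute names shown above. Both _is_view and _is_copy are internal and may be missing or behave differently in other pandas versions, so the hypothetical helper view_copy_flags reads them defensively with getattr.

```python
import pandas as pd


def view_copy_flags(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: read the internal flags defensively."""
    # _is_copy is None or a weak reference to the parent frame.
    is_copy_ref = getattr(frame, "_is_copy", None)
    return {
        # None here means the attribute does not exist in this version.
        "is_view": getattr(frame, "_is_view", None),
        "has_copy_ref": is_copy_ref is not None,
    }


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=['row1', 'row2'], columns=['a', 'b', 'c', 'd'])
flags = {
    "df": view_copy_flags(df),
    "df.iloc[0:2, :]": view_copy_flags(df.iloc[0:2, :]),
    "df.loc[df['a'] == 1, :]": view_copy_flags(df.loc[df['a'] == 1, :]),
}
for name, values in flags.items():
    print(name, values)
```

The printed values depend on the installed pandas version, so treat them as a diagnostic rather than a guarantee.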

See also this thread for a discussion explaining why you can't always predict whether your code will return a view or not.

Author by

nick_eu

Just a social scientist trying to find his way through the world of applied data analysis!

Updated on June 06, 2022

Comments

  • nick_eu
    nick_eu almost 2 years

    Is there an easy way to check whether two data frames are different copies or views of the same underlying data that doesn't involve manipulations? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.

    For example, I thought "id(df.values)" would be stable across views, but it doesn't seem to be:

    import pandas as pd

    # Make two data frames that are views of the same data.
    df = pd.DataFrame([[1,2,3,4],[5,6,7,8]], index = ['row1','row2'],
           columns = ['a','b','c','d'])
    df2 = df.iloc[0:2,:]
    
    # Demonstrate they are views:
    df.iloc[0,0] = 99
    df2.iloc[0,0]
    Out[70]: 99
    
    # Now try and compare the id on values attribute
    # Different despite being views! 
    
    id(df.values)
    Out[71]: 4753564496
    
    id(df2.values)
    Out[72]: 4753603728
    
    # And we can of course compare df and df2
    df is df2
    Out[73]: False
    

    Other answers I've looked up try to give rules, but they don't seem consistent, and they also don't answer this question of how to test:

    And of course: - http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

    UPDATE: Comments below seem to answer the question -- looking at the df.values.base attribute rather than df.values attribute does it, as does a reference to the df._is_copy attribute (though the latter is probably very bad form since it's an internal).