Pandas slicing FutureWarning with 0.21.0

Solution 1

TL;DR: There is likely a typo or spelling error in the column header names.

This behavior changed in v0.21.0 and is explained at length in the docs:

Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.

For example,

df

     A    B  C
0  7.0  NaN  8
1  3.0  3.0  5
2  8.0  1.0  7
3  NaN  0.0  3
4  8.0  2.0  7

Try the same kind of slicing you're doing:

df.loc[df.A.gt(6), ['A', 'C']]

     A  C
0  7.0  8
2  8.0  7
4  8.0  7

No problem. Now, try replacing C with a non-existent column label:

df.loc[df.A.gt(6), ['A', 'D']]
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
     
     A   D
0  7.0 NaN
2  8.0 NaN
4  8.0 NaN

So, in your case, the warning is triggered by the column labels you pass to loc: at least one of them does not exist in the DataFrame. Take another look at them.
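
If the missing label is intentional rather than a typo, the warning itself points to .reindex() as the alternative. The following is a minimal sketch, assuming the example frame shown above (rebuilt here by hand) and a hypothetical list named wanted. It first lists the requested labels that are absent, then uses reindex to get the old NaN-filling result explicitly.

```python
import pandas as pd

# Rebuild the example frame from the answer above.
df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, None, 8.0],
    "B": [None, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

wanted = ["A", "D"]  # hypothetical request; "D" is deliberately absent

# Requested labels that do not exist as columns.
missing = pd.Index(wanted).difference(df.columns)
print(list(missing))

# Filter the rows first, then reindex the columns.
# reindex creates absent columns filled with NaN instead of warning or raising.
filled = df.loc[df.A.gt(6)].reindex(columns=wanted)
print(filled)
```

Checking missing before selecting is also a quick way to spot a misspelled header.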

Solution 2

This warning also occurs when calling .append with a list of dicts that contains new columns. To avoid it, use:

df=df.append(pd.Series({'A':i,'M':j}), ignore_index=True)

Instead of:

df=df.append([{'A':i,'M':j}], ignore_index=True)

Full error message:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:1472: FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.

Thanks to https://stackoverflow.com/a/50230080/207661
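
Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on recent versions neither call above is available. A rough equivalent with pd.concat might look like the sketch below. The frame contents and the name new_row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]})

# A one-row frame that introduces a column ("M") the original frame lacks.
new_row = pd.DataFrame([{"A": 3, "M": 4}])

# concat aligns on column names; cells with no value become NaN.
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```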

Solution 3

If you want to retain the index, you can pass a list comprehension instead of a column list:

loan_data_inputs_train.loc[:,[i for i in List_col_without_reference_cat]]
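
A list comprehension is most useful here when it also filters out labels that are not in the frame, since a plain copy of the list would still contain the missing ones. The sketch below assumes that intent. The names loan_data and wanted_cols are hypothetical stand-ins for the frame and column list above.

```python
import pandas as pd

# Hypothetical stand-ins for the frame and column list in the answer above.
loan_data = pd.DataFrame(
    {"grade": ["A", "B"], "term": [36, 60]},
    index=[10, 11],
)
wanted_cols = ["grade", "purpose"]  # "purpose" does not exist in the frame

# Keep only the labels that are actual columns, preserving their order.
existing = [c for c in wanted_cols if c in loan_data.columns]

# Every label in the list now exists, so .loc neither warns nor raises,
# and the original row index is kept.
subset = loan_data.loc[:, existing]
print(subset)
```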
Author: QuinRiva

Updated on July 13, 2022

Comments

  • QuinRiva
    QuinRiva almost 2 years

    I'm trying to select a subset of a subset of a dataframe, selecting only some columns, and filtering on the rows.

    df.loc[df.a.isin(['Apple', 'Pear', 'Mango']), ['a', 'b', 'f', 'g']]
    

    However, I'm getting the error:

    Passing list-likes to .loc or [] with any missing label will raise
    KeyError in the future, you can use .reindex() as an alternative.
    

    What's the correct way to slice and filter now?

  • Oren Ben-Kiki
    Oren Ben-Kiki over 4 years
    What if I want a KeyError to be raised if there are any missing labels? That is, if I actually want the new behavior? Right now .loc[list_of_names] will give this warning, which I do not want to see. Any way to disable it?
  • cs95
    cs95 over 4 years
    @OrenBen-Kiki Simplest way is to update to the latest version, it throws a KeyError in the latest versions.
  • Oren Ben-Kiki
    Oren Ben-Kiki over 4 years
    Yes, but it also gives the warning... Currently my choices are to use reindex (and lose the safety that loc gives me), or use loc and get a ton of warnings. Is there a third option (get the safety and not get the warnings)?
  • cs95
    cs95 over 4 years
    @OrenBen-Kiki Like I said, in more recent versions (from at least 0.25, possibly earlier versions), it throws a KeyError straight away. Now I'm not sure what you mean by "warnings", are you referring to the traceback?
  • Oren Ben-Kiki
    Oren Ben-Kiki over 4 years
    Ah, got it, the warning is only generated if there is an actual missing key. Which is the only behavior which makes sense ;-) It would have been so nice if KeyError actually specified what the value of the key was...
  • cs95
    cs95 over 4 years
    @OrenBen-Kiki it does... At the bottom of the traceback message, in IPython at least. Don't think you can save or print it when catching an error, it's just a readable traceback dump.
  • Dave Bost - MSFT
    Dave Bost - MSFT over 4 years
    @OrenBen-Kiki, @cs95 - When using JupyterLab, I needed to use warnings.simplefilter('error', FutureWarning) in order to get a Traceback and actually see what piece of my code caused the FutureWarning.
  • datariel
    datariel almost 4 years
    Similarly, if you need to append two dataframes you can use df = df.append(pd.DataFrame([dict]), ignore_index=True)
  • MERose
    MERose over 3 years
    As the question title says, this is a warning, not an error
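
The warnings.simplefilter('error', FutureWarning) approach mentioned in the comments can be sketched with the standard library alone. Here legacy_lookup is a hypothetical stand-in for whatever call emits the warning. Wrapping the filter in warnings.catch_warnings() limits the escalation to that block instead of changing it for the whole session.

```python
import warnings


def legacy_lookup():
    # Hypothetical stand-in for a library call that emits a FutureWarning.
    warnings.warn("label lookup will raise KeyError in the future", FutureWarning)
    return "result"


escalated = False
message = ""

with warnings.catch_warnings():
    # Inside this block, any FutureWarning is raised as an exception,
    # so the traceback points at the line that triggered it.
    warnings.simplefilter("error", FutureWarning)
    try:
        legacy_lookup()
    except FutureWarning as exc:
        escalated = True
        message = str(exc)

print(escalated, message)
```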