How to check if an element is an empty list in pandas?


Solution 1

You can do this:

df[df["col"].str.len() != 0]

Example:

import pandas as pd
df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]}, dtype=object)
print(df[df["col"].str.len() != 0])
#          col
# 0        [1]
# 1     [2, 3]
# 3  [4, 5, 6]

Solution 2

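This is probably the most efficient solution. An empty list is falsey, so casting the column to bool yields False exactly for the rows that hold an empty list: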

df[df["col"].astype(bool)]
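As a minimal sketch of how this behaves, the snippet below rebuilds the example frame from Solution 1 and applies the mask. Because the cast relies on truthiness, other falsey objects in the column (an empty string or 0, for instance) should be filtered out as well, which may or may not be what you want.

```python
import pandas as pd

# Same data as the example in Solution 1.
df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]})

# Each element is converted by its truthiness: an empty list becomes False.
mask = df["col"].astype(bool)
result = df[mask]
print(result)
```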

Solution 3

Try this:

df[df['col'].apply(len).gt(0)]
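A short sketch of this variant, again on the example frame from Solution 1. `apply(len)` calls Python's `len` on each element and `.gt(0)` turns the lengths into a boolean mask. A missing value such as NaN has no length, so this form would raise a `TypeError` on a column with missing entries.

```python
import pandas as pd

# Same data as the example in Solution 1.
df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]})

lengths = df["col"].apply(len)  # length of each list, row by row
mask = lengths.gt(0)            # True where the list has at least one item
result = df[mask]
print(result)
```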

Solution 4

bool

An empty list in a boolean context is False. An empty list is what we call falsey. It does a programmer well to know what objects are falsey and truthy.

You can also slice a dataframe with a boolean list (not just a boolean series). And so, I'll use a comprehension to speed up the checking.

df[[bool(x) for x in df.col]]

Or with even fewer characters:

df[[*map(bool, df.col)]]
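To make the two spellings concrete, here is a sketch on the example frame from Solution 1. Both build a plain Python list of booleans, which pandas accepts as a row mask as long as its length matches the number of rows.

```python
import pandas as pd

# Same data as the example in Solution 1.
df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]})

# Two equivalent ways to build a plain list of booleans.
mask_comprehension = [bool(x) for x in df.col]
mask_map = [*map(bool, df.col)]

filtered = df[mask_comprehension]
print(filtered)
```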

Author: Blaszard


Updated on October 31, 2022

Comments

  • Blaszard
    Blaszard less than a minute

    One of the columns in my df stores lists, and some of the rows contain an empty list. For example:

    []

    ["X", "Y"]

    []

    etc...

    How can I take only the rows whose list is not empty?

    The following code does not work.

    df[df["col"] != []] # ValueError: Lengths must match to compare
    df[pd.notnull(df["col"])] # The code doesn't issue an error but the result includes an empty list
    df[len(df["col"]) != 0] # KeyError: True
    
  • Blaszard
    Blaszard over 3 years
    The code works, thanks. But could you give me more explanation, especially why you need .str here? It is very unintuitive and near impossible to get to the code, unless you read the official doc from the top to the bottom.
  • jdehesa
    jdehesa over 3 years
    @Blaszard It is a bit of a "trick". Functions under .str are meant to be used with string data. They are not really vectorized; they just apply a function to each data item. In the case of len, it just applies the function len to each object, so it works fine for strings, lists, or any other object to which len can be applied. Quang Hoang's answer may be more meaningful.
  • Quang Hoang
    Quang Hoang over 3 years
    @piRSquared Thanks. Certainly, and some other answers say exactly that.
  • piRSquared
    piRSquared over 3 years
    Ahh, I see that now
  • weezilla over 1 year
    Agree with jdehesa, this str len method is a good easy trick to remember. But beware of execution times on large dataframes. The Quang Hoang method seems to be vectorized and is MUCH faster.
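
The speed difference mentioned in the comments can be checked with a rough timing sketch like the one below. The frame size and repeat count are arbitrary assumptions, and actual timings depend on the pandas version and the hardware, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Hypothetical test frame: 20,000 rows alternating non-empty and empty lists.
df = pd.DataFrame({"col": [[1, 2], []] * 10_000})

# One callable per approach discussed above; each returns a boolean mask.
candidates = {
    "str.len": lambda: df["col"].str.len() != 0,
    "astype(bool)": lambda: df["col"].astype(bool),
    "apply(len)": lambda: df["col"].apply(len).gt(0),
    "comprehension": lambda: [bool(x) for x in df.col],
}

for name, make_mask in candidates.items():
    seconds = timeit.timeit(make_mask, number=5)
    print(f"{name:>14}: {seconds:.4f} s for 5 runs")
```

All four callables produce the same mask, so whichever is fastest on your data can be swapped in without changing the result.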