How to read merged Excel cells with NaN into Pandas DataFrame

10,243

The referenced link you attempted needed to forward fill only the index column. For your use case, you need to fillna for all dataframe columns. So, simply forward fill entire dataframe:

df = pd.read_excel("Input.xlsx")
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       NaN           NaN              Lemonade          Beyonce       1.0
# 3       NaN           NaN  The Thrill Of It All        Sam Smith       2.0
# 4       NaN           NaN              Thriller  Michael Jackson      11.0
# 5       NaN           NaN                Divide       Ed Sheeran       4.0
# 6       NaN           NaN            Reputation     Taylor Swift       3.0
# 7       NaN           NaN        Red Pill Blues         Maroon 5       5.0

df = df.fillna(method='ffill')
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       1.0       John H.              Lemonade          Beyonce       1.0
# 3       1.0       John H.  The Thrill Of It All        Sam Smith       2.0
# 4       1.0       John H.              Thriller  Michael Jackson      11.0
# 5       1.0       John H.                Divide       Ed Sheeran       4.0
# 6       1.0       John H.            Reputation     Taylor Swift       3.0
# 7       1.0       John H.        Red Pill Blues         Maroon 5       5.0
Share:
10,243

Related videos on Youtube

CPU
Author by

CPU

Updated on June 04, 2022

Comments

  • CPU
    CPU almost 2 years

    I would like to read an Excel sheet into Pandas DataFrame. However, there are merged Excel cells as well as Null rows (full/partial NaN filled), as shown below. To clarify, John H. has made an order to purchase all the albums from "The Bodyguard" to "Red Pill Blues".

    Excel sheet capture

    When I read this Excel sheet into a Pandas DataFrame, the Excel data does not get transferred correctly. Pandas considers a merged cell as one cell. The DataFrame looks like the following: (Note: Values in () are the desired values that I would like to have there)

    Dataframe capture

    Please note that the last row does not contain merged cells; it only carries a value for Artist column.


    EDIT: I did try the following to forward-fill in the NaN values:(Pandas: Reading Excel with merged cells)
    df.index = pd.Series(df.index).fillna(method='ffill')  
    

    However, the NaN values remain. What strategy or method could I use to populate the DataFrame correctly? Is there a Pandas method of unmerging the cells and duplicating the corresponding contents?