Create multiindex from existing dataframe

22,328

Solution 1

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and date values but different sales figures) by summing them, which is why I recommended using groupby.

If you need the multi-index, you can simply access viat new_df.index where new_df is the new dataframe created from either of the two operations above.

And user_id will be level 0 and account_num will be level 1.

Solution 2

For clarification of future users I would like to add the following:

As said by Alexander,

df.set_index(['user_id', 'account_num', 'dates'])

with a possible inplace=True does the job.

The type(df) gives

pandas.core.frame.DataFrame

whereas type(df.index) is indeed the expected

pandas.core.indexes.multi.MultiIndex

Solution 3

Use pd.MultiIndex.from_arrays

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])

Solution 4

There are two ways to do it, albeit not exactly like you have shown, but it works.
Say you have the following df:

      A   B    C      D
0   nil one    1    NaN
1   bar one    5    5.0
2   foo two    3    8.0
3   bar three  2    1.0
4   foo two    4    2.0
5   bar two    6    NaN

1. Workaround 1:

df.set_index('A', append = True, drop = False).reorder_levels(order = [1,0]).sort_index()

This will return:

enter image description here

2. Workaround 2:

df.set_index(['A', 'B']).sort_index()

This will return:
enter image description here

Share:
22,328
puifais
Author by

puifais

I am a data scientist with strong background in image analysis, statistics, and neuroscience :)

Updated on March 23, 2021

Comments

  • puifais
    puifais about 3 years

    I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):

    have

    And this is what I want:

    want

    I have tried

    newmulti = currentDataFrame.set_index(['user_id','account_num'])
    

    But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!