Pandas long to wide reshape, by two variables
Solution 1
A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:
df['idx'] = df.groupby('Salesman').cumcount()
Just adding a within-group counter/index will get you most of the way there, but the column labels will not be as you desired:
print(df.pivot(index='Salesman',columns='idx')[['product','price']])
         product             price
idx            0     1     2     0    1    2
Salesman
Knut         bat  ball  wand     5    1    3
Steve        pen   NaN   NaN     2  NaN  NaN
To get closer to your desired output I added the following:
df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)
product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')
reshape = pd.concat([product,prc],axis=1)
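# take each salesman's height directly; drop_duplicates() on the values would lose a salesman who shares a height with another
reshape['Height'] = df.groupby('Salesman')['Height'].first()
print(reshape)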
         product_0 product_1 product_2  price_0  price_1  price_2  Height
Salesman
Knut           bat      ball      wand        5        1        3       6
Steve          pen       NaN       NaN        2      NaN      NaN       5
Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):
df['idx'] = df.groupby('Salesman').cumcount()
tmp = []
for var in ['product','price']:
    df['tmp_idx'] = var + '_' + df.idx.astype(str)
    tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))
reshape = pd.concat(tmp,axis=1)
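For anyone who wants to run the generalized version end to end, here is a minimal self-contained sketch of the same idea wrapped in a function. The name long_to_wide and its parameters are my own for illustration (pandas has no such function), and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

def long_to_wide(df, group, value_vars):
    """Hypothetical helper: spread each column in value_vars into numbered columns."""
    # within-group counter: 0, 1, 2, ... for each row of a group
    idx = df.groupby(group).cumcount()
    pieces = []
    for var in value_vars:
        # label each row with e.g. 'product_0', 'product_1', ...
        tmp = df[[group, var]].assign(tmp_idx=var + '_' + idx.astype(str))
        pieces.append(tmp.pivot(index=group, columns='tmp_idx', values=var))
    # glue the per-variable wide frames together side by side
    return pd.concat(pieces, axis=1)

df = pd.DataFrame({
    'Salesman': ['Knut', 'Knut', 'Knut', 'Steve'],
    'Height': [6, 6, 6, 5],
    'product': ['bat', 'ball', 'wand', 'pen'],
    'price': [5, 1, 3, 2],
})

wide = long_to_wide(df, 'Salesman', ['product', 'price'])
print(wide)
```

Unlike the loop above, this does not add helper columns to the original df. Note that pivot sorts the labels as strings, so with ten or more rows per group 'product_10' would sort before 'product_2'.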
@Luke said:
I think Stata can do something like this with the reshape command.
You can, but I think you also need a within-group counter for Stata's reshape to produce your desired output:
+-------------------------------------------+
| salesman idx height product price |
|-------------------------------------------|
1. | Knut 0 6 bat 5 |
2. | Knut 1 6 ball 1 |
3. | Knut 2 6 wand 3 |
4. | Steve 0 5 pen 2 |
+-------------------------------------------+
If you add idx then you could do reshape in Stata:
reshape wide product price, i(salesman) j(idx)
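A rough pandas analogue of that Stata command, as a sketch only: it assumes the idx column already exists (as in the listing above) and that height is constant within each salesman. The suffix naming (product0, price0, ...) is chosen here to resemble what Stata produces.

```python
import pandas as pd

df = pd.DataFrame({
    'salesman': ['Knut', 'Knut', 'Knut', 'Steve'],
    'idx': [0, 1, 2, 0],
    'height': [6, 6, 6, 5],
    'product': ['bat', 'ball', 'wand', 'pen'],
    'price': [5, 1, 3, 2],
})

# i(salesman) becomes the row index (height rides along because it is constant
# within a salesman); j(idx) is moved into the columns by unstack
wide = df.set_index(['salesman', 'height', 'idx'])[['product', 'price']].unstack('idx')

# collapse the (variable, idx) column MultiIndex into flat names
wide.columns = [f'{var}{j}' for var, j in wide.columns]
wide = wide.reset_index()
print(wide)
```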
Solution 2
Here's another solution more fleshed out, taken from Chris Albon's site.
Create "long" dataframe
raw_data = {'patient': [1, 1, 1, 2, 2],
'obs': [1, 2, 3, 1, 2],
'treatment': [0, 1, 0, 1, 0],
'score': [6252, 24243, 2345, 2342, 23525]}
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
Make "wide" data
df.pivot(index='patient', columns='obs', values='score')
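The example above only spreads score. Since the question is about two variables, here is a hedged sketch of the same pivot with a list passed to values, followed by flattening the resulting column MultiIndex. The treatment_1 / score_1 naming is my own choice, not something from the original site.

```python
import pandas as pd

raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': [6252, 24243, 2345, 2342, 23525]}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])

# pivot both value columns at once; columns become a (variable, obs) MultiIndex
wide = df.pivot(index='patient', columns='obs', values=['treatment', 'score'])

# flatten to single-level names such as 'score_2'
wide.columns = [f'{var}_{obs}' for var, obs in wide.columns]
print(wide)
```

This works here because obs already plays the role of the within-group counter; patient 2 has no third observation, so those cells come out as NaN.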
Solution 3
Karl D's solution gets at the heart of the problem. But I find it's far easier to pivot everything (with .pivot_table because of the two index columns) and then sort and assign the columns to collapse the MultiIndex:
df['idx'] = df.groupby('Salesman').cumcount()+1
df = df.pivot_table(index=['Salesman', 'Height'], columns='idx',
values=['product', 'price'], aggfunc='first')
df = df.sort_index(axis=1, level=1)
df.columns = [f'{x}_{y}' for x,y in df.columns]
df = df.reset_index()
Output:
  Salesman  Height  price_1 product_1  price_2 product_2  price_3 product_3
0     Knut       6      5.0       bat      1.0      ball      3.0      wand
1    Steve       5      2.0       pen      NaN       NaN      NaN       NaN
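If you want product before price within each pair, as in the question's desired layout, one option is to spell out the column order explicitly instead of sorting. This is a sketch on the question's sample data; the order list is my own construction.

```python
import pandas as pd

df = pd.DataFrame({
    'Salesman': ['Knut', 'Knut', 'Knut', 'Steve'],
    'Height': [6, 6, 6, 5],
    'product': ['bat', 'ball', 'wand', 'pen'],
    'price': [5, 1, 3, 2],
})

df['idx'] = df.groupby('Salesman').cumcount() + 1
wide = df.pivot_table(index=['Salesman', 'Height'], columns='idx',
                      values=['product', 'price'], aggfunc='first')

# explicit order: (product, 1), (price, 1), (product, 2), (price, 2), ...
order = [(var, i) for i in sorted(df['idx'].unique())
         for var in ['product', 'price']]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))

# collapse the MultiIndex into flat names and restore the index columns
wide.columns = [f'{var}_{i}' for var, i in wide.columns]
wide = wide.reset_index()
print(wide)
```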
Solution 4
A bit old but I will post this for other people.
What you want can be achieved, but you probably shouldn't want it ;) Pandas supports hierarchical indexes for both rows and columns. In Python 2.7.x ...
from StringIO import StringIO
raw = '''Salesman Height product price
Knut 6 bat 5
Knut 6 ball 1
Knut 6 wand 3
Steve 5 pen 2'''
dff = pd.read_csv(StringIO(raw), sep='\s+')
print dff.set_index(['Salesman', 'Height', 'product']).unstack('product')
Produces a probably more convenient representation than what you were looking for
                price
product          ball  bat  pen  wand
Salesman Height
Knut     6          1    5  NaN     3
Steve    5        NaN  NaN    2   NaN
The advantage of using set_index and unstack versus a single function such as pivot is that you can break the operation down into clear, small steps, which simplifies debugging.
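For Python 3, the same steps look like this. It is a direct translation of the snippet above, with the set_index and unstack steps split onto separate lines so each intermediate result can be inspected; the variable names indexed and wide are mine.

```python
from io import StringIO

import pandas as pd

raw = '''Salesman Height product price
Knut 6 bat 5
Knut 6 ball 1
Knut 6 wand 3
Steve 5 pen 2'''

dff = pd.read_csv(StringIO(raw), sep=r'\s+')

# step 1: everything that identifies a cell goes into the row index
indexed = dff.set_index(['Salesman', 'Height', 'product'])

# step 2: move the 'product' level from the rows into the columns
wide = indexed.unstack('product')
print(wide)
```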
Solution 5
pivoted = df.pivot(index='salesman', columns='product', values='price')
pg. 192 Python for Data Analysis
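One caveat, sketched below with made-up data: pivot needs every (index, columns) pair to be unique, and otherwise raises "ValueError: Index contains duplicate entries, cannot reshape". df.drop_duplicates() does not help when the repeated rows differ in another column such as price. Using pivot_table with an aggregation function is one way around it; aggfunc='first' here is just an assumption about which value you want to keep.

```python
import pandas as pd

df = pd.DataFrame({
    'salesman': ['Knut', 'Knut', 'Knut', 'Steve'],
    'product': ['bat', 'bat', 'wand', 'pen'],  # 'bat' appears twice for Knut (made-up data)
    'price': [5, 1, 3, 2],
})

# plain pivot cannot decide which 'bat' price to put in the single Knut/bat cell
try:
    df.pivot(index='salesman', columns='product', values='price')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(err)

# pivot_table resolves the duplicates by aggregating them
pivoted = df.pivot_table(index='salesman', columns='product',
                         values='price', aggfunc='first')
print(pivoted)
```

If every repeated row should be kept rather than aggregated, adding a within-group counter as in Solution 1 is the other way to make the pairs unique.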
Luke
Updated on February 02, 2022
Comments
-
Luke over 2 years
I have data in long format and am trying to reshape to wide, but there doesn't seem to be a straightforward way to do this using melt/stack/unstack:
Salesman  Height  product  price
Knut      6       bat      5
Knut      6       ball     1
Knut      6       wand     3
Steve     5       pen      2
Becomes:
Salesman  Height  product_1  price_1  product_2  price_2  product_3  price_3
Knut      6       bat        5        ball       1        wand       3
Steve     5       pen        2        NA         NA       NA         NA
I think Stata can do something like this with the reshape command.
-
smci about 6 years
Do you really care that the two pivoted variables be interlaced: product_1 price_1 product_2 price_2 product_3 price_3? Can they just be product_1 product_2 ... price_1 price_2 ...?
-
Luke about 6 years
Yeah that doesn't matter.
-
Toto Lele over 2 years
This is a similar long-to-wide question that uses pivot and join: stackoverflow.com/a/65839968/7434285.
-
-
Luke about 10 years
Works well. This would be a nice feature for pandas. There's already wide_to_long, why not the other direction.
-
Karl D. about 10 years
Agreed ... that kind of reshape is one of the more useful tools in stata.
-
Luke about 10 years
Yeah, that's basically what I ended up doing, although you also have to separate out the columns that don't change, like height, drop duplicates and then concat those later.
-
d8aninja over 9 years
When using this method (from the book) I get "ValueError: Index contains duplicate entries, cannot reshape" even after I've used df.drop_duplicates()
-
devinbost over 6 years
Why are you still using Python 2.7? How about in Python 3?
-
Nate over 6 years
For python3, you do from io import StringIO and use print as a function and all is well. Basic idea of setting the index and unstacking works identically.
-
jlplenio about 4 years
Thank you so much. Though I already had the idx col in my dataframe, with your solution I was able to bring repeated measures from long to wide format. Pandas has this for wide_to_long() but doesn't offer it for long_to_wide. Sad.
-
Harshit Kakkar about 3 years
Hello, could you please help me with stackoverflow.com/questions/66964780/…
-
Moses about 3 years
Quite relatable from Stata background.
-
deeenes over 2 years
I find this the only working solution. pivot fails because it is unable to carry over columns that are neither index, variable or value and is unable to make use of a multi index. pivot_table fails due to its low performance, it can't handle anything larger than a few thousand rows.
-
Harshal Gajare over 2 years
simple and elegant!