reshape a pandas dataframe

58,864

Solution 1

The pd.wide_to_long function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different digit suffix. The only difference here is that your first set of variables don't have a suffix, so you will need to rename your columns first.

The only issue with pd.wide_to_long is that it must have an identification variable, i, unlike melt. reset_index is used to create a this uniquely identifying column, which is dropped later. I think this might get corrected in the future.

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2

Solution 2

You can use lreshape, for column id numpy.repeat:

a = [col for col in df.columns if 'A' in col]
b = [col for col in df.columns if 'B' in col]
df1 = pd.lreshape(df, {'A' : a, 'B' : b})

df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len (df.index))  + 1
print (df1)
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2

EDIT:

lreshape is currently undocumented, but it is possible it might be removed(with pd.wide_to_long too).

Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.

Solution 3

I solved this in 3 steps:

  1. Make a new dataframe df2 holding only the data you want to be added to the initial dataframe df.
  2. Delete the data from df that will be added below (and that was used to make df2.
  3. Append df2 to df.

Like so:

# step 1: create new dataframe
df2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']

# step 2: delete that data from original
df = df.drop(["A1", "B1"], 1)

# step 3: append
df = df.append(df2, ignore_index=True)

Note how when you do df.append() you need to specify ignore_index=True so the new columns get appended to the index rather than keep their old index.

Your end result should be your original dataframe with the data rearranged like you wanted:

In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12

Solution 4

Use pd.concat() like so:

#Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up

# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])
Share:
58,864
Moritz
Author by

Moritz

By day: former PhD student (biotechnology) By night: sleeping former PhD student

Updated on July 28, 2022

Comments

  • Moritz
    Moritz over 1 year

    suppose a dataframe like this one:

    df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]], columns = ['A', 'B', 'A1', 'B1'])
    

    enter image description here

    I would like to have a dataframe which looks like:

    enter image description here

    what does not work:

    new_rows = int(df.shape[1]/2) * df.shape[0]
    new_cols = 2
    df.values.reshape(new_rows, new_cols, order='F')
    

    of course I could loop over the data and make a new list of list but there must be a better way. Any ideas ?

  • Matthew
    Matthew about 7 years
    @Moritz - I see. Personally I'd just do that in a for loop. Though perhaps @jezrael's lreshape solution is better for that case.
  • Ted Petrou
    Ted Petrou over 6 years
    This is a poor solution. Why not use pd.wide_to_long? It is built for this situation.