Melt the Upper Triangular Matrix of a Pandas Dataframe

22,276

Solution 1

First I convert lower values of df to NaN by where and numpy.triu and then stack, reset_index and set column names:

import numpy as np

print df
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

print np.triu(np.ones(df.shape)).astype(np.bool)
[[ True  True  True]
 [False  True  True]
 [False False  True]]

df = df.where(np.triu(np.ones(df.shape)).astype(np.bool))
print df
    a    b    c
a   1  0.5  0.3
b NaN  1.0  0.4
c NaN  NaN  1.0

df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print df

  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0

Solution 2

Building from solution by @jezrael, boolean indexing would be a more explicit approach:

import numpy
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print df,'\n'
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print df.stack()[keep]

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0 

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64

Solution 3

Also buildin on solution by @jezrael, here's a version adding a function to do the inverse operation (from xy to matrix), usefull in my case to work with covariance / correlation matrices.

def matrix_to_xy(df, columns=None, reset_index=False):
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    xy = (
        df.where(bool_index).stack().reset_index()
        if reset_index
        else df.where(bool_index).stack()
    )
    if reset_index:
        xy.columns = columns or ["row", "col", "val"]
    return xy


def xy_to_matrix(xy):
    df = xy.pivot(*xy.columns).fillna(0)
    df_vals = df.to_numpy()
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df
df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

  row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0

row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0
Share:
22,276
Ramón J Romero y Vigil
Author by

Ramón J Romero y Vigil

Data engineer, cloud architect, and quantitative programmer working with AWS, Scala + Apache Spark, Java 8, Python 3 (numpy/scipy/scikit-learn), and terraform. Areas of expertise: equity portfolio optimization, functional programming, reactive streaming, infrastructure-as-code, and serverless architecture. Happy hacking!

Updated on July 19, 2022

Comments

  • Ramón J Romero y Vigil
    Ramón J Romero y Vigil almost 2 years

    Given a square pandas DataFrame of the following form:

       a  b  c
    a  1 .5 .3
    b .5  1 .4
    c .3 .4  1
    

    How can the upper triangle be melted to get a matrix of the following form

     Row     Column    Value
      a        a       1
      a        b       .5 
      a        c       .3
      b        b       1
      b        c       .4
      c        c       1 
    
    #Note the combination a,b is only listed once.  There is no b,a listing     
    

    I'm more interested in an idiomatic pandas solution, a custom indexer would be easy enough to write by hand...

    Thank you in advance for your consideration and response.

  • Alex Riley
    Alex Riley over 8 years
    I the only thing to watch out for is if you have any NaN values that you want to preserve in the upper triangle (stack will drop them all). You might have to explicitly construct the multi-index and then reindex if that is the case.
  • Shadi
    Shadi about 6 years
    Very useful in combination with pandas.DataFrame.corr
  • Sos
    Sos almost 6 years
    @jezrael how would you go back from the last df to the triangular one? I've built the triangular matrix, converted to long, processed, and now I want to have it back to triangular, but some NAs go to the upper triangular
  • jezrael
    jezrael almost 6 years
    @Sosi - I think need pivot like df = df.pivot('Row', 'Column', 'Value')
  • Sos
    Sos almost 6 years
    @jezrael thank you. however, this still yields a non-triangular matrix. Maybe I will create a new thread and ping you
  • jezrael
    jezrael almost 6 years
    @Sosi - Yes, because for me it working nice, so problem is something else.
  • Sos
    Sos almost 6 years
    @jezrael I'll be honest: I feel dumb. With the working example, it was working fine, but not with my real case. However, I noticed now it was a matter of index/column order... sorry for having disturbed you. I will deleted my latest comments
  • jezrael
    jezrael almost 6 years
    @Sosi - No problem :) pivot always sorting data :)
  • Emanuel
    Emanuel over 4 years
    @AlexRiley using unstack instead seems to preserve the NaNs