Melt the Upper Triangular Matrix of a Pandas Dataframe
Solution 1
First I convert lower values of df
to NaN
by where
and numpy.triu
and then stack
, reset_index
and set column names:
import numpy as np
print df
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
print np.triu(np.ones(df.shape)).astype(np.bool)
[[ True True True]
[False True True]
[False False True]]
df = df.where(np.triu(np.ones(df.shape)).astype(np.bool))
print df
a b c
a 1 0.5 0.3
b NaN 1.0 0.4
c NaN NaN 1.0
df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print df
Row Column Value
0 a a 1.0
1 a b 0.5
2 a c 0.3
3 b b 1.0
4 b c 0.4
5 c c 1.0
Solution 2
Building from solution by @jezrael, boolean indexing would be a more explicit approach:
import numpy
from pandas import DataFrame
df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print df,'\n'
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print df.stack()[keep]
output:
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
a a 1.0
b 0.5
c 0.3
b b 1.0
c 0.4
c c 1.0
dtype: float64
Solution 3
Also buildin on solution by @jezrael, here's a version adding a function to do the inverse operation (from xy to matrix), usefull in my case to work with covariance / correlation matrices.
def matrix_to_xy(df, columns=None, reset_index=False):
bool_index = np.triu(np.ones(df.shape)).astype(bool)
xy = (
df.where(bool_index).stack().reset_index()
if reset_index
else df.where(bool_index).stack()
)
if reset_index:
xy.columns = columns or ["row", "col", "val"]
return xy
def xy_to_matrix(xy):
df = xy.pivot(*xy.columns).fillna(0)
df_vals = df.to_numpy()
df = pd.DataFrame(
np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
)
return df
df = pd.DataFrame(
{"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)
output:
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
row col val
0 a a 1.0
1 a b 0.5
2 a c 0.3
3 b b 1.0
4 b c 0.4
5 c c 1.0
row a b c
row
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
Ramón J Romero y Vigil
Data engineer, cloud architect, and quantitative programmer working with AWS, Scala + Apache Spark, Java 8, Python 3 (numpy/scipy/scikit-learn), and terraform. Areas of expertise: equity portfolio optimization, functional programming, reactive streaming, infrastructure-as-code, and serverless architecture. Happy hacking!
Updated on July 19, 2022Comments
-
Ramón J Romero y Vigil almost 2 years
Given a square pandas DataFrame of the following form:
a b c a 1 .5 .3 b .5 1 .4 c .3 .4 1
How can the upper triangle be melted to get a matrix of the following form
Row Column Value a a 1 a b .5 a c .3 b b 1 b c .4 c c 1 #Note the combination a,b is only listed once. There is no b,a listing
I'm more interested in an idiomatic pandas solution, a custom indexer would be easy enough to write by hand...
Thank you in advance for your consideration and response.
-
Alex Riley over 8 yearsI the only thing to watch out for is if you have any
NaN
values that you want to preserve in the upper triangle (stack
will drop them all). You might have to explicitly construct the multi-index and then reindex if that is the case. -
Shadi about 6 yearsVery useful in combination with pandas.DataFrame.corr
-
Sos almost 6 years@jezrael how would you go back from the last
df
to the triangular one? I've built the triangular matrix, converted to long, processed, and now I want to have it back to triangular, but someNA
s go to the upper triangular -
jezrael almost 6 years@Sosi - I think need pivot like
df = df.pivot('Row', 'Column', 'Value')
-
Sos almost 6 years@jezrael thank you. however, this still yields a non-triangular matrix. Maybe I will create a new thread and ping you
-
jezrael almost 6 years@Sosi - Yes, because for me it working nice, so problem is something else.
-
Sos almost 6 years@jezrael I'll be honest: I feel dumb. With the working example, it was working fine, but not with my real case. However, I noticed now it was a matter of index/column order... sorry for having disturbed you. I will deleted my latest comments
-
jezrael almost 6 years@Sosi - No problem :)
pivot
always sorting data :) -
Emanuel over 4 years@AlexRiley using unstack instead seems to preserve the NaNs