Simultaneously diagonalize matrices with numpy
Solution 1
I am sure there is significant room for improvement in my solution, but I have come up with the following set of three functions doing the calculation for me in a semi-robust way.
def clusters(array,
orig_indices = None,
start = 0,
rtol=numpy.allclose.__defaults__[0],
atol=numpy.allclose.__defaults__[1]):
"""For an array, return a permutation that sorts the numbers and the sizes of the resulting blocks of identical numbers."""
array = numpy.asarray(array)
if not len(array):
return numpy.array([]),[]
if orig_indices is None:
orig_indices = numpy.arange(len(array))
x = array[0]
close = abs(array-x) <= (atol + rtol*abs(x))
first = sum(close)
r_perm, r_sizes = clusters(
array[~close],
orig_indices[~close],
start+first,
rtol, atol)
r_sizes.insert(0, first)
return numpy.concatenate((orig_indices[close], r_perm)), r_sizes
def permutation_matrix(permutation, dtype=dtype):
n = len(permutation)
P = numpy.zeros((n,n), dtype)
for i,j in enumerate(permutation):
P[j,i]=1
return P
def simultaneously_diagonalize(tensor, atol=numpy.allclose.__defaults__[1]):
tensor = numpy.asarray(tensor)
old_shape = tensor.shape
size = old_shape[-1]
tensor = tensor.reshape((-1, size, size))
diag_mask = 1-numpy.eye(size)
eigvalues, diagonalizer = numpy.linalg.eig(tensor[0])
diagonalization = numpy.dot(
numpy.dot(
matrix.linalg.inv(diagonalizer),
tensor).swapaxes(0,-2),
diagonalizer)
if numpy.allclose(diag_mask*diagonalization, 0):
return diagonalization.diagonal(axis1=-2, axis2=-1).reshape(old_shape[:-1])
else:
perm, cluster_sizes = clusters(diagonalization[0].diagonal())
perm_matrix = permutation_matrix(perm)
diagonalization = numpy.dot(
numpy.dot(
perm_matrix.T,
diagonalization).swapaxes(0,-2),
perm_matrix)
mask = 1-scipy.linalg.block_diag(
*list(
numpy.ones((blocksize, blocksize))
for blocksize in cluster_sizes))
print(diagonalization)
assert(numpy.allclose(
diagonalization*mask,
0)) # Assert that the matrices are co-diagonalizable
blocks = numpy.cumsum(cluster_sizes)
start = 0
other_part = []
for block in blocks:
other_part.append(
simultaneously_diagonalize(
diagonalization[1:, start:block, start:block]))
start = block
return numpy.vstack(
(diagonalization[0].diagonal(axis1=-2, axis2=-1),
numpy.hstack(other_part)))
Solution 2
There is a fairly simple and very elegant simultaneous diagonalization algorithm based on Givens rotation that was published by Cardoso and Soulomiac in 1996:
Cardoso, J., & Souloumiac, A. (1996). Jacobi Angles for Simultaneous Diagonalization. SIAM Journal on Matrix Analysis and Applications, 17(1), 161–164. doi:10.1137/S0895479893259546
I've attached a numpy implementation of the algorithm at the end of this response. Caveat: It turns out simultaneous diagonalization is a bit of a tricky numerical problem, with no algorithm (to the best of my knowledge) that guarantees global convergence. However, the cases in which it does not work (see the paper) are degenerate and in practice I have never had the Jacobi angles algorithm fail on me.
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Routines for simultaneous diagonalization
Arun Chaganty <[email protected]>
"""
import numpy as np
from numpy import zeros, eye, diag
from numpy.linalg import norm
def givens_rotate( A, i, j, c, s ):
"""
Rotate A along axis (i,j) by c and s
"""
Ai, Aj = A[i,:], A[j,:]
A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai
return A
def givens_double_rotate( A, i, j, c, s ):
"""
Rotate A along axis (i,j) by c and s
"""
Ai, Aj = A[i,:], A[j,:]
A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai
A_i, A_j = A[:,i], A[:,j]
A[:,i], A[:,j] = c * A_i + s * A_j, c * A_j - s * A_i
return A
def jacobi_angles( *Ms, **kwargs ):
r"""
Simultaneously diagonalize using Jacobi angles
@article{SC-siam,
HTML = "ftp://sig.enst.fr/pub/jfc/Papers/siam_note.ps.gz",
author = "Jean-Fran\c{c}ois Cardoso and Antoine Souloumiac",
journal = "{SIAM} J. Mat. Anal. Appl.",
title = "Jacobi angles for simultaneous diagonalization",
pages = "161--164",
volume = "17",
number = "1",
month = jan,
year = {1995}}
(a) Compute Givens rotations for every pair of indices (i,j) i < j
- from eigenvectors of G = gg'; g = A_ij - A_ji, A_ij + A_ji
- Compute c, s as \sqrt{x+r/2r}, y/\sqrt{2r(x+r)}
(b) Update matrices by multiplying by the givens rotation R(i,j,c,s)
(c) Repeat (a) until stopping criterion: sin theta < threshold for all ij pairs
"""
assert len(Ms) > 0
m, n = Ms[0].shape
assert m == n
sweeps = kwargs.get('sweeps', 500)
threshold = kwargs.get('eps', 1e-8)
rank = kwargs.get('rank', m)
R = eye(m)
for _ in xrange(sweeps):
done = True
for i in xrange(rank):
for j in xrange(i+1, m):
G = zeros((2,2))
for M in Ms:
g = np.array([ M[i,i] - M[j,j], M[i,j] + M[j,i] ])
G += np.outer(g,g) / len(Ms)
# Compute the eigenvector directly
t_on, t_off = G[0,0] - G[1,1], G[0,1] + G[1,0]
theta = 0.5 * np.arctan2( t_off, t_on + np.sqrt( t_on*t_on + t_off * t_off) )
c, s = np.cos(theta), np.sin(theta)
if abs(s) > threshold:
done = False
# Update the matrices and V
for M in Ms:
givens_double_rotate(M, i, j, c, s)
#assert M[i,i] > M[j, j]
R = givens_rotate(R, i, j, c, s)
if done:
break
R = R.T
L = np.zeros((m, len(Ms)))
err = 0
for i, M in enumerate(Ms):
# The off-diagonal elements of M should be 0
L[:,i] = diag(M)
err += norm(M - diag(diag(M)))
return R, L, err
Solution 3
I am not aware of any direct solution. But why not just getting the eigenvalues and the eigenvectors of the first matrix, and using the eigenvectors to transform all other matrices to the diagonal form? Something like:
eigvals, eigvecs = np.linalg.eig(matrix1)
eigvals2 = np.diagonal(np.dot(np.dot(transpose(eigvecs), matrix2), eigvecs))
You can the add the columns to an array via hstack
if you like.
UPDATE: As pointed out below, this is only valid if no degenerate eigenvalues occur. Otherwise one would have to check first for the degenerate eigenvalues, then transform the 2nd matrix to a blockdiagonal form, and diagonalize eventual blocks bigger than 1x1 separately.
Solution 4
If you know something about the size of the eigenvalues of the two matrices in advance, you can diagonalize a linear combination of the two matrices, with coefficients chosen to break the degeneracy. For example, if the eigenvalues of both lie between -10 and 10, you could diagonalize 100*M1 + M2. There's a slight loss of precision, but for many purposes it's good enough--and quick and easy!
Anaphory
Computational researcher (geography and diversity linguistics) in my day life, and some of my spare time.
Updated on June 14, 2022Comments
-
Anaphory almost 2 years
I have a m × n × n
numpy.ndarray
of m simultaneously diagonalizable square matrices and would like to usenumpy
to obtain their simultaneous eigenvalues.For example, if I had
from numpy import einsum, diag, array, linalg, random U = linalg.svd(random.random((3,3)))[2] M = einsum( "ij, ajk, lk", U, [diag([2,2,0]), diag([1,-1,1])], U)
the two matrices in
M
are simultaneously diagonalizable, and I am looking for a way to obtain the arrayarray([[2., 1.], [2., -1.], [0., 1.]])
(up to permutation of the lines) from
M
. Is there a built-in or easy way to get this? -
Anaphory about 11 yearsBecause these eigenvectors are not necessarily eigenvectors of the second matrix, so the diagonal might not be the eigenvalues (!) of the second matrix. If as in the example in my question 2 is a double eigenvalue of the first matrix, I get two “random” eigenvectors for that eigenvalue. These are not necessarily eigenvectors for 1 and -1 respectively of the second matrix, the cases where they are are a zero set.
-
Bálint Aradi about 11 yearsI agree, in case of degenerated eigenvalues things are somewhat more complicated as you would end up with a blockdiagonal matrix. But then, you can diagonalize those blocks bigger than 1x1 separately. It would be still faster as to diagonalize the whole 2nd matrix.
-
Anaphory about 11 yearsSplitting the partly diagonalized matrices into the blocks and then diagonalizing the blocks is easier described than written in python, in particular considering numerical inaccuracies, and that's why I asked for a good way to do this.
-
Anaphory about 11 yearsThere is still a problem with this approach which I cannot pinpoint yet. Use with caution!
-
Ziqian Xie over 7 yearsI tried this program but the result is not right, I think it should be -t_on + np.sqrt(t_on * t_on + t_off * t_off) in the line where theta is calculated, G is a symmetric matrix [[a, b],[b, c]], the largest eigenvalue lambda is (a+c+sqrt((a-c) ** 2+4 * b ** 2)/2, let the eigenvector corresponds to this eigenvalue be [[1],[t]] (without normalization), then a + b*t = lambda, t = (lambda - a)/b = (-a+c+sqrt((a-c) ** 2+4 * b ** 2)/2/b = (-t_on + sqrt(t_on ** 2 + t_off **2)/t_off. You can print out the square sum of the off diagonal elements in every outer loop to see if it decreases monotonically
-
Ziqian Xie over 7 yearsAlso I found that using arctan2 may have numerical issue and doesn't converge. I am using x, y = 1/(1+t ** 2), t/(1+t ** 2), c = sqrt(0.5 + x/2), s = y/2/c instead.
-
Ziqian Xie over 7 yearscorrection: x, y = 1/sqrt(1 + t ** 2), t/sqrt(1 + t ** 2)
-
frankgut over 6 years-1 This code is not correct. For example, it fails to simultaneously diagonalize the identity matrix {{1, 0}, {0, 1}} and the X matrix {{0, 1}, {1, 0}}. It should return an R such as {{1, 1}, {1, -1}}/sqrt(2), but instead it returns an identity matrix for R.