Simultaneously diagonalize matrices with numpy

12,360

Solution 1

I am sure there is significant room for improvement in my solution, but I have come up with the following set of three functions doing the calculation for me in a semi-robust way.

def clusters(array,
             orig_indices = None,
             start = 0,
             rtol=numpy.allclose.__defaults__[0],
             atol=numpy.allclose.__defaults__[1]):
    """For an array, return a permutation that sorts the numbers and the sizes of the resulting blocks of identical numbers."""
    array = numpy.asarray(array)
    if not len(array):
        return numpy.array([]),[]
    if orig_indices is None:
        orig_indices = numpy.arange(len(array))
    x = array[0]
    close = abs(array-x) <= (atol + rtol*abs(x))
    first = sum(close)
    r_perm, r_sizes = clusters(
        array[~close],
        orig_indices[~close],
        start+first,
        rtol, atol)
    r_sizes.insert(0, first)
    return numpy.concatenate((orig_indices[close], r_perm)), r_sizes

def permutation_matrix(permutation, dtype=dtype):
    n = len(permutation)
    P = numpy.zeros((n,n), dtype)
    for i,j in enumerate(permutation):
        P[j,i]=1
    return P

def simultaneously_diagonalize(tensor, atol=numpy.allclose.__defaults__[1]):
    tensor = numpy.asarray(tensor)
    old_shape = tensor.shape
    size = old_shape[-1]
    tensor = tensor.reshape((-1, size, size))
    diag_mask = 1-numpy.eye(size)

    eigvalues, diagonalizer = numpy.linalg.eig(tensor[0])
    diagonalization = numpy.dot(
        numpy.dot(
            matrix.linalg.inv(diagonalizer),
            tensor).swapaxes(0,-2),
        diagonalizer)
    if numpy.allclose(diag_mask*diagonalization, 0):
        return diagonalization.diagonal(axis1=-2, axis2=-1).reshape(old_shape[:-1])
    else:
        perm, cluster_sizes = clusters(diagonalization[0].diagonal())
        perm_matrix = permutation_matrix(perm)
        diagonalization = numpy.dot(
            numpy.dot(
                perm_matrix.T,
                diagonalization).swapaxes(0,-2),
            perm_matrix)
        mask = 1-scipy.linalg.block_diag(
            *list(
                numpy.ones((blocksize, blocksize))
                for blocksize in cluster_sizes))
        print(diagonalization)
        assert(numpy.allclose(
                diagonalization*mask,
                0)) # Assert that the matrices are co-diagonalizable
        blocks = numpy.cumsum(cluster_sizes)
        start = 0
        other_part = []
        for block in blocks:
            other_part.append(
                simultaneously_diagonalize(
                    diagonalization[1:, start:block, start:block]))
            start = block
        return numpy.vstack(
            (diagonalization[0].diagonal(axis1=-2, axis2=-1),
             numpy.hstack(other_part)))

Solution 2

There is a fairly simple and very elegant simultaneous diagonalization algorithm based on Givens rotation that was published by Cardoso and Soulomiac in 1996:

Cardoso, J., & Souloumiac, A. (1996). Jacobi Angles for Simultaneous Diagonalization. SIAM Journal on Matrix Analysis and Applications, 17(1), 161–164. doi:10.1137/S0895479893259546

I've attached a numpy implementation of the algorithm at the end of this response. Caveat: It turns out simultaneous diagonalization is a bit of a tricky numerical problem, with no algorithm (to the best of my knowledge) that guarantees global convergence. However, the cases in which it does not work (see the paper) are degenerate and in practice I have never had the Jacobi angles algorithm fail on me.

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Routines for simultaneous diagonalization
Arun Chaganty <[email protected]>
"""

import numpy as np
from numpy import zeros, eye, diag
from numpy.linalg import norm

def givens_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 

    return A

def givens_double_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 
    A_i, A_j = A[:,i], A[:,j]
    A[:,i], A[:,j] = c * A_i + s * A_j, c * A_j - s * A_i 

    return A

def jacobi_angles( *Ms, **kwargs ):
    r"""
    Simultaneously diagonalize using Jacobi angles
    @article{SC-siam,
       HTML =   "ftp://sig.enst.fr/pub/jfc/Papers/siam_note.ps.gz",
       author = "Jean-Fran\c{c}ois Cardoso and Antoine Souloumiac",
       journal = "{SIAM} J. Mat. Anal. Appl.",
       title = "Jacobi angles for simultaneous diagonalization",
       pages = "161--164",
       volume = "17",
       number = "1",
       month = jan,
       year = {1995}}

    (a) Compute Givens rotations for every pair of indices (i,j) i < j 
            - from eigenvectors of G = gg'; g = A_ij - A_ji, A_ij + A_ji
            - Compute c, s as \sqrt{x+r/2r}, y/\sqrt{2r(x+r)}
    (b) Update matrices by multiplying by the givens rotation R(i,j,c,s)
    (c) Repeat (a) until stopping criterion: sin theta < threshold for all ij pairs
    """

    assert len(Ms) > 0
    m, n = Ms[0].shape
    assert m == n

    sweeps = kwargs.get('sweeps', 500)
    threshold = kwargs.get('eps', 1e-8)
    rank = kwargs.get('rank', m)

    R = eye(m)

    for _ in xrange(sweeps):
        done = True
        for i in xrange(rank):
            for j in xrange(i+1, m):
                G = zeros((2,2))
                for M in Ms:
                    g = np.array([ M[i,i] - M[j,j], M[i,j] + M[j,i] ])
                    G += np.outer(g,g) / len(Ms)
                # Compute the eigenvector directly
                t_on, t_off = G[0,0] - G[1,1], G[0,1] + G[1,0]
                theta = 0.5 * np.arctan2( t_off, t_on + np.sqrt( t_on*t_on + t_off * t_off) )
                c, s = np.cos(theta), np.sin(theta)

                if abs(s) > threshold:
                    done = False
                    # Update the matrices and V
                    for M in Ms:
                        givens_double_rotate(M, i, j, c, s)
                        #assert M[i,i] > M[j, j]
                    R = givens_rotate(R, i, j, c, s)

        if done:
            break
    R = R.T

    L = np.zeros((m, len(Ms)))
    err = 0
    for i, M in enumerate(Ms):
        # The off-diagonal elements of M should be 0
        L[:,i] = diag(M)
        err += norm(M - diag(diag(M)))

    return R, L, err

Solution 3

I am not aware of any direct solution. But why not just getting the eigenvalues and the eigenvectors of the first matrix, and using the eigenvectors to transform all other matrices to the diagonal form? Something like:

eigvals, eigvecs = np.linalg.eig(matrix1)
eigvals2 = np.diagonal(np.dot(np.dot(transpose(eigvecs), matrix2), eigvecs))

You can the add the columns to an array via hstack if you like.

UPDATE: As pointed out below, this is only valid if no degenerate eigenvalues occur. Otherwise one would have to check first for the degenerate eigenvalues, then transform the 2nd matrix to a blockdiagonal form, and diagonalize eventual blocks bigger than 1x1 separately.

Solution 4

If you know something about the size of the eigenvalues of the two matrices in advance, you can diagonalize a linear combination of the two matrices, with coefficients chosen to break the degeneracy. For example, if the eigenvalues of both lie between -10 and 10, you could diagonalize 100*M1 + M2. There's a slight loss of precision, but for many purposes it's good enough--and quick and easy!

Share:
12,360
Anaphory
Author by

Anaphory

Computational researcher (geography and diversity linguistics) in my day life, and some of my spare time.

Updated on June 14, 2022

Comments

  • Anaphory
    Anaphory almost 2 years

    I have a m × n × n numpy.ndarray of m simultaneously diagonalizable square matrices and would like to use numpy to obtain their simultaneous eigenvalues.

    For example, if I had

    from numpy import einsum, diag, array, linalg, random
    U = linalg.svd(random.random((3,3)))[2]
    
    M = einsum(
        "ij, ajk, lk",
        U, [diag([2,2,0]), diag([1,-1,1])], U)
    

    the two matrices in M are simultaneously diagonalizable, and I am looking for a way to obtain the array

    array([[2.,  1.],
           [2., -1.],
           [0.,  1.]])
    

    (up to permutation of the lines) from M. Is there a built-in or easy way to get this?

  • Anaphory
    Anaphory about 11 years
    Because these eigenvectors are not necessarily eigenvectors of the second matrix, so the diagonal might not be the eigenvalues (!) of the second matrix. If as in the example in my question 2 is a double eigenvalue of the first matrix, I get two “random” eigenvectors for that eigenvalue. These are not necessarily eigenvectors for 1 and -1 respectively of the second matrix, the cases where they are are a zero set.
  • Bálint Aradi
    Bálint Aradi about 11 years
    I agree, in case of degenerated eigenvalues things are somewhat more complicated as you would end up with a blockdiagonal matrix. But then, you can diagonalize those blocks bigger than 1x1 separately. It would be still faster as to diagonalize the whole 2nd matrix.
  • Anaphory
    Anaphory about 11 years
    Splitting the partly diagonalized matrices into the blocks and then diagonalizing the blocks is easier described than written in python, in particular considering numerical inaccuracies, and that's why I asked for a good way to do this.
  • Anaphory
    Anaphory about 11 years
    There is still a problem with this approach which I cannot pinpoint yet. Use with caution!
  • Ziqian Xie
    Ziqian Xie over 7 years
    I tried this program but the result is not right, I think it should be -t_on + np.sqrt(t_on * t_on + t_off * t_off) in the line where theta is calculated, G is a symmetric matrix [[a, b],[b, c]], the largest eigenvalue lambda is (a+c+sqrt((a-c) ** 2+4 * b ** 2)/2, let the eigenvector corresponds to this eigenvalue be [[1],[t]] (without normalization), then a + b*t = lambda, t = (lambda - a)/b = (-a+c+sqrt((a-c) ** 2+4 * b ** 2)/2/b = (-t_on + sqrt(t_on ** 2 + t_off **2)/t_off. You can print out the square sum of the off diagonal elements in every outer loop to see if it decreases monotonically
  • Ziqian Xie
    Ziqian Xie over 7 years
    Also I found that using arctan2 may have numerical issue and doesn't converge. I am using x, y = 1/(1+t ** 2), t/(1+t ** 2), c = sqrt(0.5 + x/2), s = y/2/c instead.
  • Ziqian Xie
    Ziqian Xie over 7 years
    correction: x, y = 1/sqrt(1 + t ** 2), t/sqrt(1 + t ** 2)
  • frankgut
    frankgut over 6 years
    -1 This code is not correct. For example, it fails to simultaneously diagonalize the identity matrix {{1, 0}, {0, 1}} and the X matrix {{0, 1}, {1, 0}}. It should return an R such as {{1, 1}, {1, -1}}/sqrt(2), but instead it returns an identity matrix for R.