Eigen efficient type for dense symmetric matrix

Solution 1

Packed storage of symmetric matrices is a big enemy of vectorized code, i.e. of speed. Standard practice is to store the relevant N*(N+1)/2 coefficients in the upper or lower triangular part of a full dense NxN matrix and leave the remaining (N-1)*N/2 coefficients unreferenced. All operations on the symmetric matrix are then defined by taking this peculiar storage into account. In Eigen, the concepts of triangular and self-adjoint views serve exactly this purpose.

From the Eigen reference (for real matrices, selfadjoint == symmetric):

Just as for triangular matrices, you can reference any triangular part of a square matrix to see it as a selfadjoint matrix and perform special and optimized operations. Again, the opposite triangular part is never referenced and can be used to store other information.

Unless memory is a serious constraint, I would suggest leaving the unreferenced part of the matrix empty: the code is more readable, and there are no performance problems.
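
For instance, here is a minimal sketch of a matrix * vector product through a self-adjoint view; only the lower triangle of m is ever read, so the strictly upper part can stay empty (the numeric values are just an illustration):

#include <Eigen/Dense>
#include <iostream>

int main()
{
    // Fill only the lower triangular part; the strictly upper part is never read.
    Eigen::Matrix3d m;
    m(0,0) = 4.0;
    m(1,0) = 1.0; m(1,1) = 5.0;
    m(2,0) = 2.0; m(2,1) = 3.0; m(2,2) = 6.0;

    Eigen::Vector3d v(1.0, 1.0, 1.0);

    // The self-adjoint view treats m as symmetric, reading only Lower.
    Eigen::Vector3d r = m.selfadjointView<Eigen::Lower>() * v;
    std::cout << r.transpose() << std::endl; // prints: 7 9 11
}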

Solution 2

Yes, Eigen3 has the concept of views, but the views don't change the underlying storage. As an idea, though, you might be able to share one larger block between two symmetric matrices of the same type:

Matrix<float,4,4> A1, A2; // assume A1 and A2 are symmetric
Matrix<float,5,4> A;      // 20 floats: exactly two packed 4x4 triangles
A.topRightCorner<4,4>().triangularView<Upper>() = A1; // upper triangle of A1
A.bottomLeftCorner<4,4>().triangularView<Lower>() = A2; // lower triangle of A2

It's pretty cumbersome, though, and I would only use it if your memory is really precious.
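
To actually use the two packed matrices without unpacking them, one possibility is to go through self-adjoint views of the same blocks. A sketch (the helper names multiplyA1/multiplyA2 are mine, not Eigen's):

#include <Eigen/Dense>
using Eigen::Matrix;

// A1 lives in the upper triangle of the top-right 4x4 block.
Eigen::Vector4f multiplyA1(const Matrix<float,5,4>& A, const Eigen::Vector4f& v)
{
    return A.topRightCorner<4,4>().selfadjointView<Eigen::Upper>() * v;
}

// A2 lives in the lower triangle of the bottom-left 4x4 block.
Eigen::Vector4f multiplyA2(const Matrix<float,5,4>& A, const Eigen::Vector4f& v)
{
    return A.bottomLeftCorner<4,4>().selfadjointView<Eigen::Lower>() * v;
}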

Solution 3

You just assign values to the lower or upper triangular part of the matrix and use Eigen's triangular and selfadjoint views. However, having tested both approaches on small fixed-size matrices, I noticed that, performance-wise, using views is not always the best choice. Consider the following code:

Eigen::Matrix2d m;
m(0,0) = 2.0;
m(1,0) = 1.0;
// m(0,1) = 1.0;
m(1,1) = 2.0;
Eigen::Vector2d v;
v << 1.0, 1.0;
// note: not auto, which would capture an unevaluated product expression
Eigen::Vector2d result = m.selfadjointView<Eigen::Lower>()*v;

The product in the last line is quite slow compared with the alternative solutions presented below (about 20% slower for double 2x2 matrices in my case). (The product using the full matrix, obtained by uncommenting m(0,1) = 1.0; and writing result = m*v, is even faster for double 2x2 matrices.)

Some alternatives.

1) Store symmetric matrix in vector

You could store your matrix in a vector of size 45 (N*(N+1)/2 for N = 9). Summing two matrices in vector format is then straightforward: just sum the vectors. But you have to write your own implementation for products.

Here is an implementation of such a matrix * vector product (dense, fixed-size), where the lower part of the matrix is stored column-wise in the vector:

template <typename T, int S>
Eigen::Matrix<T,S,1> matrixVectorTimesVector(const Eigen::Matrix<T,S*(S+1)/2,1>& m, const Eigen::Matrix<T,S,1>& v)
{
    Eigen::Matrix<T,S,1> ret(Eigen::Matrix<T,S,1>::Zero());
    int counter(0);
    for (int i=0; i<S; ++i)
    {
        // diagonal element (i,i)
        ret[i] += m(counter++)*v(i);
        for (int j=i+1; j<S; ++j)
        {
            // off-diagonal element (j,i) contributes twice, by symmetry
            ret[i] += m(counter)*v(j);
            ret[j] += m(counter++)*v(i);
        }
    }
    return ret;
}
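
For completeness, here is a hypothetical packing helper (my addition, not part of the original answer) producing exactly the column-wise lower-triangular layout that matrixVectorTimesVector expects:

#include <Eigen/Dense>

template <typename T, int S>
Eigen::Matrix<T,S*(S+1)/2,1> packLower(const Eigen::Matrix<T,S,S>& m)
{
    Eigen::Matrix<T,S*(S+1)/2,1> packed;
    int counter(0);
    for (int c=0; c<S; ++c)        // column by column...
        for (int r=c; r<S; ++r)    // ...diagonal first, then below it
            packed(counter++) = m(r,c);
    return packed;
}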

2) Store only the triangular part and implement your own operations

You could of course also implement your own matrix * vector product where the matrix only stores the 45 elements (let's assume we store the lower triangular part). This is maybe the most elegant solution, as it keeps the format of a matrix (instead of using a vector to represent a matrix). You can then also use Eigen functions, as in the example below:

template <typename T, int S>
Eigen::Matrix<T,S,S> symmMatrixPlusSymmMatrix(const Eigen::Matrix<T,S,S>& m1, const Eigen::Matrix<T,S,S>& m2)
{
    Eigen::Matrix<T,S,S> ret;
    // only the lower triangular part of m1 + m2 is evaluated and written
    ret.template triangularView<Eigen::Lower>() = m1 + m2; // no performance penalty here!
    return ret;
}

In the function above (the sum of two symmetric matrices), only the lower triangular parts of m1 and m2 are visited. Note that the triangularView does not incur a performance penalty in this case (based on my benchmarks).
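
A usage sketch (values chosen for illustration): only the lower triangles of the inputs need to be valid, and only the lower triangle of the result is defined afterwards:

Eigen::Matrix2d m1, m2;
m1(0,0) = 2.0; m1(1,0) = 1.0; m1(1,1) = 2.0;
m2(0,0) = 1.0; m2(1,0) = 3.0; m2(1,1) = 4.0;
Eigen::Matrix2d sum = symmMatrixPlusSymmMatrix<double,2>(m1, m2);
// sum's lower triangle now holds 3, 4, 6; its upper part is unspecified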

As for the matrix * vector product, see the example below (same performance as the product in alternative 1). The algorithm visits only the lower triangular part of the matrix.

template <typename T, int S>
Eigen::Matrix<T,S,1> symmMatrixTimesVector(const Eigen::Matrix<T,S,S>& m, const Eigen::Matrix<T,S,1>& v)
{
    Eigen::Matrix<T,S,1> ret(Eigen::Matrix<T,S,1>::Zero());

    for (int c=0; c<S; ++c)
    {
        // diagonal element (c,c)
        ret(c) += m(c,c)*v(c);
        for (int r=c+1; r<S; ++r)
        {
            // off-diagonal element (r,c) contributes twice, by symmetry
            ret(c) += m(r,c)*v(r);
            ret(r) += m(r,c)*v(c);
        }
    }
    return ret;
}

The performance gain for the product Matrix2d * Vector2d, compared to the product using the full matrix (2x2 = 4 elements), is 10% in my case.
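
As a quick sanity check (my own test harness, assuming symmMatrixTimesVector from above is in scope):

#include <Eigen/Dense>
#include <iostream>

int main()
{
    Eigen::Matrix2d m;
    m(0,0) = 2.0;
    m(1,0) = 1.0;
    m(1,1) = 2.0; // m(0,1) left unset: only the lower part is read

    Eigen::Vector2d v(1.0, 1.0);
    std::cout << symmMatrixTimesVector<double,2>(m, v).transpose()
              << std::endl; // prints: 3 3
}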


Comments

  • qble
    qble almost 2 years

    Does Eigen have an efficient type for storing dense, fixed-size, symmetric matrices? (Hey, they are ubiquitous!)

    I.e. for N=9 it should store only (1+9)*9/2 == 45 elements, and it should have the appropriate operations. For instance, there should be efficient addition of two symmetric matrices, which returns a similar symmetric matrix.

    If there is no such thing, which steps (looks like this) should I take to introduce such a type to Eigen? Does it have a concept of "views"? Can I write something like a "matrix view" for my own type, which would make it Eigen-friendly?

    P.S. Probably I can treat a plain array as a 1xN matrix using Map, and do operations on it. But it is not the cleanest solution.

    • Mikhail
      Mikhail over 11 years
      There is little advantage for N=9, due to the divergence in your code coming from resolving the matrix values. You save memory, but are you really running out of memory, or do you expect some associated computational advantage? Can you motivate your question with a usage scenario?
    • qble
      qble over 11 years
      "Can you motivate your question with some usage scenario?" - I have millions of such matrices. I need to store them in array, and do some operations on them.
    • Mikhail
      Mikhail over 11 years
      How many matrices can you store in about 4 GB of memory? Assuming doubles, you can store 10 million; if we do it your way, you can store 20 million. Imagine that these represent a square image. By using your inefficient matrices, you were unable to double the side length, but your computation time tripled. Differences of 2x memory capacity can be solved with RAM modules. It's cheap to get 16 GB of RAM; it's much harder to double CPU performance. Focus on changing your model or buying new hardware. Respectfully ~
    • qble
      qble over 11 years
      @Mikhail, do you think I should recommend that all of my clients buy additional memory? That is nonsense. And why do you say that such matrices are inefficient? Adding them should be pretty efficient.
    • Stefano M
      Stefano M over 11 years
      @Mikhail, If you read my answer you will see that I agree with your idea that code efficiency should be favored over memory efficiency. Nevertheless, if matrices are only to be summed, the problem here actually arises only if random access to the i,j elements is to be performed. It is in fact pretty simple to unfold a double for loop over two indices i=1...N, j=i...N into a single loop.
    • Mikhail
      Mikhail over 11 years
      @StefanoM There is an interesting question here that asks whether breaking vectorization is advantageous if it halves the memory loads.
    • qble
      qble over 11 years
      @Mikhail, Addition of two triangular matrices, which are encoded just as 1x((1+N)*N/2) matrices, does NOT break any vectorization.
    • Grizzly
      Grizzly over 11 years
      @Mikhail: Memory might be cheap, but bandwidth isn't. Depending on what the OP is planning to do with those matrices, the reduction in cache misses might improve performance much more than the overhead from using the compressed storage form (if it exists, depending on the operations) would decrease it.
    • qble
      qble over 11 years
      @Grizzly, good point, I forgot to mention it. Compact data structures would improve overall running time, not just memory usage.
    • chtz
      chtz almost 7 years
      Here is a related bug entry for that: eigen.tuxfamily.org/bz/show_bug.cgi?id=42 (mostly inactive, however).
    • fwyzard
      fwyzard over 5 years
      If you still feel like implementing something in Eigen, here is a research paper about an optimised packed storage for triangular (and banded) matrices that claims to achieve good computational performance: Toufik Baroudi, Rachid Seghir, Vincent Loechner, "Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts", ACM Transactions on Architecture and Code Optimization, 2017, 14(4), pp. 1-19, doi:10.1145/3162016, hal-01633724. hal.inria.fr/hal-01633724/file/BSL17-2dpacked.pdf
  • Stefano M
    Stefano M over 11 years
    Shouldn't it be A.topRightCorner<N,N>().triangularView<Upper>()?
  • qble
    qble over 11 years
    "Packed storage of symmetric matrices is a big enemy of vectorized code, i.e. of speed" - for my case, I need just to add such matrices, I don't see how it can affect vectorization in such case. "Unless memory is a big problem" - memory is a real problem - I have millions of such matrices..
  • qble
    qble over 11 years
    "Matrix<float,5,5> A;" I think it would be more efficient to use Matrix<float,5,4> A; or Matrix<float,4,5> A; for store two 4x4 matrices. It is good point, but unfortunatly my matrices are separate entities, and can't be stored together.
  • Stefano M
    Stefano M over 11 years
    If you do not perform any "real" matrix operation like rank-1 or rank-2 updates, LLT factorization and so on, I would simply go for a single BIG 2D array in which you store each matrix's upper triangular part packed into a column of N*(N+1)/2 elements (of course ensuring column-major order). Then you can access the columns of this array and perform sum and scale operations efficiently. Accessing the i,j matrix element is then simply a #define matter... (see the sketch after this list).
  • qble
    qble over 11 years
    Well, after some non-"real" operations like addition, I do perform an LLT factorization. But it is not a problem to use a separate square matrix for the factorized matrix. And by the way, Eigen does not do in-place LLT; it performs a copy in any case.
  • Stefano M
    Stefano M over 11 years
    Ok, I noticed just now that this is just a follow-up to a previous question. Sorry to bother you again: forget Eigen and implement it yourself, unfolding loops as in this answer but implementing a linear storage scheme.
  • qble
    qble over 11 years
    I think it is actually better to use Eigen to perform the addition of such matrices. Even without "real" support, it is still possible to use Eigen. For instance, as I said in the original question, just use a 1-dimensional matrix and convert to a real 2D matrix only when needed.
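
A minimal sketch of the storage scheme Stefano M describes above (my reading of the comment; the names store and packedIndex are mine):

#include <Eigen/Dense>

// One symmetric matrix per column: the upper triangular part is packed
// column-wise, so summing or scaling a matrix is a plain column operation.
constexpr int N = 9;
constexpr int P = N * (N + 1) / 2; // 45 packed elements per matrix

// index of element (i,j) with i <= j in the packed upper triangle
inline int packedIndex(int i, int j) { return j * (j + 1) / 2 + i; }

int main()
{
    const int count = 1000; // in practice: millions of matrices
    Eigen::MatrixXd store = Eigen::MatrixXd::Zero(P, count); // column-major by default

    store.col(2) = store.col(0) + store.col(1); // sum two matrices
    store.col(2) *= 0.5;                        // scale one matrix

    double a01 = store(packedIndex(0, 1), 2);   // element (0,1) of matrix 2
    (void)a01;
}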