Worse performance using Eigen than using my own class

17,277

Solution 1

If you're using Eigen's MatrixXd types, those are dynamically sized. You should get much better results from using the fixed-size types, e.g. Matrix4d, Vector4d.

Also, make sure you're compiling such that the code can get vectorized; see the relevant Eigen documentation.

Re your thought on using the Direct3D extensions library (D3DXMATRIX etc.): it's OK (if a bit old-fashioned) for graphics geometry (4x4 transforms etc.), but it's certainly not GPU accelerated (just good old SSE, I think). Also, note that it's single-precision (float) only, and you seem to be set on using doubles. Personally I'd much prefer to use Eigen unless I was actually coding a Direct3D app.

Solution 2

Make sure to have compiler optimization switched on (e.g. at least -O2 on gcc). Eigen is heavily templated and will not perform very well if you don't turn on optimization.

Solution 3

Which version of Eigen are you using? They recently released 3.0.1, which is supposed to be faster than 2.x. Also, make sure you play a bit with the compiler options. For example, make sure SSE is being used in Visual Studio:

C/C++ --> Code Generation --> Enable Enhanced Instruction Set
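
If you are unsure which settings a given build actually used, a small check like the following can help. It is only a sketch: the function names are made up for this example, and it relies on the predefined macros `NDEBUG`, `__SSE2__` (GCC/Clang), and `_M_X64` / `_M_IX86_FP` (Visual C++) rather than on anything from Eigen.

```cpp
#include <string>

// True when NDEBUG is defined, i.e. assert() is compiled out, as in a
// typical release build.
bool assertionsDisabled() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// True when the compiler was told it may emit SSE2 instructions.
// __SSE2__ is set by GCC/Clang; _M_X64 and _M_IX86_FP are set by Visual C++.
bool sse2Enabled() {
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: one-line summary suitable for logging at startup.
std::string buildSummary() {
    std::string summary = "assertions: ";
    summary += assertionsDisabled() ? "off" : "on";
    summary += ", SSE2: ";
    summary += sse2Enabled() ? "on" : "off";
    return summary;
}
```

Printing `buildSummary()` once at startup makes it obvious whether a slow run came from a debug build or from a build without SSE2.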

Solution 4

You should profile and then optimize first the algorithm, then the implementation. In particular, the posted code is quite inefficient:

for (int i=0;i<pointVector.size();i++ )
{
   // rotation*scale is recomputed on every iteration
   Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i])  + translation;
   // ... rest of the loop body
}

I don't know the library, so I won't even try to guess the number of unnecessary temporaries that you are creating, but a simple refactor:

Eigen::MatrixXd tmp = rotation*scale;   // loop-invariant product, computed once
for (int i=0;i<pointVector.size();i++ )
{
   Eigen::MatrixXd outcome = tmp*(*pointVector[i])  + translation;
   // ... rest of the loop body as before
}

can save you a good number of expensive multiplications (and, again, probably some new temporary matrices that get discarded right away).

Solution 5

A couple of points.

  1. Why are you multiplying rotation*scale inside of the loop when that product will have the same value each iteration? That is a lot of wasted effort.

  2. You are using dynamically sized matrices rather than fixed-size matrices. Someone else mentioned this already, and you said you shaved off 2 seconds.

  3. You are passing arguments as a vector of pointers to matrices. This adds an extra pointer indirection and destroys any guarantee of data locality, which will give poor cache performance.

  4. I hope this isn't insulting, but are you compiling in Release or Debug? Eigen is very slow in debug builds, because it uses lots of trivial templated functions that are optimized out of release but remain in debug.

Looking at your code, I am hesitant to blame Eigen for performance problems. However, most linear algebra libraries (including Eigen) are not really designed for your use case of lots of tiny matrices. In general, Eigen will be better optimized for 100x100 or larger matrices. You very well may be better off using your own matrix class or the DirectX math helper classes. The DirectX math classes are completely independent from your video card.

Author by

george

Updated on June 09, 2022

Comments

  • george
    george almost 2 years

    A couple of weeks ago I asked a question about the performance of matrix multiplication.

    I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.

    StackOverflow users recommended:

    • uBLAS
    • EIGEN
    • BLAS

    At first I wanted to use uBLAS; however, reading the documentation, it turned out that this library doesn't support matrix-matrix multiplication.

    In the end I decided to use the Eigen library, so I replaced my matrix class with Eigen::MatrixXd. However, it turned out that my application now runs even slower than before: the run time was 68 seconds with my own matrix class and is 87 seconds with Eigen matrices.

    The parts of the program that take the most time look like this:

    TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( vector<Eigen::MatrixXd*>& pointVector, Eigen::MatrixXd& rotation ,Eigen::MatrixXd& scale,Eigen::MatrixXd& translation )
    {   
        for (int i=0;i<pointVector.size();i++ )
        {
            //Eigen::MatrixXd outcome =
            Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i])  + translation;
            //delete  prototypePointVector[i];      // ((rotation*scale)* (*prototypePointVector[i])  + translation).ConvertToPoint();
            MatrixHelper::SetX(*prototypePointVector[i],MatrixHelper::GetX(outcome));
            MatrixHelper::SetY(*prototypePointVector[i],MatrixHelper::GetY(outcome));
            //assosiatedPointIndexVector[i]    = prototypePointVector[i]->associatedTemplateIndex = i;
        }
    
        return this;
    }
    

    and

    Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
    {
        double membershipSum = 0,outcome = 0;
        double currentPower = 0;
        Eigen::MatrixXd outcomePoint = Eigen::MatrixXd(2,1);
        outcomePoint << 0,0;
        Eigen::MatrixXd templatePoint;
        for (int i=0;i< imageDataVector.size();i++)
        {
            currentPower =0; 
            membershipSum += currentPower = pow(membershipMatrix[clusterIndex][i],m);
            outcomePoint.noalias() +=  (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix*prototypeVector[clusterIndex]->scalingMatrix* ( *templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]) ))*currentPower ;
        }
    
        outcomePoint.noalias() = outcomePoint/=membershipSum;
        return outcomePoint; //.ConvertToMatrix();
    }
    

    As you can see, these functions perform a lot of matrix operations. That is why I thought using Eigen would speed up my application. Unfortunately (as I mentioned above), the program runs slower.

    Is there any way to speed up these functions?

    Maybe if I used DirectX matrix operations I would get better performance? (However, I have a laptop with an integrated graphics card.)

  • David Rodríguez - dribeas
    David Rodríguez - dribeas almost 13 years
    +1, sound advice. Also, in general, all compiler optimizations should be turned on for this type of test.
  • george
    george almost 13 years
    I use Eigen 3.0.1, however I didn't turn on "Enable Enhanced instruction set". I'll try this
  • george
    george almost 13 years
    It will be hard to use fixed-size types because almost all my matrices have size [2,1]: two rows and one column. So far I have only found 2x2 and 3x3 fixed sizes.
  • timday
    timday almost 13 years
    Eigen would call that a Vector2d, rather than a matrix. I'm surprised if it's not already defined (since Vector2d is mentioned in the Vectorization docs above as being 16 bytes and as being SSE compatible). If you have to define your own, all it'll be is a typedef Matrix<double, 2, 1> Vector2d;
  • george
    george almost 13 years
    Changing MatrixXd into Vector2d and Matrix2d, I got only a 2 s better time. Now it is 85 s instead of 87 s. Still slower than if I use my own matrix class. Strange :(
  • Hanno S.
    Hanno S. about 12 years
    Also, Eigen does a lot of bounds and alignment checking when neither NDEBUG nor EIGEN_NO_DEBUG is defined.
  • Christian Aichinger
    Christian Aichinger over 11 years
    You can create fixed-size matrices of any dimension using Eigen::Matrix<double, n_rows, n_cols>.
  • Ruslan
    Ruslan over 7 years
    Note that -o2 means "output file is 2". For optimization options with GCC use -O2 and the like (note the capital 'O' instead of small 'o').
  • Mark
    Mark over 4 years
    It apparently doesn't perform well even when you turn on optimization (at least for small matrices): stackoverflow.com/questions/58071344/…