How to transpose a matrix in an optimal way using BLAS?


BLAS doesn't have a matrix transpose routine built in. The CUDA SDK includes a matrix transpose example, along with a paper that discusses the optimal strategy for performing a transpose. Your best strategy is probably to pass row-major inputs to CUBLAS using the transposed-input versions of the calls, perform the intermediate calculations in column-major, and finally transpose the result using the SDK transpose kernel.
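The row-major trick works because a row-major buffer, reread as column-major with the dimensions swapped, already is the transpose, so no data has to move. Here is a minimal CPU-only sketch of that index relationship in C++; the helper names `at_row_major`, `at_col_major` and `reinterpretation_is_transpose` are made up for illustration and are not part of BLAS or CUBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (i, j) of a matrix with `cols` columns, stored row-major.
inline float at_row_major(const std::vector<float>& a, std::size_t cols,
                          std::size_t i, std::size_t j) {
    assert(i * cols + j < a.size());
    return a[i * cols + j];
}

// Element (i, j) of a matrix with `rows` rows, stored column-major
// (the layout BLAS and CUBLAS expect).
inline float at_col_major(const std::vector<float>& a, std::size_t rows,
                          std::size_t i, std::size_t j) {
    assert(j * rows + i < a.size());
    return a[j * rows + i];
}

// Checks that a row-major (rows x cols) buffer, reread as a column-major
// (cols x rows) matrix, is exactly the transpose of the original.
inline bool reinterpretation_is_transpose(const std::vector<float>& a,
                                          std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            // Original element (i, j) must equal reinterpreted element (j, i).
            if (at_row_major(a, cols, i, j) != at_col_major(a, cols, j, i)) {
                return false;
            }
        }
    }
    return true;
}
```

This is why asking the library for the transposed operand on a row-major input gives it the matrix you actually meant, without an explicit transpose pass on the way in.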


Edited to add: CUBLAS added a routine that can transpose, geam, in CUBLAS version 5. It can perform matrix transposition in GPU memory and should be regarded as optimal for whatever architecture you are using.
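For reference, geam computes C = alpha*op(A) + beta*op(B), so a transpose falls out when op(A) is the transpose, alpha is 1 and beta is 0. The C++ below is a CPU-only sketch of those semantics, not the CUBLAS implementation; `geam_transpose_a_reference` and `transpose_reference` are hypothetical names, and the device pointers in the comment (`d_A`, `d_C`) are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for C = alpha * A^T + beta * B.
// All matrices are column-major. C and B are m x n (leading dimension m);
// A is n x m (leading dimension n).
inline std::vector<float> geam_transpose_a_reference(
    std::size_t m, std::size_t n, float alpha, const std::vector<float>& A,
    float beta, const std::vector<float>& B) {
    assert(A.size() == n * m);
    assert(B.size() == m * n);
    std::vector<float> C(m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // A^T(i, j) = A(j, i), which lives at A[i * n + j] when lda = n.
            C[j * m + i] = alpha * A[i * n + j] + beta * B[j * m + i];
        }
    }
    return C;
}

// Pure transpose: alpha = 1, beta = 0, so B contributes nothing.
inline std::vector<float> transpose_reference(std::size_t m, std::size_t n,
                                              const std::vector<float>& A) {
    const std::vector<float> zeros(m * n, 0.0f);
    return geam_transpose_a_reference(m, n, 1.0f, A, 0.0f, zeros);
}

// The single-precision GPU call with the same meaning would look roughly like
// this (assumes an initialized handle and device buffers d_A, d_C):
//   const float alpha = 1.0f, beta = 0.0f;
//   cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n,
//               &alpha, d_A, n, &beta, d_C, m, d_C, m);
```

Here m and n are the dimensions of the result, so a 2 x 3 input is transposed with m = 3 and n = 2. Check the CUBLAS documentation for the in-place restrictions before reusing an input buffer as the output.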

Author: Martin Kristiansen

Updated on June 15, 2022

Comments

  • Martin Kristiansen almost 2 years

    I'm doing some calculations and analyzing the strengths and weaknesses of different BLAS implementations. However, I have come across a problem.

    I'm testing cuBLAS. Doing linear algebra on the GPU would seem like a good idea, but there is one problem.

    The cuBLAS implementation uses column-major format, and since this is not what I need in the end, I'm curious whether there is a way in which one can make BLAS do a matrix transpose?