C++ Memory Efficient Solution for Ax=b Linear Algebra System


Solution 1

Short answer: Don't use Boost's LAPACK bindings; they were designed for dense matrices, not sparse ones. Use UMFPACK instead.

Long answer: UMFPACK is one of the best libraries for solving Ax=b when A is large and sparse.

Below is sample code (based on umfpack_simple.c) that generates a simple A and b and solves Ax = b.

#include <stdlib.h>
#include <stdio.h>
#include "umfpack.h"

int    *Ap; 
int    *Ai;
double *Ax; 
double *b; 
double *x; 

/* Generates a sparse matrix problem: 
   A is n x n tridiagonal matrix
   A(i,i-1) = -1;
   A(i,i) = 3; 
   A(i,i+1) = -1; 
*/
void generate_sparse_matrix_problem(int n){
  int i;  /* row index */ 
  int nz; /* nonzero index */
  int nnz = 2 + 3*(n-2) + 2; /* number of nonzeros*/
  int *Ti; /* row indices */ 
  int *Tj; /* col indices */ 
  double *Tx; /* values */ 

  /* Allocate memory for triplet form */
  Ti = malloc(sizeof(int)*nnz);
  Tj = malloc(sizeof(int)*nnz);
  Tx = malloc(sizeof(double)*nnz);

  /* Allocate memory for compressed sparse column form */
  Ap = malloc(sizeof(int)*(n+1));
  Ai = malloc(sizeof(int)*nnz);
  Ax = malloc(sizeof(double)*nnz);

  /* Allocate memory for rhs and solution vector */
  x = malloc(sizeof(double)*n);
  b = malloc(sizeof(double)*n);

  /* Construct the matrix A*/
  nz = 0;
  for (i = 0; i < n; i++){
    if (i > 0){
      Ti[nz] = i;
      Tj[nz] = i-1;
      Tx[nz] = -1;
      nz++;
    }

    Ti[nz] = i;
    Tj[nz] = i;
    Tx[nz] = 3;
    nz++;

    if (i < n-1){
      Ti[nz] = i;
      Tj[nz] = i+1;
      Tx[nz] = -1;
      nz++;
    }
    b[i] = 0;
  }
  b[0] = 21; b[1] = 1; b[2] = 17;
  /* Convert Triplet to Compressed Sparse Column format */
  (void) umfpack_di_triplet_to_col(n,n,nnz,Ti,Tj,Tx,Ap,Ai,Ax,NULL);

  /* free triplet format */ 
  free(Ti); free(Tj); free(Tx);
}


int main (void)
{
    double *null = (double *) NULL ;
    int i, n;
    void *Symbolic, *Numeric ;
    n = 500000;
    generate_sparse_matrix_problem(n);
    (void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null);
    (void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null);
    umfpack_di_free_symbolic (&Symbolic);
    (void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null);
    umfpack_di_free_numeric (&Numeric);
    for (i = 0 ; i < 10 ; i++) printf ("x [%d] = %g\n", i, x [i]);
    free(b); free(x); free(Ax); free(Ai); free(Ap);
    return (0);
}

The function generate_sparse_matrix_problem creates the matrix A and the right-hand side b. The matrix is first constructed in triplet form. The vectors Ti, Tj, and Tx fully describe A. Triplet form is easy to create but efficient sparse matrix methods require Compressed Sparse Column format. Conversion is performed with umfpack_di_triplet_to_col.

A symbolic factorization is performed with umfpack_di_symbolic. A sparse LU decomposition of A is performed with umfpack_di_numeric. The lower and upper triangular solves are performed with umfpack_di_solve.

With n as 500,000, on my machine, the entire program takes about a second to run. Valgrind reports that 369,239,649 bytes (just a little over 352 MB) were allocated.

Note that this page discusses Boost's support for sparse matrices in triplet (coordinate) and compressed formats. If you like, you can write routines to convert these Boost objects to the simple arrays UMFPACK requires as input.
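
For illustration, here is a rough sketch of such a conversion routine (the helper name ublas_to_umfpack is made up, and it assumes a square matrix that fits UMFPACK's int-index interface). It walks the stored nonzeros of a uBLAS compressed_matrix into triplet arrays and lets umfpack_di_triplet_to_col build the compressed sparse column arrays:

#include <vector>
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include "umfpack.h"

namespace ublas = boost::numeric::ublas;

/* Sketch: copy the stored nonzeros of a uBLAS sparse matrix into triplet
   arrays and let UMFPACK convert them to compressed sparse column form. */
void ublas_to_umfpack(const ublas::compressed_matrix<double> &A,
                      std::vector<int> &Ap, std::vector<int> &Ai,
                      std::vector<double> &Ax)
{
    const int n = (int) A.size1();   /* assume A is square */

    /* Standard uBLAS sparse iteration: visits only stored entries. */
    std::vector<int> Ti, Tj;
    std::vector<double> Tx;
    for (auto it1 = A.begin1(); it1 != A.end1(); ++it1)
        for (auto it2 = it1.begin(); it2 != it1.end(); ++it2) {
            Ti.push_back((int) it2.index1());   /* row index    */
            Tj.push_back((int) it2.index2());   /* column index */
            Tx.push_back(*it2);                 /* value        */
        }
    const int nnz = (int) Tx.size();

    /* Convert triplet form to compressed sparse column form. */
    Ap.resize(n + 1); Ai.resize(nnz); Ax.resize(nnz);
    (void) umfpack_di_triplet_to_col(n, n, nnz, Ti.data(), Tj.data(), Tx.data(),
                                     Ap.data(), Ai.data(), Ax.data(), NULL);
}

The resulting Ap, Ai, and Ax can then be passed to umfpack_di_symbolic, umfpack_di_numeric, and umfpack_di_solve exactly as in the sample above.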

Solution 2

Assuming your huge matrices are sparse, which I hope they are at that size, have a look at the PARDISO project, which is a sparse linear solver. This is what you'll need if you want to handle matrices as big as you said. It allows efficient storage of only the non-zero values, and is much faster than solving the same system with dense matrices.
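
To see why storing only the non-zeros matters at this scale, here is a small back-of-the-envelope sketch (not PARDISO-specific; it simply compares a dense n x n array of doubles with a generic compressed-sparse-row layout for a tridiagonal matrix with n = 10^6):

#include <cstdio>

int main(void)
{
    const double n   = 1e6;    /* matrix dimension from the question            */
    const double nnz = 3 * n;  /* tridiagonal matrix: about 3 nonzeros per row  */

    /* Dense storage: one double per entry. */
    double dense_bytes = 8.0 * n * n;
    /* CSR storage: values (double) + column indices (int) + row pointers (int). */
    double csr_bytes = 8.0 * nnz + 4.0 * nnz + 4.0 * (n + 1);

    std::printf("dense: %.0f TB\n", dense_bytes / 1e12);  /* about 8 TB  */
    std::printf("CSR:   %.0f MB\n", csr_bytes / 1e6);     /* about 40 MB */
    return 0;
}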

Solution 3

I assume that your matrix is dense. If it is sparse, you can find numerous specialised algorithms as already mentioned by DeusAduro and duffymo.

If you don't have a (large enough) cluster at your disposal, you want to look at out-of-core algorithms. ScaLAPACK has a few out-of-core solvers as part of its prototype package; see the documentation here and Google for more details. Searching the web for "out-of-core LU / (matrix) solvers / packages" will give you links to a wealth of further algorithms and tools. I am not an expert on those.

For this problem, most people would use a cluster, however. The package you will find on almost any cluster is ScaLAPACK, again. In addition, there are usually numerous other packages on the typical cluster, so you can pick and choose what suits your problem (examples here and here).

Before you start coding, you probably want to quickly check how long it will take to solve your problem. A typical solver takes about O(3*N^3) flops (where N is the dimension of the matrix). If N = 100,000, you are hence looking at 3,000,000 Gflops. Assuming that your in-memory solver does 10 Gflops/s per core, you are looking at 3 1/2 days on a single core. As the algorithms scale well, increasing the number of cores should reduce the time close to linearly. On top of that comes the I/O.
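
For example, a quick sketch of that estimate (the 10 Gflops/s per core figure is just the assumption stated above):

#include <cstdio>

int main(void)
{
    const double N     = 1e5;             /* matrix dimension               */
    const double flops = 3.0 * N * N * N; /* ~3 * N^3 flops for the solve   */
    const double rate  = 10e9;            /* assumed 10 Gflops/s per core   */
    const double cores = 1.0;

    double seconds = flops / (rate * cores);
    std::printf("%.0f Gflops total, about %.1f days on %g core(s)\n",
                flops / 1e9, seconds / 86400.0, cores);  /* ~3,000,000 Gflops, ~3.5 days */
    return 0;
}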

Solution 4

Have a look at the list of freely available software for the solution of linear algebra problems, compiled by Jack Dongarra and Hatem Ltaief.

I think that for the problem size you're looking at, you probably need an iterative algorithm. If you don't want to store the matrix A in a sparse format, you can use a matrix-free implementation. Iterative algorithms typically do not need to access individual entries of the matrix A; they only need to compute matrix-vector products Av (and sometimes A^T v, the product of the transposed matrix with a vector). So if the library is well designed, it should be enough to pass it a class that knows how to do matrix-vector products.
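
As a sketch of what matrix-free means in practice (not any particular library's API), here is a minimal unpreconditioned conjugate gradient solver. It assumes A is symmetric positive definite, which the tridiagonal matrix from Solution 1 is, and the only thing it ever asks for is a function that computes Av:

#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

using Vec    = std::vector<double>;
using MatVec = std::function<void(const Vec &, Vec &)>;  /* computes y = A*v */

static double dot(const Vec &a, const Vec &b)
{
    double s = 0;
    for (std::size_t i = 0; i < a.size(); i++) s += a[i] * b[i];
    return s;
}

/* Conjugate gradient: never stores A, only calls matvec. */
static void cg(const MatVec &matvec, const Vec &b, Vec &x, int max_iter, double tol)
{
    const std::size_t n = b.size();
    Vec r(n), p(n), Ap(n);
    matvec(x, Ap);                                          /* r = b - A*x0 */
    for (std::size_t i = 0; i < n; i++) r[i] = b[i] - Ap[i];
    p = r;
    double rs = dot(r, r);
    for (int k = 0; k < max_iter && std::sqrt(rs) > tol; k++) {
        matvec(p, Ap);
        double alpha = rs / dot(p, Ap);
        for (std::size_t i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rs_new = dot(r, r);
        for (std::size_t i = 0; i < n; i++) p[i] = r[i] + (rs_new / rs) * p[i];
        rs = rs_new;
    }
}

int main(void)
{
    const std::size_t n = 500000;
    /* Matrix-vector product for the tridiagonal matrix of Solution 1:
       A(i,i-1) = -1, A(i,i) = 3, A(i,i+1) = -1.  A itself is never stored. */
    MatVec matvec = [n](const Vec &v, Vec &y) {
        for (std::size_t i = 0; i < n; i++) {
            y[i] = 3 * v[i];
            if (i > 0)     y[i] -= v[i - 1];
            if (i < n - 1) y[i] -= v[i + 1];
        }
    };
    Vec b(n, 0.0), x(n, 0.0);
    b[0] = 21; b[1] = 1; b[2] = 17;
    cg(matvec, b, x, 1000, 1e-10);
    for (int i = 0; i < 10; i++) std::printf("x[%d] = %g\n", i, x[i]);
    return 0;
}

In a real program you would use a library implementation (and usually a preconditioner), but the point stands: the solver never needs individual entries of A, only the action of A on a vector.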

Solution 5

Not sure about C++ implementations, but there are several things you can do if memory is an issue, depending on the type of matrix you're dealing with:

  1. If your matrix is sparse or banded, you can use a sparse or banded solver. These don't store the zero elements outside the band (a tridiagonal example is sketched after this list).
  2. You can use a wavefront solver, which stores the matrix on disk and only brings in the matrix wavefront for decomposition.
  3. You can avoid solving the matrix altogether and use iterative methods.
  4. You can try Monte Carlo methods of solution.
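
As an example of point 1, here is a sketch of the simplest banded case: a tridiagonal system solved with the Thomas algorithm. It stores only three vectors of length n, never the full n x n matrix, and it assumes the system is diagonally dominant so no pivoting is needed (true for the -1/3/-1 matrix used in Solution 1):

#include <cstdio>
#include <vector>

/* Thomas algorithm for a tridiagonal system.
   a = sub-diagonal, d = diagonal, c = super-diagonal (a[0] and c[n-1] unused).
   a, d, c are taken by value so the caller's data is not modified;
   the solution overwrites b. */
static void solve_tridiagonal(std::vector<double> a, std::vector<double> d,
                              std::vector<double> c, std::vector<double> &b)
{
    const std::size_t n = d.size();

    /* Forward elimination */
    for (std::size_t i = 1; i < n; i++) {
        double m = a[i] / d[i - 1];
        d[i] -= m * c[i - 1];
        b[i] -= m * b[i - 1];
    }

    /* Back substitution */
    b[n - 1] /= d[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        b[i] = (b[i] - c[i] * b[i + 1]) / d[i];
}

int main(void)
{
    const std::size_t n = 500000;
    /* Same system as in Solution 1: A(i,i-1) = -1, A(i,i) = 3, A(i,i+1) = -1 */
    std::vector<double> a(n, -1.0), d(n, 3.0), c(n, -1.0), b(n, 0.0);
    b[0] = 21; b[1] = 1; b[2] = 17;
    solve_tridiagonal(a, d, c, b);
    for (int i = 0; i < 10; i++) std::printf("x[%d] = %g\n", i, b[i]);
    return 0;
}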



Comments

  • Danf
    Danf about 2 years

    I am using Numeric Library Bindings for Boost UBlas to solve a simple linear system. The following works fine, except it is limited to handling matrices A(m x m) for relatively small 'm'.

    In practice I have a much larger matrix with dimension m = 10^6 (up to 10^7).
    Is there an existing C++ approach for solving Ax=b that uses memory efficiently?

    #include<boost/numeric/ublas/matrix.hpp>
    #include<boost/numeric/ublas/io.hpp>
    #include<boost/numeric/bindings/traits/ublas_matrix.hpp>
    #include<boost/numeric/bindings/lapack/gesv.hpp>
    #include <boost/numeric/bindings/traits/ublas_vector2.hpp>
    
    // compilable with this command
    
    
    //g++ -I/home/foolb/.boost/include/boost-1_38 -I/home/foolb/.boostnumbind/include/boost-numeric-bindings solve_Axb_byhand.cc -o solve_Axb_byhand -llapack
    
    
    namespace ublas = boost::numeric::ublas;
    namespace lapack= boost::numeric::bindings::lapack;
    
    
    int main()
    {
        ublas::matrix<float,ublas::column_major> A(3,3);
        ublas::vector<float> b(3);
    
    
        for(unsigned i=0;i < A.size1();i++)
            for(unsigned j =0;j < A.size2();j++)
            {
                std::cout << "enter element "<<i << j << std::endl;
                std::cin >> A(i,j);
            }
    
        std::cout << A << std::endl;
    
        b(0) = 21; b(1) = 1; b(2) = 17;
    
        lapack::gesv(A,b);
    
        std::cout << b << std::endl;
    
    
        return 0;
    }
    
    • cyberconte
      cyberconte almost 15 years
      Pointing out the obvious: an array for a matrix that size is 4x10^12 to 4x10^14 bytes, or 4 to 400 terabytes, for a single matrix alone (unless, as noted below, it's sparse).
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    Not to mention the O(m^3) time complexity of the naive solution! Even the clever one Knuth talks about is O(m^2.7ish)... If these matrices aren't sparse you need a cluster and a first-class numerical analyst...
  • Luka Rahne
    Luka Rahne almost 15 years
    +1 for the sparse matrix idea. I found numerous libraries and their comparisons in a PARDISO paper comparing various sparse matrix libraries: ftp.numerical.rl.ac.uk/pub/reports/ghsRAL200505.pdf This can be used to find other recognised sparse matrix libraries.
  • Danf
    Danf almost 15 years
    @duffymo: thanks. I have looked at iterative-approach implementations in C++, but they still require storing the matrix: freenet-homepage.de/guwi17/ublas/examples If I am wrong, do you know of any memory-efficient iterative implementation in C++?
  • duffymo
    duffymo almost 15 years
    Correct, foolishbrat. I should have remembered that. I'd investigate parallel algorithms, because the problem of partitioning the work out to N processors and knitting it back together to get the result is germane to the problem of moving it temporarily out to disk.
  • stephan
    stephan almost 15 years
    Caveat: the above O(3*N^3) assumes that you use complex numbers. For real numbers, divide everything by 6, i.e. somewhere around O(0.5 * N^3).