Numpy linear regression with regularization


The problem is:

features.transpose().dot(features) may not be invertible. According to the documentation, numpy.linalg.inv works only for full-rank matrices. However, a (non-zero) regularization term always makes the matrix nonsingular.
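
For instance (a hypothetical sketch with made-up data): duplicating a column makes the Gram matrix rank-deficient, and adding lamb * np.identity(n) restores full rank:

import numpy as np

# Feature matrix with a duplicated column, so features.T.dot(features) is singular
features = np.array([[1.0, 2.0, 2.0],
                     [3.0, 4.0, 4.0],
                     [5.0, 6.0, 6.0]])
gram = features.transpose().dot(features)
print(np.linalg.matrix_rank(gram))  # 2 -- rank-deficient, inv is unreliable here

# A non-zero ridge term shifts every eigenvalue of the (PSD) Gram matrix up by lamb,
# so the regularized matrix is invertible
lamb = 0.1
print(np.linalg.matrix_rank(gram + lamb * np.identity(3)))  # 3 -- full rank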

By the way, your implementation is correct, but it is not efficient. A more efficient way to solve this equation is the least-squares method.

np.linalg.lstsq(features, labels) can do the work of np.linalg.pinv(features).dot(labels).
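
To illustrate the equivalence (a small sketch with synthetic data; note that lstsq returns a tuple, so the solution is its first element):

import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((10, 3))
labels = rng.standard_normal(10)

via_pinv = np.linalg.pinv(features).dot(labels)
via_lstsq = np.linalg.lstsq(features, labels, rcond=None)[0]
print(np.allclose(via_pinv, via_lstsq))  # True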

More generally, you can do this:

import numpy as np

def get_model(A, y, lamb=0):
    n_col = A.shape[1]
    # lstsq returns (solution, residuals, rank, singular values); the model is element 0
    return np.linalg.lstsq(A.T.dot(A) + lamb * np.identity(n_col), A.T.dot(y), rcond=None)[0]
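
A hypothetical usage example with synthetic data: with the default lamb=0 this should agree with the pinv-based solution (when A.T.dot(A) is invertible), and a positive lamb shrinks the coefficients:

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
true_coef = np.array([1.0, -2.0, 0.5, 3.0])
y = A.dot(true_coef) + 0.1 * rng.standard_normal(20)

print(get_model(A, y))            # close to true_coef
print(get_model(A, y, lamb=1.0))  # coefficients shrunk toward zero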

Comments

  • Marshall Farrier almost 2 years

    I'm not seeing what is wrong with my code for regularized linear regression. Unregularized, I simply have this, which I'm reasonably certain is correct:

    import numpy as np
    
    def get_model(features, labels):
        return np.linalg.pinv(features).dot(labels)
    

    Here's my code for a regularized solution, and I can't see what is wrong with it:

    def get_model(features, labels, lamb=0.0):
        n_cols = features.shape[1]
        return np.linalg.inv(features.transpose().dot(features) + lamb * np.identity(n_cols))\
                .dot(features.transpose()).dot(labels)
    

    With the default value of 0.0 for lamb, my intention is that it should give the same result as the (correct) unregularized version, but the difference is actually quite large.

    Does anyone see what the problem is?