Can I make a logarithmic regression on sklearn?

Solution 1

If I understand correctly, you want to fit the data with a function like y = a * exp(-b * (x - c)) + d.

I am not sure whether sklearn can do it, but you can use scipy.optimize.curve_fit() to fit your data with any function you define.

For your case, I experimented with your data and here is the result:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# skip_header=1 skips a header row such as 'x,y'; drop it if your file has none
my_data = np.genfromtxt('yourdata.csv', delimiter=',', skip_header=1)
my_data = my_data[my_data[:,0].argsort()]  # sort rows by x
xdata = my_data[:,0]
ydata = my_data[:,1]

# define a function for fitting
def func(x, a, b, c, d):
    return a * np.exp(-b * (x - c)) + d

init_vals = [50, 0, 90, 63]
# fit your data and get the fit parameters
popt, pcov = curve_fit(func, xdata, ydata, p0=init_vals, bounds=([0, 0, 90, 0], [1000, 0.1, 200, 200]))
# predict new data based on your fit
y_pred = func(200, *popt)
print(y_pred)

plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, func(xdata, *popt), '-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

[Figure: plot produced by the code above, showing the data points and the fitted curve]

I found that the initial value for b is critical for fitting. I estimated a small range for it and then fit the data.
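
One way to get a starting value for b without trial and error is to estimate it from a straight-line fit to log(y - d0), where d0 is a rough guess for the plateau. The sketch below is only an illustration under several assumptions: it uses a simplified three-parameter model y = a * exp(-b * x) + d (the shift c is folded into a), synthetic data rather than the data from the question, and the names decay and d0 are made up for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(x, a, b, d):
    # Simplified model: the shift c from func() above is absorbed into a.
    return a * np.exp(-b * x) + d

# Synthetic data (an assumption for illustration, not the question's data).
rng = np.random.default_rng(0)
x = np.linspace(100, 850, 40)
y = decay(x, 1000.0, 0.01, 60.0) + rng.normal(scale=2.0, size=x.size)

# Rough plateau guess: slightly below the smallest observed y,
# so that y - d0 stays positive and the log is defined.
d0 = y.min() - 1.0

# log(y - d0) ~ log(a) - b * x, so a straight-line fit gives rough a and b.
slope, intercept = np.polyfit(x, np.log(y - d0), 1)
p0 = [np.exp(intercept), -slope, d0]

# Refine the rough estimates with the nonlinear fit.
popt, pcov = curve_fit(decay, x, y, p0=p0)
print("initial guess:", p0)
print("fitted values:", popt)
```

The log-linear estimate is biased because d0 is only a guess, which is why it is used here as a starting point for curve_fit and not as the final answer.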

If you have no a priori knowledge of the relationship between x and y, you can use the regression methods provided by sklearn, such as linear regression, kernel ridge regression (KRR), nearest neighbors regression, or Gaussian process regression, to fit nonlinear data. The scikit-learn documentation describes each of them.
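
As a rough sketch of that model-free route, the example below fits two of the sklearn regressors mentioned above to synthetic decaying data. The data and the hyperparameters (alpha, gamma, n_neighbors) are arbitrary assumptions for illustration and are not tuned for the data in the question.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.neighbors import KNeighborsRegressor

# Synthetic decaying data (an assumption; replace with your own x and y).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(100, 850, size=60))
y = 1000.0 * np.exp(-0.01 * x) + 60.0 + rng.normal(scale=5.0, size=x.size)

X = x.reshape(-1, 1)  # sklearn expects a 2-D feature matrix

# Hyperparameters are arbitrary illustrative choices, not tuned values.
models = {
    "kernel ridge": KernelRidge(kernel="rbf", alpha=1.0, gamma=1e-4),
    "nearest neighbors": KNeighborsRegressor(n_neighbors=5),
}

# Predict at a few x values inside the range of the training data.
X_new = np.array([[200.0], [500.0], [800.0]])
predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X_new)
    print(name, predictions[name])
```

Models like these follow the data inside the observed x range, but they should not be expected to extrapolate a flat tail the way a parametric exponential does.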

Solution 2

Your data look like an exponential decay.

You can log-transform your y variable and then use linear regression. This works because taking the log of y = A * exp(-B * x) gives log(y) = log(A) - B * x, which is a straight line in x. Note that all y values must be positive for the log to be defined.

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon

x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])
y_fit = expon.pdf(x, scale=2)*100

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y)
ax.plot(x, y_fit)
ax.set_ylabel('y (blue)')
ax.grid(True)

ax2 = ax.twinx()
ax2.scatter(x, np.log(y), color='red')
ax2.set_ylabel('log(y) (red)')

plt.show()
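
As a minimal sketch of the transform-then-fit idea, using the same x and y arrays as above, np.polyfit can fit the straight line on the log scale and the estimates can then be mapped back to the original scale. The names A, B and predict are just labels for this illustration.

```python
import numpy as np

x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])

# Fit log(y) = intercept + slope * x; all y values must be positive.
slope, intercept = np.polyfit(x, np.log(y), 1)

A = np.exp(intercept)  # amplitude on the original scale
B = -slope             # decay rate

def predict(x_new):
    # Undo the log transform to get predictions in the original units.
    return A * np.exp(-B * np.asarray(x_new, dtype=float))

print(A, B, predict([11, 12]))
```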


Solution 3

To use sklearn, you can first rewrite your model Y = A*exp(-B*X) as ln(Y) = ln(A) - B*X, and then use LinearRegression to fit your data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Read Data
df = pd.read_csv('data.csv')

### Prepare X, Y & ln(Y)
df = df.sort_values(by='x')  # sort once so the fitted curve plots left to right
X = df[['x']]                # 2-D feature frame, as sklearn expects
Y = df[['y']]
ln_Y = np.log(Y)

### Use the relation ln(Y) = ln(A) - BX to fit X to ln(Y)
from sklearn.linear_model import LinearRegression
exp_reg = LinearRegression()
exp_reg.fit(X, ln_Y)
#### You can introduce weights as well to apply more bias to the smaller X values, 
#### I am transforming X arbitrarily to apply higher arbitrary weights to smaller X values
exp_reg_weighted = LinearRegression()
exp_reg_weighted.fit(X, ln_Y, sample_weight=np.array(1/((X - 100).values**2)).reshape(-1))

### Get predicted values of Y
Y_pred = np.exp(exp_reg.predict(X))
Y_pred_weighted = np.exp(exp_reg_weighted.predict(X))

### Plot
plt.scatter(X, Y)
plt.plot(X, Y_pred, label='Default')
plt.plot(X, Y_pred_weighted, label='Weighted')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()

plt.show()
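
If the goal is to predict at new x values without applying np.log and np.exp by hand, scikit-learn's TransformedTargetRegressor can wrap the same log-linear fit. The sketch below assumes noise-free synthetic data generated from y = 500 * exp(-0.005 * x) instead of data.csv, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration).
x = np.linspace(100, 850, 40)
y = 500.0 * np.exp(-0.005 * x)
X = x.reshape(-1, 1)

# Fits ln(y) = ln(A) - B*x internally; predict() applies exp automatically.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)

# Recover A and B from the fitted linear model on the log scale.
A = np.exp(model.regressor_.intercept_)
B = -model.regressor_.coef_[0]

# Predict at x values that were not in the training data.
X_new = np.array([[200.0], [900.0]])
y_new = model.predict(X_new)
print(A, B, y_new)
```

Because the fit minimizes squared error on the log scale, it weights small y values relatively more than a direct fit with curve_fit would.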


Author: Alvaro Hernandorena

Updated on June 04, 2022

Comments

  • Alvaro Hernandorena
    Alvaro Hernandorena almost 2 years

    I don't know if "logarithmic regression" is the right term. I need to fit a curve to my data, like a polynomial curve but going flat at the end.

    Here is an image: the blue curve is what I have (2nd-order polynomial regression) and the magenta curve is what I need.

    [Image: the data with the blue polynomial fit and the magenta target curve]

    I have searched a lot and can't find that in sklearn: only linear regression and polynomial regression, but no logarithmic regression. I need to plot the curve and then make predictions with that regression.

    EDIT

    Here is the data for the plot image that I posted:

    x,y
    670,75
    707,46
    565,47
    342,77
    433,73
    472,46
    569,52
    611,60
    616,63
    493,67
    572,11
    745,12
    483,75
    637,75
    218,251
    444,72
    305,75
    746,64
    444,98
    342,117
    272,85
    128,275
    500,75
    654,65
    241,150
    217,150
    426,131
    155,153
    841,66
    737,70
    722,70
    754,60
    664,60
    688,60
    796,55
    799,62
    229,150
    232,95
    116,480
    340,49
    501,65
    
    • Greg Reda
      Greg Reda over 6 years
      Can you post some sample data (or code to generate example data)? Might you be able to do a transform on the underlying data and then fit your model?
    • Alvaro Hernandorena
      Alvaro Hernandorena over 6 years
      There, I added the data
  • Alvaro Hernandorena
    Alvaro Hernandorena over 6 years
    ok, so no sklearn needed? but how do I do a prediction based on that?
  • binjip
    binjip over 6 years
    You can still use scikit-learn LinearRegression for the regression. Or you can check out the statsmodels library. Say you want to make a prediction yhat = alpha+beta*x0. You would have to transform yhat back into your space, i.e. np.exp(yhat)
  • binjip
    binjip over 6 years
    I just found this great explanation.
  • Alvaro Hernandorena
    Alvaro Hernandorena over 6 years
    OK, so I think I understand. Take my data and make it linear by applying the log function, then run a linear regression on the transformed data, predict, and finally transform the predicted value back by applying the exp function. Is that right?
  • Alvaro Hernandorena
    Alvaro Hernandorena over 6 years
    Yeah, I think that's it, thanks, I will try it. By the way, is there a scipy method that takes the data and decides which model to use? One that automatically tries linear, polynomial, logarithmic, etc., checks which is best and applies that model? Or do I have to do it manually?
  • Jack Chi
    Jack Chi over 6 years
    @AlvaroHernandorena I don't think there is a method that does it automatically. But you can write a script yourself by defining a few functions and then following the code in my answer.
  • Alvaro Hernandorena
    Alvaro Hernandorena over 6 years
    Thanks for your answer. I ended up doing that: curve_fit with a custom function. It didn't work at first and kept saying that it couldn't find the parameters, until I started to play with the bounds. After a while I understood that 'a + d' is y when x = 0, and that 'b' is also important, so I set the bounds using relations from the data (I found the max and min values of x and y in the data and used those: a = 3*maxX, b = 10*maxX, c = minY*3), and now it is working perfectly. Thank you again!
  • binjip
    binjip over 6 years
    That's correct. You should also plot the log-transformed data to see if the fit is truly linear. You might still need to use poly fit but the fit will be much better than with the original data.
  • taga
    taga about 5 years
    Is there something like logarithmic transformation, like polynomial features? stackoverflow.com/questions/54949969/…
  • Redhwan
    Redhwan over 3 years
    For me, it worked fine after I changed this line to: exp_reg_weighted.fit(X, ln_Y, sample_weight=np.array(1/((X - 100)**2)).reshape(-1))
  • ijoseph
    ijoseph over 3 years
    This could be improved by plotting y_pred for x-values beyond those that happen to exist in xdata. e.g. 1000 predicted datapoints: x_pred = np.linspace(min(xdata), max(xdata), num=1000); y_pred = func(x_pred, *popt); plt.plot(x_pred, y_pred, '-', label='fit')
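
On the question raised in the comments about choosing a model automatically, the suggestion there was to script it by hand. A minimal sketch of that idea, assuming synthetic data and three hypothetical candidate functions (linear, quadratic, exp_decay), fits each one with curve_fit and ranks them by residual sum of squares. The starting values for exp_decay are hand-picked assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):
    return a * x + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

def exp_decay(x, a, b, d):
    return a * np.exp(-b * x) + d

# Synthetic data (an assumption; substitute your own xdata and ydata).
rng = np.random.default_rng(0)
xdata = np.linspace(100, 850, 40)
ydata = exp_decay(xdata, 1000.0, 0.01, 60.0) + rng.normal(scale=2.0, size=xdata.size)

# Candidate models with starting values (None = curve_fit's default of all ones).
candidates = {
    "linear": (linear, None),
    "quadratic": (quadratic, None),
    "exp_decay": (exp_decay, [500.0, 0.005, 50.0]),
}

scores = {}
for name, (func, p0) in candidates.items():
    popt, _ = curve_fit(func, xdata, ydata, p0=p0)
    residuals = ydata - func(xdata, *popt)
    scores[name] = np.sum(residuals**2)  # residual sum of squares

# The candidate with the smallest residual sum of squares wins.
best = min(scores, key=scores.get)
print(scores, best)
```

Comparing raw residuals tends to favor models with more parameters, so the ranking is a rough guide only. Checking the fit on held-out data is a safer way to compare candidates.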