How do I apply scikit-learn's LogisticRegression to decimal data?


Use the predict_proba method to get probabilities. predict gives class labels.

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> lr = LogisticRegression()
>>> X = np.random.randn(3, 4)
>>> y = [1, 0, 0]
>>> lr.fit(X, y)
LogisticRegression()
>>> lr.predict_proba(X[:1])  # predict_proba expects a 2-D array: one row per sample
array([[ 0.49197272,  0.50802728]])

(If you had read the documentation, you would have found this out.)
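
To pull out a single probability for the positive class (rather than a 0/1 label), take the matching column of the predict_proba output. A minimal sketch continuing the session above; it assumes the classes are labelled 0 and 1 as in the asker's data, and that in this binary case predict simply returns the most probable class:

>>> proba = lr.predict_proba(X)        # shape (n_samples, n_classes)
>>> lr.classes_                        # column order of predict_proba
array([0, 1])
>>> p_one = proba[:, 1]                # P(y == 1) for each sample
>>> labels = lr.classes_[proba.argmax(axis=1)]  # same labels that lr.predict(X) returns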


Comments

  • WoooHaaaa almost 2 years

    I have a training data set like this:

    0.00479616 |  0.0119904 |  0.00483092 |  0.0120773 | 1
    0.51213136 |  0.0113404 |  0.02383092 |  -0.012073 | 0
    0.10479096 |  -0.011704 |  -0.0453692 |  0.0350773 | 0
    

    The first 4 columns are the features of one sample and the last column is its output.

    I use scikit-learn this way:

      import numpy as np
      from sklearn import linear_model

      data = np.array(data)
      lr = linear_model.LogisticRegression(C=10)

      X = data[:, :-1]
      Y = data[:, -1]
      lr.fit(X, Y)

      print(lr)
      # The output is always 1 or 0, not a probability number.
      print(lr.predict(data[:1, :-1]))
    

    I thought logistic regression should always give a probability between 0 and 1.

  • WoooHaaaa over 10 years
    Thanks so much. Do you know how to evaluate the quality of the predictions? The easiest way ...
  • Fred Foo over 10 years
    @MrROY: as of scikit-learn 0.14, there is a log_loss function in sklearn.metrics which gives the negative log-likelihood of predict_proba output (see the sketch after these comments).
  • user3378649 about 10 years
    Does X[0] represent a prediction based on the first column, or on all of the columns? (X has 3 rows and 4 columns here.)
  • Fred Foo about 10 years
    The rows in the predict_proba output correspond to rows in X. The columns correspond to classes.
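
Following up on the log_loss comment above, here is a minimal sketch of scoring predict_proba output with sklearn.metrics.log_loss. It reuses the asker's X, Y and lr, assumes the full data set has more rows than the three shown, and uses train_test_split from sklearn.model_selection (older releases such as the 0.14 series kept it in sklearn.cross_validation):

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import log_loss, accuracy_score

    # Hold out part of the data so the scores are not computed on the training set.
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

    lr.fit(X_train, Y_train)

    # log_loss scores the predicted probabilities (lower is better);
    # accuracy_score evaluates the hard labels returned by predict.
    print("log loss:", log_loss(Y_test, lr.predict_proba(X_test), labels=lr.classes_))
    print("accuracy:", accuracy_score(Y_test, lr.predict(X_test)))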