scikit-learn return value of LogisticRegression.predict_proba
4.65761066e-03 + 9.95342389e-01 = 1
9.75851270e-01 + 2.41487300e-02 = 1
9.99983374e-01 + 1.66258341e-05 = 1
The first column is the probability that the entry has the -1 label, and the second column is the probability that the entry has the +1 label. Note that the columns are ordered in the same way as the classes in self.classes_.
If you would like the predicted probabilities for the positive label only, you can use logistic_model.predict_proba(data)[:,1]. For the example above, this yields [9.95342389e-01, 2.41487300e-02, 1.66258341e-05].
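The points above can be checked on a small example. The following is a minimal sketch, assuming a made-up one-feature dataset with labels -1 and +1 (the data and the variable names X and y are illustrative, not the reviews from the question). It shows that each row of the result sums to 1 and that the second column holds the probabilities for the +1 label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data (an assumption for illustration):
# one numeric feature, labels -1 and +1.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

logistic_model = LogisticRegression().fit(X, y)

# One row per sample, one column per class.
proba = logistic_model.predict_proba(X)
print(proba.shape)  # (n_samples, n_classes)

# The columns follow the order of classes_.
print(logistic_model.classes_)

# Every row is a probability distribution over the classes.
print(np.allclose(proba.sum(axis=1), 1.0))

# Probabilities for the positive label only (second column here,
# because +1 is the second entry of classes_).
positive_proba = proba[:, 1]
print(positive_proba)
```

With only two classes, the first column is simply 1 minus the second, which is why the row sums shown at the top of the answer come out to 1.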
Zelphir Kaltstahl
Updated on August 03, 2020

Comments
-
Zelphir Kaltstahl over 3 years
What exactly does the LogisticRegression.predict_proba function return?

In my example I get a result like this:

[[ 4.65761066e-03 9.95342389e-01]
 [ 9.75851270e-01 2.41487300e-02]
 [ 9.99983374e-01 1.66258341e-05]]

From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be, because my samples are reviews, which are texts and not numbers. The documentation also says that the second column is n_classes. That certainly can't be, since I only have two classes (namely +1 and -1), and the function is supposed to calculate the probability that a sample belongs to a class, not the classes themselves.

What is the first column really, and why is it there?
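The sigmoid relationship mentioned in the question can be reproduced directly. This is a rough sketch, assuming a binary problem, scikit-learn's default LogisticRegression settings, and invented toy data (X, y, and the sigmoid helper are illustrative names). It compares the second column of predict_proba with the sigmoid of the raw scores from decision_function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy data (an assumption for illustration).
X = np.array([[-3.0], [-1.0], [-0.5], [0.5], [1.0], [3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Raw linear scores w.x + b, one per sample.
scores = model.decision_function(X)

# Applying the sigmoid by hand gives the probability of the
# second class in classes_ (here +1) ...
manual = sigmoid(scores)

# ... which should match the second column of predict_proba.
proba = model.predict_proba(X)
print(np.allclose(manual, proba[:, 1]))

# The first column is the complementary probability, for label -1.
print(np.allclose(1.0 - manual, proba[:, 0]))
```

Under these assumptions, the first column carries no extra information in the two-class case; it is there because predict_proba always returns one column per class.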
-
Zelphir Kaltstahl about 8 years
I totally didn't see that! Thanks for the quick clarification. I now wonder more than before what the documentation is talking about.
-
Sander van den Oord about 8 years
The documentation says the following: returns the probability of the sample for each class in the model. @Zelphir: you saw [n_samples, n_classes] in the docs. This refers to the shape of the output: it returns a matrix in which the rows are the samples and the columns are the classes (-1, 1). As Iulian said, for every row you get a predicted probability of the class being -1 and a predicted probability of the class being 1.
-
Reihan_amn over 5 years
How do we check the order of the classes? I mean, how do you know that the first column is the probability of class -1?
-
akalanka about 5 years
Is there a way to determine the probability score for the sample from the probability for classes?
-
Whole Brain over 3 years
@Reihan_amn If you read the pydoc of predict_proba(), or take a look at its source code, you will find:
Returns p : array of shape (n_samples, n_classes) [..] The class probabilities of the input samples. The order of the classes corresponds to that in the attribute 'classes_'.
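To check the column order in practice, the classes_ attribute can be inspected on the fitted model. Below is a small sketch, assuming invented toy data in which the +1 label appears first in y (X, y, and positive_col are illustrative names). It looks up the column for a given label instead of hard-coding the index.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data (an assumption for illustration).
# The +1 label deliberately appears before -1 in y.
X = np.array([[2.0], [-2.0], [1.5], [-1.5], [1.0], [-1.0]])
y = np.array([1, -1, 1, -1, 1, -1])

model = LogisticRegression().fit(X, y)

# classes_ holds the sorted unique labels, regardless of the
# order in which they appeared in y.
print(model.classes_)

# Find the column that belongs to the +1 label.
positive_col = int(np.where(model.classes_ == 1)[0][0])
print(positive_col)

# Select that column from the probability matrix.
proba = model.predict_proba(X)
positive_proba = proba[:, positive_col]
print(positive_proba)
```

Looking the column up through classes_ keeps the code correct even if the labels are later changed, for example to strings or to 0 and 1.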