Two-sample Kolmogorov-Smirnov Test in Python Scipy
Solution 1
You are using the one-sample KS test. You probably want the two-sample test, ks_2samp:
>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>>
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>>
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
The results can be interpreted in either of two ways:
You can compare the statistic value returned by SciPy to the critical value from a KS-test table for your sample sizes. When the statistic is higher than the critical value, you reject the null hypothesis that the two samples come from the same distribution.
Or you can compare the p-value to a significance level a, usually a=0.05 or 0.01 (you choose; the lower a is, the stronger the evidence required to reject). If the p-value is lower than a, you reject the null hypothesis and conclude that the two distributions differ.
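Both decision rules can be sketched in a few lines. This is a minimal illustration and not part of the original answer: the helper name ks_decision and its default alpha are made up for the example, and the critical value uses the common large-sample approximation c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2) instead of an exact table lookup.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_decision(a, b, alpha=0.05):
    """Hypothetical helper: apply both decision rules to two 1-D samples."""
    result = ks_2samp(a, b)
    n, m = len(a), len(b)
    # Large-sample approximation of the two-sample critical value
    # (an assumption; exact tables differ slightly for small samples).
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
    critical = c_alpha * np.sqrt((n + m) / (n * m))
    return {
        "statistic": result.statistic,
        "pvalue": result.pvalue,
        "critical_value": critical,
        "reject_by_statistic": result.statistic > critical,
        "reject_by_pvalue": result.pvalue < alpha,
    }


rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

decision = ks_decision(x, z)
print(decision)
```

For two samples of 1000 points each and alpha=0.05, the approximate critical value is about 0.061, so a statistic near 0.4 is far beyond it and both rules agree on rejecting.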
Solution 2
This is what the scipy docs say:
If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.
Cannot reject doesn't mean we confirm.
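A tiny hand-made example may help show why failing to reject is not confirmation. The two samples below are not from the thread; they are deliberately small, and the second is simply the first shifted by 0.5. With only five points each, the test has too little data to detect the shift.

```python
from scipy.stats import ks_2samp

# Illustrative data (an assumption for this sketch, not from the thread):
# b is a shifted by 0.5, so the two samples are clearly not identical.
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [0.5, 1.5, 2.5, 3.5, 4.5]

small = ks_2samp(a, b)
print(small.statistic, small.pvalue)

# The p-value is far above 0.05, so the null hypothesis is not rejected,
# even though the samples were built to differ.
```

A high p-value here only says that the data are compatible with a common distribution, not that one has been established.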
Akavall
Updated on April 23, 2021
Comments
-
Akavall about 3 years
I can't figure out how to do a Two-sample KS test in Scipy.
After reading the documentation for scipy kstest, I can see how to test whether a distribution is identical to the standard normal distribution:
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
test_stat = kstest(x, 'norm')
#>>> test_stat
#(0.021080234718821145, 0.76584491300591395)
This means that with a p-value of 0.76 we cannot reject the null hypothesis that the sample comes from a standard normal distribution.
However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
z = np.random.normal(1.1,0.9, 1000)
and test whether x and z are identical
I tried the naive:
test_stat = kstest(x, z)
and got the following error:
TypeError: 'numpy.ndarray' object is not callable
Is there a way to do a two-sample KS test in Python? If so, how should I do it?
Thank You in Advance
-
cval almost 12 years
Could you post the line and traceback?
-
Akavall almost 12 years
That's exactly what I was looking for. Thank You Very Much!
-
FaCoffee about 7 years
How do you interpret these results? Can you say the samples come from the same distribution just by looking at statistic and p-value?
-
user2738815 about 7 years
@FaCoffee This is what the scipy docs say: "If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same."
-
King Reload about 7 years
Could you explain your answer in further detail? Thanks in advance!
-
MD Abid Hasan about 6 years
@KingReload It means that when the p-value is very small, the probability of these two samples not coming from the same distribution is very low. In other words, the probability of these two samples coming from the same distribution is very high. But you cannot be 100% sure about that, since p-values are never zero. (Sometimes they show as 0, but they are never actually zero.) That's why we say "we failed to reject the null hypothesis" instead of "we accept the null hypothesis". Accepting the null hypothesis = the distributions of the two samples are the same.
-
superhero about 6 years
A high p-value is consistent with the samples coming from the same distribution; a small p-value suggests they don't. @MDAbidHasan has it backwards. Indeed, the documentation gives an example:
For an identical distribution, we cannot reject the null hypothesis since the p-value is high, 41%:
>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)
-
Rajesh Ahir almost 2 years
What value of the K-S statistic can be considered 'small'? For the p-value, we can use the significance level (i.e. 0.05 or 0.01) to interpret it.