Two-sample Kolmogorov-Smirnov Test in Python Scipy


94,378

Solution 1

You are using the one-sample KS test. You probably want the two-sample test ks_2samp:

```
>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>> 
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>> 
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
```
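In recent SciPy releases, kstest itself also accepts an array as its second argument and then runs the two-sample test, so the call that failed in the question should work there. A minimal sketch, assuming a SciPy version with that behavior (older versions raise the TypeError shown in the question); the seed and the use of numpy's default_rng are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import kstest, ks_2samp

rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

# Passing an array (not a distribution name or callable) as the second
# argument makes kstest perform the two-sample test
via_kstest = kstest(x, z)
via_ks_2samp = ks_2samp(x, z)

print(via_kstest.statistic, via_kstest.pvalue)
print(via_ks_2samp.statistic, via_ks_2samp.pvalue)
```

Both calls should report the same statistic and p-value; ks_2samp remains the more explicit choice and also works on older versions.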

The results can be interpreted as follows:

- Compare the statistic returned by ks_2samp to the critical value of the two-sample KS test for your sample sizes. If the statistic is higher than the critical value, you can reject the null hypothesis that the two samples come from the same distribution.
- Or compare the p-value to a significance level α, usually α = 0.05 or 0.01 (you choose it; the lower α is, the stronger the evidence you require before rejecting). If the p-value is lower than α, you reject the null hypothesis and conclude that the two distributions are likely different.
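Both rules can be sketched in a few lines. Instead of a printed table, this uses the standard large-sample approximation for the two-sample critical value, D_crit = c(α) · sqrt((n + m) / (n · m)) with c(α) = sqrt(-ln(α/2) / 2), which gives c(0.05) ≈ 1.358. The helper names ks_critical_value and compare_samples are hypothetical, and for small samples an exact table (or the p-value itself) is more reliable than this approximation:

```python
import math
import numpy as np
from scipy.stats import ks_2samp

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the two-sample KS critical value."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))

def compare_samples(a, b, alpha=0.05):
    """Apply both decision rules (hypothetical helper)."""
    result = ks_2samp(a, b)
    d_crit = ks_critical_value(len(a), len(b), alpha)
    return {
        "statistic": result.statistic,
        "critical_value": d_crit,
        "reject_by_statistic": bool(result.statistic > d_crit),
        "pvalue": result.pvalue,
        "reject_by_pvalue": bool(result.pvalue < alpha),
    }

rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
y = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

print(compare_samples(x, y))
print(compare_samples(x, z))
```

With n = m = 1000 and α = 0.05 the approximate critical value is about 0.061, so a statistic like the 0.418 reported above for x versus z is far beyond it.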

Solution 2

This is what the scipy docs say:

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

Note that failing to reject the null hypothesis does not confirm that the two samples come from the same distribution.
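A small simulation illustrates why "cannot reject" is weaker than "same distribution". Here the two samples always come from different normal distributions, yet with only 20 points each the test rarely rejects at α = 0.05. The sample size, shift of 0.3 and seed are arbitrary assumptions chosen to make the effect visible:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 500
rejections = 0

for _ in range(n_trials):
    # The two samples genuinely come from different distributions
    a = rng.normal(0.0, 1.0, 20)
    b = rng.normal(0.3, 1.0, 20)
    if ks_2samp(a, b).pvalue < alpha:
        rejections += 1

rejection_rate = rejections / n_trials
print(f"rejected H0 in {rejection_rate:.0%} of trials")
```

In most trials the p-value stays above α even though the null hypothesis is false; the test simply lacks the power to detect a small difference with so little data.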


Akavall


Updated on April 23, 2021

Comments

Akavall about 3 years
I can't figure out how to do a two-sample KS test in SciPy.

After reading the SciPy kstest documentation, I can see how to test whether a sample comes from the standard normal distribution:
```
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
test_stat = kstest(x, 'norm')
#>>> test_stat
#(0.021080234718821145, 0.76584491300591395)
```
With a p-value of 0.76, we cannot reject the null hypothesis that the sample was drawn from the standard normal distribution.
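The one-sample test is not limited to the standard normal: kstest forwards args to the named distribution's cdf, so for 'norm' they are (loc, scale). A minimal sketch, assuming the reference parameters are fixed in advance; if they were estimated from the same sample, the reported p-value would no longer be accurate. The seed and parameters are illustrative:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
z = rng.normal(1.1, 0.9, 1000)

# Against the standard normal N(0, 1): z clearly does not fit
vs_standard = kstest(z, 'norm')
# Against N(loc=1.1, scale=0.9): args are passed on to scipy.stats.norm.cdf
vs_matching = kstest(z, 'norm', args=(1.1, 0.9))

print(vs_standard.statistic, vs_standard.pvalue)
print(vs_matching.statistic, vs_matching.pvalue)
```

The returned object exposes statistic and pvalue attributes, which reads more clearly than unpacking the result as a tuple.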

However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:
```
from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
z = np.random.normal(1.1,0.9, 1000)
```
and test whether x and z come from the same distribution.

I tried the naive approach:
```
test_stat = kstest(x, z)
```
and got the following error:
```
TypeError: 'numpy.ndarray' object is not callable
```
Is there a way to do a two-sample KS test in Python? If so, how should I do it?

Thank you in advance.
cval almost 12 years

Could you post the line and traceback?
Akavall almost 12 years

That's exactly what I was looking for. Thank You Very Much!
FaCoffee about 7 years

How do you interpret these results? Can you say the samples come from the same distribution just by looking at statistic and p-value?
user2738815 about 7 years

@FaCoffee This is what the scipy docs say: "If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same."
King Reload about 7 years

Could you explain your answer in more detail? Thanks in advance!
MD Abid Hasan about 6 years

@KingReload It means that when the p-value is very small, the probability of these two samples not coming from the same distribution is very low. In other words, the probability of these two samples coming from the same distribution is very high. But you cannot be 100% sure about that, hence p-values are never zero. (Sometimes they show as 0, but they are never actually zero.) That's why we say "we failed to reject the null hypothesis" instead of "we accept the null hypothesis". Accepting the null hypothesis = the distributions of the two samples are the same.
superhero about 6 years

A high p-value means the samples are consistent with coming from the same distribution; a small p-value means they likely don't. @MDAbidHasan has it backwards. Indeed, the documentation gives an example: for identical distributions, we cannot reject the null hypothesis since the p-value is high, 41%:
```
>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)
```
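One way to see what the p-value does and does not say: when both samples really come from the same distribution, the p-value is spread out roughly uniformly (a little conservatively, since the two-sample KS statistic takes discrete values), so a small p-value appears only about α of the time. It is not the probability that the samples share a distribution. A small simulation, with sample size and seed chosen arbitrarily:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
alpha = 0.05
n_trials = 1000

pvalues = np.array([
    # Both samples are drawn from the same N(0, 1) distribution
    ks_2samp(rng.normal(0, 1, 50), rng.normal(0, 1, 50)).pvalue
    for _ in range(n_trials)
])

false_rejection_rate = np.mean(pvalues < alpha)
print(f"p < {alpha} in {false_rejection_rate:.1%} of trials")
print(f"median p-value: {np.median(pvalues):.2f}")
```

The false rejection rate should come out at or below about 5%, which is exactly what the significance level controls.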
Rajesh Ahir almost 2 years

What value of the K-S statistic can be considered "small"? For the p-value, we can use a significance level (e.g., 0.05 or 0.01) to interpret it.