Two-Sample Kolmogorov-Smirnov Test

Writer Andrew Mclaughlin
$\begingroup$

I am trying to understand the two-sample Kolmogorov-Smirnov test. I cannot find good examples anywhere that connect the math to a real example, especially one with two different distributions. Does someone know where to find such an example, or can someone give me one?

I have added an example to my question and would like to check whether my understanding is correct. Also, how do I calculate the p-value?

ID     Sample X  Sample Y  Cum F(X)     Cum F(Y)     Diff
1      4         1         0.026490066  0.008196721  0.018293345
2      28        18        0.21192053   0.155737705  0.056182825
3      24        25        0.370860927  0.360655738  0.010205189
4      21        5         0.509933775  0.401639344  0.108294431
5      23        13        0.662251656  0.508196721  0.154054934
6      12        7         0.741721854  0.56557377   0.176148084
7      7         20        0.78807947   0.729508197  0.058571273
8      23        13        0.940397351  0.836065574  0.104331777
9      9         20        1            1            0
Sum    151       122       D-stat       0.176148084
Count  9         9         D-crit       0.64021448
Significance: No (No: H_0, the samples come from P; Yes: H_1, the samples do not come from P)

In mathematical terms, I did the following:

I have two samples (X and Y) and I would like to test if their distributions are the same.

  • $X$ = Sample X
  • $Y$ = Sample Y
  • $F(X_i) = \frac{X_i}{N}$: observed cumulative frequency distribution of a random sample of $n$ observations; (no. of observations ≤ X)/(sum of observations)
  • $F(Y_i) = \frac{Y_i}{N}$: observed cumulative frequency distribution of a random sample of $n$ observations; (no. of observations ≤ Y)/(sum of observations)
  • $n_X = \sum_{i=1}^{n}{X_i}$; $n_Y = \sum_{i=1}^{n}{Y_i}$
  • $D_{\text{stat}} = \max_i |F(X_i) - F(Y_i)|$
  • $D_{\text{crit}} = c(\alpha)\sqrt{\frac{n_X+n_Y}{n_X n_Y}}$
  • Hypothesis check: if $D_{\text{stat}} > D_{\text{crit}}$, $H_0$ is rejected
  • 95% confidence level, $\alpha = 0.05$, $c(\alpha) = 1.3581$
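
As a check on the critical value, here is the formula worked through with the numbers from the table. This assumes that $n_X$ and $n_Y$ in this formula are the sample sizes from the Count row (9 each), since that reading is the one that reproduces the tabulated D-crit:

$$ D_{\text{crit}} = c(\alpha)\sqrt{\frac{n_X+n_Y}{n_X n_Y}} = 1.3581\sqrt{\frac{9+9}{9\cdot 9}} = 1.3581\sqrt{\frac{2}{9}} \approx 0.64021 $$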
$\endgroup$

1 Answer

$\begingroup$

This process is also described on the English Wikipedia.

Construct CDFs:

  • Sort. \begin{align*} X&: (4, 7, 9, 12, 21, 23, 23, 24, 28) \\ Y&: (1, 5, 7, 13, 13, 18, 20, 20, 25) \end{align*}
  • Construct CDFs. These should be your Cum F(...)s \begin{align*} \mathrm{CDF}(X) &= \begin{cases} 0 & \phantom{4\leq{}} x <4 \\ \frac{1}{9} & 4\leq x<7 \\ \frac{2}{9} & 7\leq x<9 \\ \frac{1}{3} & 9\leq x<12 \\ \frac{4}{9} & 12\leq x<21 \\ \frac{5}{9} & 21\leq x<23 \\ \frac{7}{9} & 23\leq x<24 \\ \frac{8}{9} & 24\leq x<28 \\ 1 & 28 \leq x \end{cases} \\ \mathrm{CDF}(Y) &= \begin{cases} 0 & \phantom{1\leq{}}x < 1 \\ \frac{1}{9} & 1\leq x<5 \\ \frac{2}{9} & 5\leq x<7 \\ \frac{1}{3} & 7\leq x<13 \\ \frac{5}{9} & 13\leq x<18 \\ \frac{2}{3} & 18\leq x<20 \\ \frac{8}{9} & 20\leq x<25 \\ 1 & 25 \leq x \end{cases} \end{align*} Let's plot these. [Figure: CDFs plotted on the same axes]
  • Now we compute $|\mathrm{CDF}(X) - \mathrm{CDF}(Y)|$, marking the global maximum with $\ast$. $$ |\mathrm{CDF}(X) - \mathrm{CDF}(Y)| = \begin{cases} 0 & \phantom{1\leq{}}x < 1 \\ \frac{1}{9} & 1\leq x<4 \\ 0 & 4\leq x<5 \\ \frac{1}{9} & 5\leq x<9 \\ 0 & 9\leq x<12 \\ \frac{1}{9} & 12\leq x<18 \\ \frac{2}{9} & 18\leq x<20 \\ \frac{4}{9} \ast & 20\leq x<21 \\ \frac{1}{3} & 21\leq x<23 \\ \frac{1}{9} & 23\leq x<24 \\ 0 & 24\leq x<25 \\ \frac{1}{9} & 25\leq x<28 \\ 0 & 28\leq x \end{cases} $$
  • So your test statistic is $4/9 = 0.\overline{4}$. As you have calculated, the critical value at the $\alpha = 0.05$ level is $0.64021{\dots}$. Since the test statistic is less than the critical value, the null hypothesis (that the two samples are drawn from the same distribution) is not rejected.
  • $p$-values are typically either provided by software or found in tables. For example, KolmogorovSmirnovTest[] in Mathematica 11.3 finds the $p$-value for this test statistic for samples of sizes $(9,9)$ is $0.27396{\dots}$. The R {stats} package implements the test and $p$-value computation in ks.test. Python's SciPy implements these calculations as scipy.stats.ks_2samp(). There is even an Excel implementation called KS2TEST.
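
For anyone who wants to reproduce the hand calculation, here is a minimal Python sketch using only the standard library. The function names (ecdf, ks_statistic, critical_value) are my own choices for illustration, not part of any library, and the default 1.3581 is the $c(\alpha)$ for $\alpha = 0.05$ quoted in the question. It evaluates both empirical CDFs at every observed value, which is enough because step functions only change at data points.

```python
from bisect import bisect_right
from fractions import Fraction
from math import sqrt

# The two samples from the question, in their original (unsorted) order.
x = [4, 28, 24, 21, 23, 12, 7, 23, 9]
y = [1, 18, 25, 5, 13, 7, 20, 13, 20]


def ecdf(sample):
    """Return the empirical CDF t -> (number of observations <= t) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: Fraction(bisect_right(ordered, t), n)


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are right-continuous step functions, so the maximum
    # gap is attained at one of the observed values.
    return max(abs(fa(t) - fb(t)) for t in set(a) | set(b))


def critical_value(n, m, c_alpha=1.3581):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m)).

    n and m are the sample sizes (counts), not the sums of the values.
    """
    return c_alpha * sqrt((n + m) / (n * m))


d_stat = ks_statistic(x, y)               # exact fraction
d_crit = critical_value(len(x), len(y))   # float
reject_h0 = d_stat > d_crit

print(f"D-stat = {d_stat} = {float(d_stat):.5f}")
print(f"D-crit = {d_crit:.5f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic comes out as exactly 4/9, below the critical value, so the null hypothesis is not rejected, matching the steps above. This sketch does not compute a $p$-value; for that, the library routines just mentioned are the practical route.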
$\endgroup$
