Kolmogorov–Smirnov test (K–S test or KS test)
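The Kolmogorov–Smirnov test assesses whether two samples are consistent with having been drawn from the same distribution. Its statistic is the largest gap between the two samples' empirical cumulative distribution functions; it is a distance, not a probability. The test makes no assumptions about the form of the underlying distribution of the data, which makes it a nonparametric test.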
The K-S test compares the empirical cumulative distribution functions (ECDFs) of two samples. The ECDF of a sample is a step function that gives, for each value, the fraction of the sample that is less than or equal to that value.
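As an illustration of the definition above, here is a minimal Python sketch of an empirical CDF. The function name ecdf is hypothetical and chosen only for this example; it is not part of AryaXAI.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of `sample` that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right returns how many of the sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3.0, 1.0, 2.0, 2.0])
print(F(0.5), F(2.0), F(3.0))  # prints: 0.0 0.75 1.0
```

Sorting once and using a binary search keeps each evaluation cheap, which matters when the ECDF is evaluated at many points.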
To perform the K-S test, a significance level must be specified. This is the probability of rejecting the null hypothesis (that the two samples come from the same distribution) when it is actually true. If the K-S test statistic is greater than the critical value for the specified significance level, the null hypothesis is rejected and the two samples are concluded to come from different distributions.
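The decision rule can be sketched in Python as follows. This sketch assumes the common large-sample approximation for the two-sample critical value, c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2); exact critical values for small samples differ, and the function names here are hypothetical, not AryaXAI's implementation.

```python
import math


def ks_critical_value(n, m, alpha=0.05):
    """Approximate two-sample K-S critical value (large-sample formula, an assumption)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def reject_null(d, n, m, alpha=0.05):
    """Return True if the statistic d exceeds the approximate critical value."""
    return d > ks_critical_value(n, m, alpha)


# With 100 observations in each sample and alpha = 0.05,
# the approximate critical value is about 0.192.
print(round(ks_critical_value(100, 100), 3))
print(reject_null(0.25, 100, 100))  # statistic above the critical value
print(reject_null(0.10, 100, 100))  # statistic below the critical value
```

A smaller significance level raises the critical value, so a larger gap between the two samples is needed before the null hypothesis is rejected.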
The value of the test statistic D is calculated as:
D_{n,m} = \max_x |P(x) - Q(x)|
where
P(x) = empirical cumulative distribution function of the sample from P
Q(x) = empirical cumulative distribution function of the sample from Q
n is the number of observations from P and m is the number of observations from Q.
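The statistic D can be computed directly from the two samples. The sketch below is a minimal pure-Python version; the name ks_statistic is hypothetical. It relies on the fact that both empirical CDFs are step functions, so the largest gap between them occurs at one of the observed values.

```python
from bisect import bisect_right


def ks_statistic(sample_p, sample_q):
    """Largest absolute gap between the empirical CDFs of two samples."""
    p = sorted(sample_p)
    q = sorted(sample_q)
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # Checking every observed value from either sample is enough,
    # because the empirical CDFs only change at those points.
    for x in p + q:
        gap = abs(bisect_right(p, x) / n - bisect_right(q, x) / m)
        d = max(d, gap)
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # prints: 0.5
```

Identical samples give D = 0, and samples whose values do not overlap at all give D = 1.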
The K-S test is a popular method for assessing the similarity between two samples. It is designed for samples from continuous distributions. It can also be applied to discrete data, but tied values make the standard test conservative, so it is most helpful in the continuous case.
The KS test is used for numerical features. In AryaXAI, the default threshold is 0.05.