Chi-square test
The chi-square test is a statistical hypothesis test used to determine if there is a significant discrepancy between the expected and observed frequencies in one or more categories.
It is widely used to assess for homogeneity, the goodness of fit, and the independence of two categorical variables. The chi-square test is based on the chi-square distribution, a distribution of the sum of the squares of k independent standard normal random variables.
Under the null hypothesis for the chi-square test, there is no difference between the expected and observed frequencies, while the alternative hypothesis is that there is a substantial difference. There are several kinds of chi-square tests:
- Chi-square test for independence, which asks a question of relationship
- The chi-square goodness of fit test, which determines how well the sample data fits a distribution from a population with a normal distribution
- Chi-square test for homogeneity, which analyses the proportions of responses from two or more populations in relation to a dichotomous variable or a variable with more than two outcome categories
The formula for chi-square test is
χ2 = ∑(Oi – Ei)2/Ei
Where, Oi = the observed value
Ei = the expected value
Chi-square test is used for categorical features and with minimum number of labels in a feature is greater than 2. The default threshold is 0.05 in AryaXAI.