Seminar on Research Methodology: NON-PARAMETRIC TESTS, CORRELATION & REGRESSION ANALYSES, MULTIVARIATE ANALYSES & FACTOR ANALYSES
PARAMETER A parameter is any numerical quantity that characterizes a given population or some aspect of it. The most common statistical parameters are the mean, median, mode and standard deviation.
PARAMETRIC TEST A test in which population constants such as the mean, SD, standard error, correlation coefficient, proportion etc. are used, and the data are assumed to follow an established distribution such as the normal, binomial or Poisson.
ASSUMPTIONS OF PARAMETRIC TESTS The general assumptions of parametric tests are: the populations are normally distributed (follow the normal distribution curve); the selected sample is representative of the general population; the data are on an interval or ratio scale.
NON-PARAMETRIC TEST A test in which no population constant is used. The data need not follow any specific distribution, and no assumptions about the form of the population distribution are made. E.g., to classify items as good, better and best, we simply allocate arbitrary numbers or marks to each category.
ASSUMPTIONS OF NON-PARAMETRIC TESTS Non-parametric tests can be applied when: the data do not follow any specific distribution and no assumptions about the population are made; the data are measured on any scale.
Testing Normality The normality assumption is considered broken only if there are large and obvious departures from normality. This can be checked by: inspecting a histogram; examining skewness and kurtosis.
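The skewness and kurtosis check can be sketched in a few lines. This is a minimal illustration, assuming the simple moment-based definitions (population moments, kurtosis reported as excess over 3); the function name and the two small data sets are hypothetical and not from the seminar. Values near zero are consistent with normality, while large values suggest an obvious departure.

```python
def skewness_and_kurtosis(data):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical samples: one symmetric, one with a long right tail
symmetric = [2, 4, 4, 5, 5, 5, 6, 6, 8]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15]

sym_skew, sym_kurt = skewness_and_kurtosis(symmetric)
rs_skew, rs_kurt = skewness_and_kurtosis(right_skewed)
print("symmetric sample skewness:", sym_skew)
print("right-skewed sample skewness:", rs_skew)
```

A histogram of the same data would show the long right tail directly; the two numbers only summarise it.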
COMMONLY USED NON-PARAMETRIC TESTS Chi-Square test; McNemar test; The Sign Test; Wilcoxon Signed-Ranks Test; Mann-Whitney U or Wilcoxon rank sum test; The Kruskal Wallis or H test; Friedman ANOVA; The Spearman rank correlation test; Cochran’s Q test
CHI-SQUARE TEST The chi-square test offers an alternative method of testing the significance of the difference between two proportions. It involves the calculation of the chi-square statistic, written $\chi^2$ after the Greek letter ‘chi’ (χ).
Con… Chi-square was developed by Karl Pearson. The chi-square test is a non-parametric test. Its test statistic follows a specific distribution known as the chi-square distribution.
Con… The three essential requirements for the chi-square test are: a random sample; qualitative data; lowest expected frequency not less than 5.
Important Terms Degrees of freedom: this denotes the extent of independence (freedom) enjoyed by a given set of observed frequencies. Suppose we are given a set of “n” observed frequencies which are subject to “k” independent constraints (restrictions); then
d.f. = (no. of frequencies) − (no. of independent constraints on them) = n − k. For a contingency table, d.f. = (r − 1)(c − 1), where r = the no. of rows and c = the no. of columns.
Contingency Table A table prepared by enumerating qualitative data and entering the actual frequencies is a contingency table; when it represents the occurrence of two sets of events, it is also called an association table.
Important characteristics of a chi-square test It is based on frequencies and not on parameters like the mean and SD. The test is used for testing hypotheses and is not useful for estimation. It can also be applied to a complex contingency table with several classes and as such is a very useful test in research work.
Con… No rigid assumptions are necessary regarding the type of population, no parameter values are needed, and relatively little mathematical detail is involved.
Chi square distribution If $X_1, X_2, \ldots, X_n$ are independent normal variates, each distributed normally with mean zero and SD unity, then $X_1^2 + X_2^2 + \cdots + X_n^2 = \sum_{i=1}^{n} X_i^2$ is distributed as chi square ($\chi^2$) with n degrees of freedom (d.f.).
Figure: the chi square curves for d.f. n = 1, 5 and 9.
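The definition of the chi square distribution can be checked by simulation. The sketch below is only an illustration: it draws independent standard normal values, sums their squares, and compares the average of many such sums with n, which is the known mean of a chi square variate with n d.f. The function name, the seed and the number of draws are arbitrary choices made here.

```python
import random

def simulate_chi_square(n_df, n_draws, seed=0):
    """Draw n_draws values, each the sum of n_df squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_df))
        for _ in range(n_draws)
    ]

draws = simulate_chi_square(n_df=5, n_draws=20000)
sample_mean = sum(draws) / len(draws)
print("simulated mean for 5 d.f.:", round(sample_mean, 3))  # should be close to 5
```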
Calculation of Chi-square value The chi-square value is calculated as follows: Make the contingency table. Note the frequencies observed (O) in each class of one event row-wise, and the number in each group of the other event column-wise. Determine the expected number (E) in each group of the sample, i.e. each cell of the table, on the assumption of the null hypothesis; for a cell, E = (row total × column total) / N.
Con… The null hypothesis is the hypothesis that there is no difference between the effects under comparison; it is assumed and then tested in quantitative terms. Find the difference between the observed and the expected frequencies in each cell (O – E). Calculate the chi-square value of each cell by the formula $\chi^2 = (O - E)^2 / E$. Sum up the chi-square values of all the cells to get the total chi-square value.
Con… Calculate the degrees of freedom, which are related to the number of categories in both the events. The formula adopted in the case of a contingency table is: degrees of freedom (d.f.) = (c – 1)(r – 1), where c is the number of columns and r is the number of rows.
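The calculation steps above can be sketched as a short function. This is a minimal illustration, not a library routine: the function name and the 2×2 counts are hypothetical, and expected frequencies are taken as (row total × column total) / N under the null hypothesis.

```python
def chi_square_contingency(observed):
    """Return (chi-square, d.f., expected table) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, o in enumerate(row):
            # Expected frequency of the cell under the null hypothesis
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi_sq += (o - e) ** 2 / e  # cell contribution (O - E)^2 / E
        expected.append(expected_row)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return chi_sq, df, expected

# Hypothetical 2x2 table of observed frequencies
observed = [[20, 30],
            [30, 20]]
chi_sq, df, expected = chi_square_contingency(observed)
print("chi-square =", chi_sq, "with d.f. =", df)
```

For these counts every expected frequency is 25, so each cell contributes 25/25 = 1 and the total is 4 with 1 d.f.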
Alternative formula In the case of a (2×2) table, if we write the cell frequencies and marginal totals thus:
a        b        (a+b)
c        d        (c+d)
(a+c)    (b+d)    N
then $\chi^2 = \dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$.
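As a sketch, the usual shortcut for a (2×2) table, $\chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$, can be coded directly from the cell frequencies. The function name and the counts below are hypothetical; the second function recomputes the value cell by cell only to show that the two routes agree.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table with cells a, b / c, d."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_cellwise(a, b, c, d):
    """Same statistic computed as the sum of (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d)]
    total = 0.0
    for o, row_total, col_total in cells:
        e = row_total * col_total / n
        total += (o - e) ** 2 / e
    return total

shortcut = chi_square_2x2(20, 30, 30, 20)
cellwise = chi_square_cellwise(20, 30, 30, 20)
print(shortcut, cellwise)
```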
APPLICATIONS OF A CHI SQUARE TEST This test can be used as a: test of goodness of fit of distributions; test of independence of attributes; test of homogeneity.
TEST OF GOODNESS OF FIT OF DISTRIBUTIONS: This test enables us to see how well the assumed theoretical distribution (such as the binomial, Poisson or normal distribution) fits the observed data. The chi square formula for goodness of fit is: $\chi^2 = \sum \dfrac{(o - e)^2}{e}$, where o = observed frequency and e = expected frequency.
Con… If $\chi^2$ (calculated) > $\chi^2$ (tabulated) with (n − 1) d.f., the null hypothesis is rejected; otherwise it is accepted. If the null hypothesis is accepted, it can be concluded that the observed data follow the theoretical distribution.
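A goodness-of-fit calculation can be sketched as follows. The data are hypothetical (a die rolled 120 times, so 20 is expected per face), the function name is invented for this example, and the d.f. is taken as n − 1 on the assumption that no parameter was estimated from the data. The table value used for 5 d.f. at the 5% level is about 11.07.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi-square, d.f.) for observed vs expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters were estimated from the data
    return chi_sq, df

# Hypothetical counts for faces 1..6 of a die rolled 120 times
observed = [18, 22, 21, 19, 23, 17]
expected = [20] * 6

chi_sq, df = chi_square_goodness_of_fit(observed, expected)
TABLE_VALUE_5DF_5PCT = 11.07  # tabulated chi-square, 5 d.f., 5% level (approx.)
reject_null = chi_sq > TABLE_VALUE_5DF_5PCT
print("chi-square =", round(chi_sq, 3), "d.f. =", df, "reject H0:", reject_null)
```

Here the calculated value (1.4) is below the table value, so the null hypothesis of a fair die is not rejected.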
TEST OF INDEPENDENCE OF ATTRIBUTES This test enables us to explain whether or not two attributes are associated. For instance, if we are interested in knowing whether a new medicine is effective in controlling fever, the chi square test is useful. In such a situation, we proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.
Con… If $\chi^2$ (calculated) > $\chi^2$ (tabulated) at a certain level of significance for the given degrees of freedom, the null hypothesis is rejected, i.e. the two variables are dependent (the new medicine is effective in controlling the fever).
Con… If $\chi^2$ (calculated) < $\chi^2$ (tabulated), the null hypothesis is accepted, i.e. the two variables are independent (the new medicine is not effective in controlling the fever). When the null hypothesis is rejected, it can be concluded that there is a significant association between the two attributes.
TEST OF HOMOGENEITY This test can also be used to test whether the occurrence of events is uniform or not, e.g. whether the admission of patients to a government hospital is uniform across all days of the week. If $\chi^2$ (calculated) < $\chi^2$ (tabulated), the null hypothesis is accepted, and it can be concluded that there is uniformity in the occurrence of the events (uniformity in the admission of patients throughout the week).
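The hospital admissions example can be sketched as below. The daily counts are hypothetical, chosen only for illustration, and the function name is invented here. Under the null hypothesis of uniformity, each day is expected to receive total/7 admissions, and the table value for 6 d.f. at the 5% level is 12.592.

```python
def uniformity_chi_square(counts):
    """Chi-square for the hypothesis that all categories are equally frequent."""
    total = sum(counts)
    expected = total / len(counts)  # same expected frequency for every category
    chi_sq = sum((o - expected) ** 2 / expected for o in counts)
    return chi_sq, len(counts) - 1

# Hypothetical admissions, Monday to Sunday
admissions = [15, 12, 14, 16, 13, 17, 18]
chi_sq, df = uniformity_chi_square(admissions)

TABLE_VALUE_6DF_5PCT = 12.592  # tabulated chi-square, 6 d.f., 5% level
uniform = chi_sq < TABLE_VALUE_6DF_5PCT
print("chi-square =", round(chi_sq, 3), "d.f. =", df, "uniform:", uniform)
```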
CONDITIONS FOR THE APPLICATION OF CHI SQUARE TEST The data must be in the form of frequencies. The frequency data must have precise numerical values and must be organised into categories or groups. Observations recorded and used must be collected on a random basis. All the items in the sample must be independent.
Con… No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies are 10 or more. (Some statisticians take this number as 5, but 10 is regarded as better by most statisticians.) The overall number of items must also be reasonably large; it should normally be at least 50.
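The regrouping rule can be sketched as a small helper that merges adjoining groups until each pooled group reaches the minimum. This is one simple left-to-right strategy assumed here for illustration (other groupings are equally valid); the function name and the frequencies are hypothetical.

```python
def pool_small_groups(frequencies, minimum=10):
    """Combine adjoining groups so that every pooled group has >= minimum items."""
    pooled = []
    running = 0
    for f in frequencies:
        running += f
        if running >= minimum:
            pooled.append(running)  # group is large enough, close it
            running = 0
    if running > 0:
        # Leftover at the end is too small: merge it into the last pooled group
        if pooled:
            pooled[-1] += running
        else:
            pooled.append(running)
    return pooled

frequencies = [3, 5, 12, 20, 14, 6, 2]
pooled = pool_small_groups(frequencies, minimum=10)
print(pooled)
```

In an actual test the expected frequencies would have to be pooled over the same groups, and the d.f. counted from the number of groups that remain.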
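YATES’ CORRECTION If, in a 2×2 contingency table, the expected frequencies are small, say less than 5, the ordinary chi square test cannot be relied upon. In that case the direct formula of the chi square test is modified by Yates’ correction for continuity. With cell frequencies a, b, c, d and total N as in the (2×2) table above: $\chi^2(\text{corrected}) = \dfrac{N(|ad - bc| - N/2)^2}{(a+b)(c+d)(a+c)(b+d)}$, which is equivalent to reducing each |O − E| by 0.5 before squaring.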
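A sketch of the correction for a 2×2 table, assuming the usual corrected shortcut $\chi^2 = N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]$. The function names and the counts are hypothetical; the counts are chosen so that two expected frequencies are 4.5, i.e. below 5. The table value for 1 d.f. at the 5% level is 3.841.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected shortcut chi-square for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_2x2_yates(a, b, c, d):
    """Chi-square for a 2x2 table with Yates' correction for continuity."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical small-sample table: rows 2, 8 and 7, 3 (N = 20)
uncorrected = chi_square_2x2(2, 8, 7, 3)
corrected = chi_square_2x2_yates(2, 8, 7, 3)

TABLE_VALUE_1DF_5PCT = 3.841  # tabulated chi-square, 1 d.f., 5% level
print("uncorrected:", round(uncorrected, 3), "corrected:", round(corrected, 3))
```

With these counts the uncorrected value exceeds the table value while the corrected value does not, which shows how the correction guards against overstating significance in small samples.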
Additive property Several values of $\chi^2$ can be added together, and if the degrees of freedom are also added, this number gives the d.f. of the total value of $\chi^2$. Thus, if a number of $\chi^2$ values are obtained from a number of samples of similar data, then because of the additive nature of $\chi^2$ we can combine the various values by simply adding them.
Such addition of various values of $\chi^2$ gives one value of $\chi^2$, which helps in forming a better idea about the significance of the problem under consideration. E.g.: the table shows the values of $\chi^2$ from different investigations carried out to examine the effectiveness of a recently invented medicine for checking malaria.
By adding all the values of $\chi^2$, we obtain a value equal to 18.0. Adding the various d.f. as given in the table, we obtain the value 5. We can now state that the value of $\chi^2$ for 5 d.f. is 18.0.
Let us take the hypothesis that the new medicine is not effective. The table value of $\chi^2$ for 5 d.f. at the 5% level of significance is 11.070.
Our calculated value is higher than this table value, which means the difference is significant and is not due to chance. As such, the hypothesis is rejected.
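The pooling of several chi-square values can be sketched as below. The individual values are hypothetical: they are chosen only so that the totals match the 18.0 and 5 d.f. quoted in the example, and they are not taken from the seminar's table. The function name is also invented for this illustration.

```python
def combine_chi_square(results):
    """Add chi-square values and their d.f. from independent investigations."""
    total_chi_sq = sum(value for value, _ in results)
    total_df = sum(df for _, df in results)
    return total_chi_sq, total_df

# Hypothetical (chi-square, d.f.) pairs from five investigations
investigations = [(2.5, 1), (3.2, 1), (4.1, 1), (3.7, 1), (4.5, 1)]
total_chi_sq, total_df = combine_chi_square(investigations)

TABLE_VALUE_5DF_5PCT = 11.07  # tabulated chi-square, 5 d.f., 5% level (approx.)
significant = total_chi_sq > TABLE_VALUE_5DF_5PCT
print("pooled chi-square =", round(total_chi_sq, 1), "d.f. =", total_df,
      "significant:", significant)
```

Each hypothetical value lies close to the 1 d.f. table value of 3.841 and some fall below it, yet the pooled value is clearly significant, which is the practical benefit of the additive property.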
Conversion of chi-square into phi coefficient Chi-square tells us about the significance of a relation between variables; it provides no answer regarding the magnitude of the relation. For this we use the phi coefficient, which is a non-parametric measure of the coefficient of correlation, as under: $\phi = \sqrt{\chi^2 / N}$, where N is the total number of observations.
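A minimal sketch of the conversion, assuming the standard definition $\phi = \sqrt{\chi^2 / N}$ with N the total number of observations. The function name and the input values are hypothetical.

```python
import math

def phi_coefficient(chi_sq, n):
    """Phi coefficient from a chi-square value and the total sample size n."""
    if n <= 0 or chi_sq < 0:
        raise ValueError("n must be positive and chi_sq non-negative")
    return math.sqrt(chi_sq / n)

# Hypothetical 2x2 result: chi-square = 4.0 from N = 100 observations
phi = phi_coefficient(4.0, 100)
print("phi =", round(phi, 3))
```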
CONVERSION OF CHI-SQUARE INTO COEFFICIENT OF CONTINGENCY (C) In the case of a contingency table of higher order than 2×2, to study the magnitude of the relation or the degree of association between two attributes, we convert the chi-square into the coefficient of contingency: $C = \sqrt{\dfrac{\chi^2}{\chi^2 + N}}$, where N is the total number of observations.
Con… While finding the value of C we proceed on the assumption of the null hypothesis that the two attributes are independent and exhibit no association. C is also known as the coefficient of mean square contingency. This measure comes under the category of non-parametric measures of relationship.
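The coefficient of contingency can be sketched in the same way, assuming the standard definition $C = \sqrt{\chi^2 / (\chi^2 + N)}$. The function name and the input values are hypothetical.

```python
import math

def contingency_coefficient(chi_sq, n):
    """Coefficient of contingency C from a chi-square value and sample size n."""
    if n <= 0 or chi_sq < 0:
        raise ValueError("n must be positive and chi_sq non-negative")
    return math.sqrt(chi_sq / (chi_sq + n))

# Hypothetical result: chi-square = 18.0 from N = 200 observations
c_value = contingency_coefficient(18.0, 200)
print("C =", round(c_value, 4))
```

Because $\chi^2 / (\chi^2 + N)$ is always below 1, C stays below 1 however strong the association, which is worth keeping in mind when comparing values of C.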
CAUTION IN USING $\chi^2$ TEST Common errors are: neglect of frequencies of non-occurrence; failure to equalise the sum of the observed and the sum of the expected frequencies; wrong determination of the degrees of freedom; wrong computations.
LIMITATIONS OF A CHI SQUARE TEST The data must be from a random sample. Applied to a fourfold table, this test will not give a reliable result with one degree of freedom if the expected value in any cell is less than 5. In such a case Yates’ correction is necessary, i.e. reduction of the modulus of (o – e) by half. Even with Yates’ correction, the test may be misleading if any expected frequency is much below 5; in that case another appropriate test should be applied.
Con… In contingency tables larger than 2×2, Yates’ correction cannot be applied. This test does not indicate cause and effect; it only tells the probability of the association occurring by chance. The test tells the presence or absence of an association between the events but does not measure the strength of the association.
CON… The test is to be applied only when the individual observations of sample are independent which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration.