PERMUTATION APPROACH TO NON PARAMETRIC HYPOTHESIS TEST [Autosaved].pptx
AnantaBasnet
Sep 27, 2024
PERMUTATION APPROACH TO NON-PARAMETRIC HYPOTHESIS TEST
A presentation in the Actuarial Seminar
By: Sapath Dahal, School of Mathematical Sciences, Balkhu
Date: 2081/05/02
TABLE OF CONTENTS
Introduction to permutation
Permutation approach to non-parametric hypothesis tests
Key concepts involved
Steps involved
Example
Conclusion
WHAT IS A PERMUTATION?
A permutation is an arrangement of all the members of a set into some sequence or order. If the set is already ordered, any rearrangement of its elements is a permutation of the set. For example, the set {A, B, C} has 3! = 6 permutations: ABC, ACB, BAC, BCA, CAB, CBA.
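As a rough illustration, the six arrangements of {A, B, C} can be enumerated in R. Base R has no built-in function for this, so `all_permutations` below is a hypothetical helper introduced only for this sketch.

```r
# Hypothetical helper (not part of base R): returns every ordering of x
# as a list of vectors, built recursively.
all_permutations <- function(x) {
  if (length(x) <= 1) {
    return(list(x))
  }
  result <- list()
  for (i in seq_along(x)) {
    # Fix x[i] as the first element, then permute the remaining elements.
    for (rest in all_permutations(x[-i])) {
      result[[length(result) + 1]] <- c(x[i], rest)
    }
  }
  result
}

perms <- all_permutations(c("A", "B", "C"))
perm_strings <- sapply(perms, paste, collapse = "")
print(perm_strings)                    # "ABC" "ACB" "BAC" "BCA" "CAB" "CBA"
print(length(perms) == factorial(3))   # TRUE
```

The number of permutations grows as n!, which is why permutation tests on larger samples usually draw random shuffles instead of listing every arrangement.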
PERMUTATION APPROACH TO NON-PARAMETRIC HYPOTHESIS TESTS
What is a hypothesis test? A hypothesis test is a statistical method used to determine whether a sample of data provides enough evidence to support or reject a specific hypothesis about a population parameter. It compares the observed data with what would be expected under a null hypothesis, which assumes no effect.
Before moving forward, what is a parameter of a distribution? A parameter of a population distribution is a fixed value that describes a specific characteristic of the entire population. Unlike statistics computed from a sample, parameters do not change; they are constant for the population.
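PERMUTATION TEST FOR NON-PARAMETRIC HYPOTHESIS
The permutation approach is typically used when the sample size is small and the data cannot be assumed to follow a normal distribution. It is fairly simple compared with traditional parametric hypothesis tests: rather than assuming a theoretical distribution for the test statistic, it builds that distribution by repeatedly reshuffling the observed data between the groups.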
KEY CONCEPTS INVOLVED
NULL HYPOTHESIS (H0): The default assumption that there is no difference or effect.
ALTERNATIVE HYPOTHESIS (H1): The hypothesis that contradicts the null hypothesis.
Test statistic: A measure that quantifies the difference between the groups being compared, such as the difference in means.
p-value: The probability of observing an effect at least as extreme as the one in the original dataset, under the assumption that the null hypothesis is true.
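In symbols, one common way to write the difference-in-means test statistic and a two-sided permutation p-value is sketched below. The notation is an assumption made for this sketch: groups 1 and 2 with sample means x-bar, M random permutations, and T_m for the statistic computed on the m-th permutation.

```latex
T_{\text{obs}} = \bar{x}_1 - \bar{x}_2,
\qquad
p = \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\!\left( |T_m| \ge |T_{\text{obs}}| \right)
```

The indicator counts the permuted statistics that are at least as extreme as the observed one, which matches the verbal definition of the p-value given above.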
STEPS INVOLVED
1. Specify H0 and H1.
2. Choose a test statistic.
3. Determine the distribution of the test statistic under H0 (here, by permuting the data).
4. Convert the test statistic value into a p-value.
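The steps above can be sketched as a small R function. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `perm_test_mean_diff`, the default of 1000 permutations, the seed, and the toy input numbers are all made up for illustration.

```r
# Hypothetical helper: two-sample permutation test on the difference in means.
# x, y   : numeric vectors holding the two groups
# n_perm : number of random relabellings (assumed default)
perm_test_mean_diff <- function(x, y, n_perm = 1000) {
  # Step 2: test statistic on the observed grouping.
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Step 3: approximate the null distribution by reshuffling the pooled data.
  permuted <- numeric(n_perm)
  for (i in seq_len(n_perm)) {
    shuffled <- sample(pooled)
    permuted[i] <- mean(shuffled[1:n_x]) - mean(shuffled[-(1:n_x)])
  }

  # Step 4: two-sided p-value, the share of permuted statistics that are
  # at least as extreme as the observed one.
  p_value <- mean(abs(permuted) >= abs(observed))
  list(observed = observed, permuted = permuted, p_value = p_value)
}

set.seed(1)  # arbitrary seed so the sketch is repeatable
# Made-up toy data, used only to show the call.
result <- perm_test_mean_diff(c(5.1, 4.9, 6.2, 5.8), c(4.0, 4.4, 3.9, 4.6))
print(result$observed)
print(result$p_value)
```

Step 1, stating H0 and H1, happens before any code is run; here H0 is that the group labels carry no information about the values.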
EXAMPLE 1
Two children play with their toys every evening. They have 10 toys in total. One evening they tried to decide which of the toys were their favourites, but they could not reach a unanimous decision. So they stated the hypotheses:
Null Hypothesis (H0): They like all of their toys equally.
Alternative Hypothesis (H1): They actually have some favourite toys.
EXAMPLE 1
First, they split the toys into two groups of 5 and rated each toy according to how much they liked it. Since the test is fairly simple, they chose the difference in means as the test statistic. They took the mean of each group and subtracted one from the other to get the initial test statistic.
Group A: 8 7 5 9 8 (mean 7.4)
Group B: 7 9 8 4 6 (mean 6.8)
Initial test statistic = 7.4 - 6.8 = 0.6
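The arithmetic can be checked with a few lines of R. The variable names are chosen here for illustration only.

```r
# Ratings of the toys in the first split, as given in the example.
group_a <- c(8, 7, 5, 9, 8)
group_b <- c(7, 9, 8, 4, 6)

# Test statistic: difference between the two group means.
initial_statistic <- mean(group_a) - mean(group_b)
print(round(initial_statistic, 2))  # 0.6
```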
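EXAMPLE 1
Now they permute the groups. In simple words, they shuffle the toys between the two groups and calculate the test statistic again.
Group A: 9 7 4 9 8 (mean 7.4)
Group B: 7 7 8 5 6 (mean 6.6)
Test statistic = 7.4 - 6.6 = 0.8
They then repeat this permutation process over and over. Note that a shuffle only moves the ten original ratings between the groups; the values shown here contain a 7 where the original pool has an 8, so they are best read as illustrative.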
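A single shuffle of this kind might look as follows in R. This is a sketch: the seed is arbitrary, and the pooled ratings are taken from the first split (Group A followed by Group B).

```r
set.seed(2024)  # arbitrary seed, assumed only so the shuffle is repeatable

# Pool the ten ratings from the first split: Group A, then Group B.
ratings <- c(8, 7, 5, 9, 8, 7, 9, 8, 4, 6)

# sample() with no size argument returns a random reordering of its input.
shuffled <- sample(ratings)
new_a <- shuffled[1:5]
new_b <- shuffled[6:10]

# Same test statistic as before, computed on the shuffled groups.
permuted_statistic <- mean(new_a) - mean(new_b)
print(new_a)
print(new_b)
print(permuted_statistic)
```

Because the shuffle only reassigns ratings, the pooled values and their total stay the same from one permutation to the next; only the split between the groups changes.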
EXAMPLE 1
The test statistic values from this first permutation and 9 more are:
TEST STATISTIC VALUES
0.8 0.6 0.5 0.9 0.3 0.6 0.7 0.5 0.7 0.6
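EXAMPLE 1
Now they calculate the p-value from the test statistics. They count the test statistics greater than the initial test statistic and divide by the total number of test statistics. Four of the permuted values (0.8, 0.9, 0.7, 0.7) exceed 0.6, and the reported value corresponds to a total of 11 test statistics, i.e. the initial one together with the 10 permuted ones:
p-value = 4/11 ≈ 0.36 = 36%
They had chosen a significance level of 10%. Since the p-value is greater than the significance level, they failed to reject the null hypothesis, i.e. the data give no evidence that they like some toys more than others.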
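The p-value calculation can be reproduced in R from the ten listed values. One assumption is made here: the denominator counts the initial statistic as well (10 + 1 = 11), since that is the reading that reproduces the reported 0.36.

```r
initial_statistic <- 0.6
permuted_statistics <- c(0.8, 0.6, 0.5, 0.9, 0.3, 0.6, 0.7, 0.5, 0.7, 0.6)

# Count the permuted statistics strictly greater than the initial one.
n_greater <- sum(permuted_statistics > initial_statistic)

# Assumption: the total includes the initial statistic, giving 11.
p_value <- n_greater / (length(permuted_statistics) + 1)

alpha <- 0.10  # significance level chosen in the example
print(n_greater)           # 4
print(round(p_value, 2))   # 0.36
print(p_value > alpha)     # TRUE, so H0 is not rejected
```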
EXAMPLE 2
Here we use RStudio. The built-in dataset iris contains measurements of flowers, including the sizes of their sepals and petals, along with their species. We select 10 observations of ‘Sepal Width’ from each of two species, setosa and versicolor, together with their ‘Species’ labels, for the permutation test.
SETTING THE HYPOTHESES
NULL HYPOTHESIS (H0): There is no difference in the mean sepal width between the setosa and versicolor species of iris.
ALTERNATIVE HYPOTHESIS (H1): There is a difference in the mean sepal width between the setosa and versicolor species of iris.
R CODE
1. Loading the data:
    data(iris)  # iris ships with R's datasets package
2. Selecting 10 observations from each species:
    set.seed(42)  # for reproducibility of the shuffles in step 5
    setosa_subset <- iris[iris$Species == "setosa", ][1:10, ]
    versicolor_subset <- iris[iris$Species == "versicolor", ][1:10, ]
R CODE
3. Combining the two subsets into a single dataset:
    iris_subset <- rbind(setosa_subset, versicolor_subset)
4. Calculating the observed difference between the mean sepal widths of the setosa and versicolor species:
    observed_difference <- mean(setosa_subset$Sepal.Width) - mean(versicolor_subset$Sepal.Width)
Here the difference in means is the chosen test statistic.
Observed difference = 0.44
R CODE
5. Beginning the permutation test:
    num_permutations <- 1000
    permuted_differences <- numeric(num_permutations)
    for (i in 1:num_permutations) {
      # Shuffle the species labels while keeping the sepal widths fixed
      permuted_species <- sample(iris_subset$Species)
      permuted_setosa <- iris_subset[permuted_species == "setosa", ]
      permuted_versicolor <- iris_subset[permuted_species == "versicolor", ]
      permuted_difference <- mean(permuted_setosa$Sepal.Width) - mean(permuted_versicolor$Sepal.Width)
      permuted_differences[i] <- permuted_difference
    }
Here we permute the species labels one thousand times. The difference in mean sepal width between the relabelled setosa and versicolor groups is stored for each permutation.
R CODE
6. Calculating the p-value:
    # Two-sided: share of permuted differences at least as extreme as the observed one
    p_value <- sum(abs(permuted_differences) >= abs(observed_difference)) / num_permutations
    p_value
We get a p-value of 0.005.
CONCLUSION
Since the p-value (0.005) is less than the assumed significance level (alpha = 0.05), we reject the null hypothesis. Hence there is a statistically significant difference in the mean sepal width between the setosa and versicolor species of iris.