Non-parametric statistics: a statistical approach for medical students


About This Presentation

Non-parametric statistics: a statistical approach for medical students.


Slide Content

Non-Parametric Statistics. Speaker: Dr. Rupendra Kumar Bharti, Department of Pharmacology, IGMC, Shimla, HP.

Contents: introduction and definition; applications and purpose; non-parametric models; methods; references.

Hypothesis testing procedures (overview): many more tests exist!

A statistical method in which the data are not required to fit a normal distribution. Nonparametric statistics often works with ordinal data, relying on a ranking or ordering rather than on the numerical values themselves.

Parametric test procedures: 1. Involve population parameters (e.g., the mean). 2. Have stringent assumptions (e.g., normality). 3. Examples: Z test, t test, χ² test, F test.

Nonparametric test procedures: 1. Do not involve population parameters (examples: probability distributions, independence). 2. Data may be measured on any scale (ratio or interval, ordinal or nominal). 3. Example: Wilcoxon rank-sum test.

Advantages of nonparametric tests: 1. Can be used with all measurement scales. 2. Easier to compute. 3. Make fewer assumptions. 4. Need not involve population parameters. 5. Results may be as exact as parametric procedures.

Disadvantages of nonparametric tests: 1. May waste information (a parametric model is more efficient if the data permit it). 2. Difficult to compute by hand for large samples. 3. Tables of critical values are not widely available.

Applications and purpose. Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods result in "ordinal" data. Another justification for the use of non-parametric methods is simplicity: in certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Owing both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.

Non-parametric models. Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term non-parametric does not imply that such models completely lack parameters, but rather that the number and nature of the parameters are flexible and not fixed in advance.

Types. A histogram is a simple nonparametric estimate of a probability distribution. Kernel density estimation provides better estimates of the density than histograms. Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets. Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption. K-nearest neighbours (KNN) classifies an unseen instance based on the K points in the training set that are nearest to it.

Methods. Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include:
• Anderson–Darling test: tests whether a sample is drawn from a given distribution.
• Statistical bootstrap methods: estimate the accuracy/sampling distribution of a statistic.
• Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes have identical effects.
• Cohen's kappa: measures inter-rater agreement for categorical items.

• Kendall's W: a measure between 0 and 1 of inter-rater agreement.
• Kolmogorov–Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution.
• Friedman two-way analysis of variance by ranks: tests whether k treatments in randomised block designs have identical effects.
• Kaplan–Meier: estimates the survival function from lifetime data, modelling censoring.
• Kendall's tau: measures statistical dependence between two variables.
• Mann–Whitney U or Wilcoxon rank-sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis. In fact this is a "semi" non-parametric test, since it assumes that the two samples have the same scale parameter.

• McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, the row and column marginal frequencies are equal.
• Median test: tests whether two samples are drawn from distributions with equal medians.
• Pitman's permutation test: a statistical significance test that yields exact p values by examining all possible rearrangements of labels.
• Kruskal–Wallis one-way analysis of variance by ranks: tests whether more than two independent samples are drawn from the same distribution.
• Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week.
• Logrank test: compares the survival distributions of two right-skewed, censored samples.

• Rank products: detects differentially expressed genes in replicated microarray experiments.
• Siegel–Tukey test: tests for differences in scale between two groups.
• Sign test: tests whether matched pair samples are drawn from distributions with equal medians.
• Spearman's rank correlation coefficient: measures statistical dependence between two variables using a monotonic function.
• Squared ranks test: tests equality of variances in two or more samples.
• Wald–Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random.
• Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks.

Commonly used non-parametric tests: the chi-square test, McNemar's test, the sign test, the Wilcoxon signed-rank test, the Mann–Whitney U (Wilcoxon rank-sum) test, the Kruskal–Wallis (H) test, the Friedman ANOVA, the Spearman rank correlation test, and Cochran's Q test.

Chi-square test. First used by Karl Pearson (1857–1936); the simplest and most widely used non-parametric test in statistical work. Calculated using the formula χ² = Σ (O − E)² / E, where O = observed frequencies and E = expected frequencies. The greater the discrepancy between observed and expected frequencies, the greater the value of χ². The calculated value of χ² is compared with the table value of χ² for the given degrees of freedom.

Chi-square test. Applications of the chi-square test: test of association (smoking and cancer; treatment and outcome of disease; vaccination and immunity); test of proportions (e.g., compare the frequencies of diabetics and non-diabetics in groups weighing 40–50 kg, 50–60 kg, 60–70 kg and >70 kg); chi-square for goodness of fit (determine whether the actual numbers are similar to the expected/theoretical numbers).

Chi-square test. Attack rates among vaccinated and unvaccinated children against measles: prove the protective value of vaccination by the χ² test at the 5% level of significance. Observed frequencies:
Group | Attacked | Not attacked | Total
Vaccinated (observed) | 10 | 90 | 100
Unvaccinated (observed) | 26 | 74 | 100
Total | 36 | 164 | 200

Chi-square test. Expected frequencies:
Group | Attacked | Not attacked | Total
Vaccinated (expected) | 18 | 82 | 100
Unvaccinated (expected) | 18 | 82 | 100
Total | 36 | 164 | 200

Chi-square test. χ² = Σ (O − E)² / E = (10 − 18)²/18 + (90 − 82)²/82 + (26 − 18)²/18 + (74 − 82)²/82 = 64/18 + 64/82 + 64/18 + 64/82 = 8.67. The calculated value (8.67) exceeds 3.84, the critical value corresponding to P = 0.05 with 1 degree of freedom, so the null hypothesis is rejected: vaccination is protective.
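As a cross-check of this hand calculation, the same 2×2 table can be run through SciPy's `chi2_contingency`; a minimal sketch (SciPy is not part of the original slides, and the variable names are illustrative):

```python
# Chi-square test of association for the measles vaccination example.
from scipy.stats import chi2_contingency

# Observed 2x2 table: rows = vaccinated / unvaccinated, columns = attacked / not attacked
observed = [[10, 90],
            [26, 74]]

# correction=False reproduces the hand calculation above (no Yates' correction)
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")  # chi2 comes out to about 8.67 with df = 1
print("expected counts:")
print(expected)  # 18 attacked / 82 not attacked expected in each group
```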

Chi-square test. Yates' correction applies when there are two categories (one degree of freedom). It is used when the sample size is ≥ 40 and an expected frequency of < 5 occurs in one cell, and consists of subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table: χ² = Σ (|O − E| − 0.5)² / E.
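A small sketch of the same SciPy call with its default `correction=True`, which applies Yates' continuity correction automatically for 2×2 tables (illustrative only, not from the slides):

```python
from scipy.stats import chi2_contingency

# With the default correction=True, SciPy applies Yates' continuity correction for 2x2 tables
chi2_yates, p_yates, _, _ = chi2_contingency([[10, 90], [26, 74]], correction=True)
print(f"Yates-corrected chi2 = {chi2_yates:.2f}, p = {p_yates:.4f}")
```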

Fisher's exact test (named after Ronald A. Fisher, 1890–1962). Used when the total number of cases is < 20, or the expected number of cases in any cell is ≤ 1, or more than 25% of the cells have expected frequencies < 5.
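A minimal sketch of Fisher's exact test with SciPy's `fisher_exact`; the 2×2 counts below are hypothetical, chosen only to show the call:

```python
from scipy.stats import fisher_exact

# Hypothetical small 2x2 table where chi-square assumptions would break down
table = [[3, 7],
         [8, 2]]

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, exact p = {p:.4f}")
```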

McNemar's test. Used to compare before-and-after findings in the same individuals, or to compare findings in a matched analysis (for dichotomous variables). Example: comparing medical students' confidence in statistical analysis before and after an intensive statistics course.
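A sketch of how such a paired before/after comparison could be run with the `statsmodels` implementation of McNemar's test; the 2×2 counts below are hypothetical:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired before/after counts for a dichotomous outcome
# rows = before the course (not confident, confident)
# cols = after the course  (not confident, confident)
table = [[20, 25],   # 25 students became confident after the course
         [5,  50]]   # 5 students lost confidence

# exact=True performs a binomial test on the discordant pairs (25 vs 5)
result = mcnemar(table, exact=True)
print(f"statistic = {result.statistic}, p = {result.pvalue:.4f}")
```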

Sign test. Used for paired data, which can be ordinal or continuous. Simple and easy to interpret; makes no assumptions about the distribution of the data; not very powerful. To evaluate H₀ we only need to know the signs of the differences. If half the differences are positive and half are negative, then the median = 0 (H₀ is true). If the signs are more unbalanced, that is evidence against H₀.

Sign test. Children in an orthodontia study were asked to rate how they felt about their teeth on a 5-point scale, administered before and after treatment. "How do you feel about your teeth?" 1 = Wish I could change them; 2 = Don't like them, but can put up with them; 3 = No particular feelings one way or the other; 4 = I am satisfied with them; 5 = Consider myself fortunate in this area.

Child | Rating before | Rating after
1 | 1 | 5
2 | 1 | 4
3 | 3 | 1
4 | 2 | 3
5 | 4 | 4
6 | 1 | 4
7 | 3 | 5
8 | 1 | 5
9 | 1 | 4
10 | 4 | 4
11 | 1 | 1
12 | 1 | 4
13 | 1 | 4
14 | 2 | 4
15 | 1 | 4
16 | 2 | 5
17 | 1 | 4
18 | 1 | 5
19 | 4 | 4
20 | 3 | 5
Use the sign test to evaluate whether these data provide evidence that orthodontic treatment improves children's image of their teeth.

First, for each child, compute the difference between the two ratings (after − before): child 1: +4, 2: +3, 3: −2, 4: +1, 5: 0, 6: +3, 7: +2, 8: +4, 9: +3, 10: 0, 11: 0, 12: +3, 13: +3, 14: +2, 15: +3, 16: +3, 17: +3, 18: +4, 19: 0, 20: +2.

The sign test looks only at the signs of these differences: 15 children felt better about their teeth (positive difference), 1 child felt worse (negative difference), and 4 children felt the same (difference = 0). If H₀ were true we would expect roughly equal numbers of positive and negative differences; the P value from the sign-test table is 0.004.
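Because the sign test is simply a binomial test on the non-zero signs, the slide's calculation can be sketched with SciPy's `binomtest` (assumes SciPy ≥ 1.7; the call is illustrative, not part of the slides):

```python
from scipy.stats import binomtest

# Signs of the after-minus-before differences: 15 positive, 1 negative, 4 ties (dropped)
n_positive, n_negative = 15, 1
n_nonzero = n_positive + n_negative

# Under H0 (median change = 0) each non-zero difference is positive with probability 0.5
result = binomtest(n_negative, n=n_nonzero, p=0.5, alternative="two-sided")
print(f"exact two-sided p = {result.pvalue:.4f}")
```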

Wilcoxon signed-rank test. The nonparametric equivalent of the paired t-test. Similar to the sign test, but takes into consideration the magnitude of the differences among the pairs of values (the sign test considers only the direction of the differences, not their magnitude).

Wilcoxon signed-rank test. The 14 difference scores in BP among hypertensive patients after giving drug A were: −20, −8, −14, −12, −26, +6, −18, −10, −12, −10, −8, +4, +2, −18. The statistic T is found by calculating the sum of the positive ranks and the sum of the negative ranks; the smaller of the two values is taken as T.

Wilcoxon signed-rank test. Ranking the differences by absolute magnitude:
Score | Rank
+2 | 1
+4 | 2
+6 | 3
−8 | 4.5
−8 | 4.5
−10 | 6.5
−10 | 6.5
−12 | 8
−14 | 9
−16 | 10
−18 | 11.5
−18 | 11.5
−20 | 13
−26 | 14
Sum of positive ranks = 6; sum of negative ranks = 99; T = 6. For N = 14 and α = 0.05, the critical value of T is 21. If T is equal to or less than the critical value, the null hypothesis is rejected, i.e., drug A decreases BP among hypertensive patients.
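A sketch of the same test with SciPy's `wilcoxon`, using the difference scores listed on the earlier slide (again only illustrative of the call):

```python
from scipy.stats import wilcoxon

# The 14 BP difference scores from the slide above
diffs = [-20, -8, -14, -12, -26, 6, -18, -10, -12, -10, -8, 4, 2, -18]

# Two-sided Wilcoxon signed-rank test (a normal approximation is used when ties are present)
stat, p = wilcoxon(diffs, alternative="two-sided")
print(f"T = {stat}, p = {p:.4f}")  # in most SciPy versions the reported statistic is the smaller rank sum (6 here)
```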

Mann–Whitney U test. Similar to the Wilcoxon signed-rank test, except that the samples are independent rather than paired. Null hypothesis: the two samples are drawn from the same population. Rank the combined data values for the two groups, then find the rank total in each group.

Mann–Whitney U test. The U value is calculated using the formula U = N₁N₂ + Nₓ(Nₓ + 1)/2 − Rₓ, where Rₓ is the larger of the two rank totals and Nₓ is the size of that group. To be statistically significant, the obtained U has to be equal to or less than the critical value.

Mann–Whitney U test. Ten dieters following the Atkins diet vs. ten dieters following the Jenny Craig diet. Hypothetical results: the Atkins group loses an average of 34.5 lbs, the Jenny Craig group an average of 18.5 lbs. Conclusion: Atkins is better?

Mann–Whitney U test. When the individual data are examined: Atkins, change in weight (lbs): +4, +3, 0, −3, −4, −5, −11, −14, −15, −300; Jenny Craig, change in weight (lbs): −8, −10, −12, −16, −18, −20, −21, −24, −26, −30.

[Histogram: distribution of weight change (lbs) on the Jenny Craig diet; y-axis: percent]

[Histogram: distribution of weight change (lbs) on the Atkins diet; y-axis: percent]

Mann–Whitney U test. Rank the values, 1 being the least weight loss and 20 being the most weight loss. Atkins: +4, +3, 0, −3, −4, −5, −11, −14, −15, −300 → ranks 1, 2, 3, 4, 5, 6, 9, 11, 12, 20. Jenny Craig: −8, −10, −12, −16, −18, −20, −21, −24, −26, −30 → ranks 7, 8, 10, 13, 14, 15, 16, 17, 18, 19.

Mann–Whitney U test. Sum of Atkins ranks: 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73. Sum of Jenny Craig ranks: 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137. Jenny Craig clearly ranked higher (more weight loss). The calculated U value (18) is less than the table value (27), so the null hypothesis is rejected.
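A sketch of the same comparison with SciPy's `mannwhitneyu`, using the hypothetical weight-change data above; the smaller U is derived from the returned statistic so it can be compared with the tabled critical value:

```python
from scipy.stats import mannwhitneyu

# Hypothetical weight changes (lbs) from the slides above
atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny_craig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

u_stat, p = mannwhitneyu(atkins, jenny_craig, alternative="two-sided")

# Depending on the SciPy version the returned statistic may be U for the first sample
# or the smaller U; taking the minimum gives the value compared with the critical value (18 here)
u_min = min(u_stat, len(atkins) * len(jenny_craig) - u_stat)
print(f"U = {u_min}, p = {p:.4f}")
```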

Kruskal–Wallis one-way ANOVA. More powerful than the chi-square test. It is computed exactly like the Mann–Whitney test, except that there are more than two groups. Applied to independent samples with distributions of the same shape (not necessarily normal).
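A minimal sketch of a Kruskal–Wallis test with SciPy's `kruskal`; the three groups below are hypothetical, included only to show the call:

```python
from scipy.stats import kruskal

# Hypothetical pain scores under three independent treatments
group_a = [3, 5, 4, 6, 2]
group_b = [7, 8, 6, 9, 7]
group_c = [4, 5, 5, 6, 4]

h_stat, p = kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, p = {p:.4f}")
```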

Friedman ANOVA. When a matched-subjects or repeated-measures design is used and the hypothesis of a difference among three or more (k) treatments is to be tested, the Friedman ANOVA by ranks can be used.
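A minimal sketch of the Friedman test with SciPy's `friedmanchisquare`, using hypothetical repeated measurements on the same subjects:

```python
from scipy.stats import friedmanchisquare

# Hypothetical ratings of the same six subjects under three treatments
treatment_1 = [5, 6, 7, 4, 6, 5]
treatment_2 = [7, 8, 8, 6, 7, 7]
treatment_3 = [4, 5, 6, 4, 5, 4]

stat, p = friedmanchisquare(treatment_1, treatment_2, treatment_3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```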

Spearman rank-order correlation (Charles Spearman, 1863–1945). Used to assess the relationship between two ordinal variables or two skewed continuous variables. The nonparametric equivalent of the Pearson correlation. It is a relative measure that varies from −1 (perfect negative relationship) to +1 (perfect positive relationship).
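A minimal sketch of Spearman's rank correlation with SciPy's `spearmanr`, using hypothetical paired scores:

```python
from scipy.stats import spearmanr

# Hypothetical paired ordinal scores for the same subjects
exam_rank = [1, 2, 3, 4, 5, 6, 7, 8]
anxiety_rating = [2, 1, 4, 3, 6, 5, 8, 7]

rho, p = spearmanr(exam_rank, anxiety_rating)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```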

Cochran's Q test. A non-parametric statistical test used to verify whether k treatments have identical effects when the response variable can take only two possible outcomes (coded as 0 and 1).
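A sketch using the `statsmodels` implementation of Cochran's Q test, with hypothetical 0/1 outcomes for the same subjects under three treatments:

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

# Rows = subjects, columns = k treatments, entries = 0/1 outcomes (success/failure)
outcomes = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])

result = cochrans_q(outcomes)
print(f"Q = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```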

References:
Murphy, Kevin (2012). Machine Learning: A Probabilistic Perspective. MIT Press. p. 16. ISBN 978-0262018029.
Stuart, A.; Ord, J.K.; Arnold, S. (1999). Kendall's Advanced Theory of Statistics: Volume 2A—Classical Inference and the Linear Model, sixth edition, §20.2–20.3. Arnold.
Bagdonavicius, V.; Kruopis, J.; Nikulin, M.S. (2011). Non-parametric Tests for Complete Data. ISTE & Wiley: London & Hoboken. ISBN 978-1-84821-269-5.
Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Wiley. ISBN 978-1118840313.
Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric Statistical Inference, 4th ed. CRC. ISBN 0-8247-4052-1.
Hettmansperger, T. P.; McKean, J. W. (1998). Robust Nonparametric Statistical Methods. Kendall's Library of Statistics 5 (first ed.). London: Edward Arnold; New York: John Wiley and Sons. pp. xiv+467. ISBN 0-340-54937-8. MR 1604954. Also ISBN 0-471-19479-4.
Wasserman, Larry (2007). All of Nonparametric Statistics. Springer. ISBN 0-387-25145-6.
Special thanks to Dr. Ramesh.

Thank you.