Non-parametric tests: correlation.pptx


About This Presentation

An overview of correlation, regression, and non-parametric tests


Slide Content

Correlation

Inferential statistics
- Comparison of two (or more) variables:
  - Qualitative vs qualitative, e.g. hypertension vs smoking (counts/proportions)
  - Quantitative vs qualitative, e.g. BP vs sex
  - Quantitative vs quantitative, e.g. BP vs weight (metric/interval data)
- Drawing inferences from the sample about our population of interest

Scatter plots
- A way of portraying the relationship between two quantitative variables
- The pattern may be linear, non-linear, or show no relationship
- Motivate correlation and regression

Regression and correlation
- Analyze the association between two quantitative variables
- Assume independent observations
- Assume a linear relationship
- Allow hypothesis testing of the relationship, i.e. drawing inferences about the population
- Regression gives the 'best-fit' line to the data
- Correlation gives a measure of the scatter of the data points around this line

Regression line: y = bx + a
- Least squares method: the line is fitted so as to minimize the sum of the squares of the vertical distances of the observed values from the line
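The fit itself is a one-liner in practice. A minimal sketch with NumPy, on made-up (x, y) data:

```python
# Least-squares fit; x and y are hypothetical illustration data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# polyfit with degree 1 minimizes the sum of squared vertical distances
b, a = np.polyfit(x, y, 1)   # slope b, intercept a
print(f"y = {b:.2f} x + {a:.2f}")
```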

The regression equation
- Gives the 'best-fit' line to the data
- The regression coefficient b measures the relationship between the variables: the amount of change in y for a unit change in x
- b is positive for a direct relationship and negative for an inverse one

Correlation
- While the regression equation measures the average relationship between two variables, correlation gives the strength, or goodness of fit, of that relationship
- Pearson's correlation coefficient r lies between -1 and +1
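A minimal sketch of Pearson's r with SciPy; the weight and BP values are hypothetical illustration data:

```python
# Pearson correlation on hypothetical weight/BP pairs.
from scipy import stats

weight = [60, 72, 85, 58, 90, 68, 77, 82]
bp     = [118, 125, 140, 115, 145, 122, 130, 138]

r, p = stats.pearsonr(weight, bp)   # r lies between -1 and +1
print(f"r = {r:.2f}, p = {p:.4f}")
```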

Coefficient of determination: r²
- Interpreted as the percentage of the total variation in the dependent variable (y) that is explained by the regression line, i.e. by variation in the independent variable (x)
- An r² of 1 would imply that 100 percent of the variation is explained by variation in x
- Values less than 1 imply that other 'unknown' variables exist which explain y to some extent
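This interpretation can be checked numerically: for a simple linear fit, r² equals the explained fraction 1 - SS_res/SS_tot. A sketch on made-up data:

```python
# r^2 two ways: from Pearson's r, and from the variance decomposition.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, 1)
y_hat = b * x + a

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
r, _ = stats.pearsonr(x, y)

print(1 - ss_res / ss_tot, r ** 2)     # the two values agree
```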

Hypothesis testing
- The sample statistics b and r are used to make inferences about the population parameters
- Assumptions for valid inferences:
  - Independent data (the scatter points are independent)
  - The mean of y is linearly related to x
  - The distribution of y is normal at each x
  - The variance of y is the same at each x
- Confidence intervals and p-values are obtained from the t distribution
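scipy.stats.linregress bundles the fit with t-based inference on the slope; a sketch on hypothetical data:

```python
# Fit plus t-based inference in one call.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.8]

res = stats.linregress(x, y)
# res.pvalue tests H0: slope = 0, using the t distribution;
# res.stderr is the standard error of the slope
print(res.slope, res.intercept, res.rvalue, res.pvalue, res.stderr)
```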

When the assumptions do not hold
- Residual analysis
- Polynomial regression: y = a + bx + cx²
- Data transformations
- Rank correlation, if data transformation fails
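The polynomial option reuses the same least-squares machinery; a sketch fitting y = a + bx + cx² to made-up, roughly quadratic data:

```python
# Degree-2 least-squares fit on hypothetical data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 3.8, 8.5, 15.9, 26.2, 37.1])

c, b, a = np.polyfit(x, y, 2)   # coefficients in order: x^2, x, constant
print(f"y = {a:.2f} + {b:.2f} x + {c:.2f} x^2")
```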

Spearman rank correlation coefficient, r_s (ρ)
- Used for ranked or ordinal data

Significance test on Spearman's ρ
- The test statistic is ρ (r_s) itself
- If the calculated coefficient lies within the limits ±r_c, the critical value tabulated for n pairs (here 10) at a two-sided significance level α (5%), then the null hypothesis (that there is no actual correlation) cannot be rejected
- For the example the critical value is ±0.6485, so it is concluded that there is no significant correlation between the ranks assigned by the two assessors
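In software the table lookup is usually replaced by a p-value; a sketch with SciPy on hypothetical ranks from two assessors (n = 10, matching the slide's critical value of ±0.6485):

```python
# Spearman's rho on hypothetical assessor ranks.
from scipy import stats

assessor1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
assessor2 = [5, 8, 1, 9, 3, 10, 2, 7, 4, 6]

rho, p = stats.spearmanr(assessor1, assessor2)
# |rho| below the tabulated critical value 0.6485 (n = 10, alpha = 0.05,
# two-sided) means H0 of no correlation cannot be rejected; equivalently p > 0.05
print(f"rho = {rho:.3f}, p = {p:.3f}")
```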

Non-parametric methods

Wilcoxon rank-sum test / Mann-Whitney U test
- Used when the normality assumption does not hold, especially for small samples
- Hypothesis test for assessing whether the values in one sample tend to be larger than those in the other
- Ranks are assigned to the values used for comparison
- Assumptions: the samples are randomly drawn and the observations are independent

Steps
- Rank all the values irrespective of group
- Sum the ranks in each group; in the slide's worked example the rank sums were W1 = 52 and W2 = 101 (a code sketch follows the next slide)

The U statistic
- The decision is based on the value of U, obtained from the rank sums as U_i = W_i - n_i(n_i + 1)/2
- For a one-tailed test: U1 or U2
- For a two-tailed test: U = min(U1, U2)
- Reject the null hypothesis whenever the test statistic (U, U1, or U2) is less than the critical value
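A sketch tying the steps together, from rank sums to U, on hypothetical data; SciPy's mannwhitneyu runs the same test directly (recent versions report U for the first sample):

```python
# Rank-sum and U computation by hand, then via SciPy.
import numpy as np
from scipy import stats

group1 = np.array([3.1, 4.5, 2.8, 5.0, 3.9, 4.2])
group2 = np.array([5.6, 6.1, 4.8, 7.0, 5.9, 6.4])

# Step 1: rank all values irrespective of group
ranks = stats.rankdata(np.concatenate([group1, group2]))

# Step 2: sum the ranks within each group
w1 = ranks[:len(group1)].sum()
w2 = ranks[len(group1):].sum()

# U from the rank sums: U_i = W_i - n_i(n_i + 1)/2
n1, n2 = len(group1), len(group2)
u1 = w1 - n1 * (n1 + 1) / 2
u2 = w2 - n2 * (n2 + 1) / 2
u = min(u1, u2)   # two-tailed decision uses min(U1, U2)

u_stat, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(w1, w2, u, u_stat, p)
```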

Comparing two paired groups: Wilcoxon signed-rank test
- Paired tests are used when the observations between groups are dependent in some way:
  - A variable is measured before and after an intervention
  - Subjects are recruited as matched pairs (e.g. matched for age, sex, co-morbidities)
  - Twins or siblings are recruited as pairs
  - Right-left pairs, e.g. different treatments for the right and left eye
- Assumption: each pair is chosen randomly and independently

Wilcoxon signed-rank test
- Non-parametric test for paired data sets
- Tests the hypothesis that there is no difference between the two paired groups
- Steps:
  1. Calculate the difference for each matched pair, keeping track of the sign
  2. Rank the absolute values of the differences, ignoring the sign
  3. Calculate the rank sums of the 'positive' and 'negative' differences separately
  4. Calculate the test statistic and compute the p-value
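A minimal sketch with SciPy; the before/after values are hypothetical (e.g. a measurement before and after an intervention):

```python
# Wilcoxon signed-rank test on hypothetical paired data.
from scipy import stats

before = [142, 138, 150, 145, 160, 155, 148, 152]
after  = [135, 136, 143, 140, 158, 150, 147, 146]

# H0: no difference between the paired measurements
stat, p = stats.wilcoxon(before, after)
print(f"W = {stat}, p = {p:.3f}")
```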

Kruskal-Wallis test
- Similar to one-way ANOVA; an extension of the Mann-Whitney U test
- Non-parametric test for comparing medians across more than two groups of observations for a given variable
- Ranks are assigned to all the observations, followed by calculation of the sum of the ranks for each group
- Test statistic: H, which follows a chi-square distribution with df = k - 1 (k groups)
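A minimal Kruskal-Wallis sketch with SciPy on three hypothetical groups:

```python
# Kruskal-Wallis H test across k = 3 hypothetical groups.
from scipy import stats

g1 = [6.2, 5.9, 6.8, 7.1, 6.5]
g2 = [7.8, 8.1, 7.4, 8.6, 7.9]
g3 = [5.1, 5.6, 4.9, 5.8, 5.3]

# H is referred to a chi-square distribution with df = k - 1 = 2
h, p = stats.kruskal(g1, g2, g3)
print(f"H = {h:.2f}, p = {p:.4f}")
```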

Summary: non-parametric tests
- Non-parametric tests are less powerful: some information is discarded when using ranks
- Sample size: compute the sample size for the corresponding parametric test and add 15%
- Non-parametric tests are usually not reported with confidence intervals
- Non-parametric tests are not readily extended to regression models

Variable | Parametric test (paired test) | Non-parametric test (paired test)
Quantitative variable, 2 groups (mean or median) | Unpaired t test (paired t test) | Mann-Whitney U test (Wilcoxon signed-rank test)
Quantitative variable, > 2 groups (mean or median) | One-way ANOVA (repeated-measures ANOVA) | Kruskal-Wallis test (Friedman test)
Categorical variable / proportions | - | Chi-square test (McNemar test)