A take on correlation/regression and non-parametric tests
Correlation
Inferential statistics
- Comparison of two (or more) variables:
  - Qualitative vs qualitative, e.g. hypertension vs smoking (counts/proportions)
  - Quantitative vs qualitative, e.g. BP vs sex
  - Quantitative vs quantitative, e.g. BP vs weight (metric/interval data)
- Drawing inferences from the sample for our population of interest
Scatter plots
- A way of portraying a relationship between two quantitative variables
- Patterns: linear, non-linear, or no relationship
Regression and correlation
- Analyze the association between two quantitative variables
- Assume independent observations
- Assume a linear relationship
- Allow hypothesis testing of the relationship, i.e. drawing inferences on the population
- Regression: gives the 'best-fit' line to the data
- Correlation: gives a measure of scatter of the data points around this line
Regression line: y = bx + a
- Least squares method: the line is fitted so as to minimize the sum of the squared vertical distances of the observed values from the line
The regression equation
- Gives the 'best-fit' line to the data
- The regression coefficient 'b' measures the relationship between the variables: the amount of change in y for a unit change in x
- Positive for a direct relationship, negative for an inverse one
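As a minimal sketch of the least-squares fit (not from the slides; the weight/BP numbers below are made up for illustration), scipy.stats.linregress returns both b and a:

```python
import numpy as np
from scipy import stats

# Hypothetical data: weight (kg) and systolic BP (mmHg)
weight = np.array([60, 65, 70, 75, 80, 85, 90])
bp = np.array([115, 118, 121, 125, 128, 133, 136])

# Least-squares fit of bp = b * weight + a
fit = stats.linregress(weight, bp)
print(f"b = {fit.slope:.3f} mmHg per kg, a = {fit.intercept:.3f} mmHg")
```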
Correlation
- While the regression equation measures the average relationship between two variables, correlation gives the strength (goodness of fit) of the relationship
- The correlation coefficient (Pearson's r) lies between -1 and +1
Coefficient of determination: r²
- Interpreted as the percentage of total variation in the dependent variable (y) explained by the regression line, i.e. by variation in the independent variable (x) alone
- An r² of 1 would imply that 100 percent of the variation is explained by variation in x
- Values less than 1 imply that other 'unknown' variables exist which explain y to some extent
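A small illustrative sketch (the same hypothetical weight/BP arrays, redefined here so it runs standalone): Pearson's r from scipy.stats.pearsonr, squared to give r²:

```python
import numpy as np
from scipy import stats

weight = np.array([60, 65, 70, 75, 80, 85, 90])  # hypothetical data
bp = np.array([115, 118, 121, 125, 128, 133, 136])

r, p = stats.pearsonr(weight, bp)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")  # share of BP variation explained by weight
```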
Hypothesis testing
- The sample statistics b and r are used to make inferences about the population parameters
- Assumptions for valid inference:
  - Independent data (any two scatter points are independent)
  - Linear relationship in the mean of y vs x
  - Distribution of y is normal at each x
  - Variances are the same at each x
- Confidence intervals and p-values are obtained based on the t distribution, as in the sketch below
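One hedged sketch of a t-based 95% confidence interval for the slope (again with the hypothetical data above; linregress exposes the standard error of b):

```python
import numpy as np
from scipy import stats

weight = np.array([60, 65, 70, 75, 80, 85, 90])  # hypothetical data
bp = np.array([115, 118, 121, 125, 128, 133, 136])

fit = stats.linregress(weight, bp)
t_crit = stats.t.ppf(0.975, df=len(weight) - 2)  # two-sided 95%, n-2 df

lo = fit.slope - t_crit * fit.stderr
hi = fit.slope + t_crit * fit.stderr
print(f"b = {fit.slope:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), p = {fit.pvalue:.4f}")
```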
When the assumptions do not hold
- Residual analysis
- Polynomial regression: y = a + bx + cx²
- Data transformations
- Rank correlation: if data transformation fails
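A quadratic fit of the form above can be sketched with numpy.polyfit (the x/y values are invented for illustration):

```python
import numpy as np

# Hypothetical curved data
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0, 37.2])

# np.polyfit returns coefficients highest power first: [c, b, a]
c, b, a = np.polyfit(x, y, deg=2)
print(f"y = {a:.2f} + {b:.2f}x + {c:.2f}x^2")
```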
Spearman rank correlation coefficient (rs or ρ)
- Used for ranked or ordinal data
Significance test on Spearman's ρ
- The test statistic is ρ (rs) itself
- If the calculated coefficient lies within the limits ±rc (the critical value tabulated for n pairs at a two-sided significance level α), the null hypothesis (that there is no actual correlation) cannot be rejected
- For the slide's example (n = 10 pairs, α = 5%), the critical value is ±0.6485, so it is concluded that there is no significant correlation between the ranks assigned by the two assessors
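The slide's raw rankings are not reproduced here, so the sketch below uses made-up ranks for two assessors; scipy.stats.spearmanr computes ρ and a p-value directly:

```python
from scipy import stats

# Hypothetical ranks given by two assessors to the same 10 subjects
assessor1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
assessor2 = [5, 3, 1, 8, 2, 10, 4, 9, 6, 7]

rho, p = stats.spearmanr(assessor1, assessor2)
# Here |rho| is about 0.45 < 0.6485 (critical value for n=10, alpha=0.05),
# so the null hypothesis of no correlation is not rejected
print(f"rho = {rho:.4f}, p = {p:.4f}")
```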
Non-parametric methods
Wilcoxon rank-sum test / Mann-Whitney U test
- Used when the normality assumption doesn't hold, especially for small samples
- Hypothesis test assessing whether the values in one sample tend to be larger than those in the other
- Ranks are assigned to the values used for comparison
- Assumptions:
  - Samples are randomly drawn
  - Observations are independent
Steps
- Rank all the values irrespective of group
- Sum the ranks in each group
- (The slide shows a table of the original values and their ranks; the resulting rank sums are W1 = 52 and W2 = 101)
U statistic
- The decision is based on the value of U
- One-tailed test: use U1 or U2; two-tailed test: use U = min(U1, U2)
- Reject the null hypothesis whenever the test statistic (U, U1, or U2) is less than the critical value
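A minimal sketch with made-up samples; scipy.stats.mannwhitneyu handles the ranking and the U statistic:

```python
from scipy import stats

# Hypothetical measurements in two independent groups
group1 = [12, 15, 11, 19, 14, 13]
group2 = [22, 25, 18, 21, 27, 20]

u, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```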
Comparing two paired groups: Wilcoxon signed-rank test
- Paired tests are used when the observations between groups are dependent in some way:
  - The variable is measured before and after an intervention
  - Subjects are recruited as matched pairs (e.g. matched for age, sex, co-morbidities)
  - Twins or siblings recruited as pairs
  - Right-left pairs, e.g. different treatments for the right and left eye
- Assumption: each pair is chosen randomly and independently
Wilcoxon signed-rank test
- Non-parametric test for paired data sets
- Tests the hypothesis that there is no difference between the two paired groups
- Steps (a code sketch follows this list):
  - Calculate the difference for each matched pair, keeping track of the sign
  - Rank the absolute values of the differences, ignoring the sign
  - Calculate the rank sums separately for the 'positive' and 'negative' differences
  - Calculate the test statistic and compute the p-value
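A hedged sketch with hypothetical before/after pairs; scipy.stats.wilcoxon performs the signed-rank steps above:

```python
from scipy import stats

# Hypothetical systolic BP before and after an intervention (paired)
before = [142, 138, 150, 145, 139, 148, 144, 151]
after = [136, 137, 141, 140, 138, 142, 143, 144]

w, p = stats.wilcoxon(before, after)  # statistic = smaller of the two rank sums
print(f"W = {w}, p = {p:.4f}")
```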
Kruskal-Wallis test
- Non-parametric analogue of one-way ANOVA and an extension of the Mann-Whitney U test
- Compares the medians of more than two groups of observations for a given variable
- Ranks are assigned to all the observations, followed by calculating the sum of ranks for each group
- Test statistic: H follows a chi-square distribution with df = k - 1 (k = number of groups)
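A minimal illustrative sketch with three made-up groups, using scipy.stats.kruskal:

```python
from scipy import stats

# Hypothetical measurements in three independent groups
g1 = [7, 9, 6, 8, 10]
g2 = [12, 14, 11, 13, 15]
g3 = [9, 11, 8, 10, 12]

h, p = stats.kruskal(g1, g2, g3)  # H ~ chi-square, df = 3 - 1 = 2
print(f"H = {h:.3f}, p = {p:.4f}")
```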
Summary: non-parametric tests
- Non-parametric tests are less powerful: some information is discarded when using ranks
- Sample size: compute the sample size for the corresponding parametric test and add 15%
- Non-parametric tests are usually not reported with confidence intervals
- Non-parametric tests are not readily extended to regression models
| Variable | Parametric test (paired test) | Non-parametric test (paired test) |
|---|---|---|
| Quantitative variable; 2 groups (mean or median) | Unpaired t test (paired t test) | Mann-Whitney U test (Wilcoxon signed-rank test) |
| Quantitative variable; >2 groups (mean or median) | One-way ANOVA (repeated-measures ANOVA) | Kruskal-Wallis test (Friedman test) |
| Categorical variable / proportions | Chi-square test (McNemar test) | |