Statistical analysis

PrincyFrancisM 52,643 views 77 slides Oct 11, 2018

About This Presentation

Nursing Research and statistics


Slide Content

STATISTICAL ANALYSIS Princy Francis M, 1st Yr MSc(N), JMCON

DEFINITION Statistical analysis is the organisation and analysis of quantitative or qualitative data using statistical procedures, including both descriptive and inferential statistics. It’s the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends.

DEFINITION Statistics is a branch of science that deals with the collection, organisation and analysis of data and the drawing of inferences from samples to the whole population. A sample is a small portion of the population which truly represents the population with respect to the characteristic under study.

PURPOSES Summarize; explore the meaning of deviations in data; compare or contrast descriptively; test the proposed relationships in a theoretical model; infer that the findings from the sample are indicative of the population; examine causality; predict or infer from the sample to a theoretical model.

ELEMENTS OF STATISTICAL ANALYSIS Understand the complex relationships among the correlates of the disease under study. The analysis should start with simple comparisons of proportions and means. Interpretation of results should be guided by clinical and biological considerations.

STATISTICAL MEASURES Mean Mode Median Interquartile Range Standard Deviation.

MEAN The mean is the average of all numbers

Example: The mean of 10, 20, 30, 40 is (10 + 20 + 30 + 40)/4 = 25.

MEDIAN When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median. For raw data, if n (the total number of observations) is odd, the value of the [(n + 1)/2]th item is the median; if n is even, the mean of the (n/2)th and (n/2 + 1)th items is the median. Example: The median of the data 10, 20, 30 is 20.

MODE The mode is the value of a series which appears more frequently than any other. For grouped data, Mode M0 = L + {Δ1 / (Δ1 + Δ2)} × c, where L is the lower limit of the modal class, c is the class interval, Δ1 is the difference between the modal frequency and the frequency of the preceding class, and Δ2 is the difference between the modal frequency and the frequency of the following class. Example: The mode of the data 80, 90, 86, 80, 72, 80, 96 is 80.

INTERQUARTILE RANGE The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles: IQR = Q3 − Q1.

Example: Interquartile range of the data 30, 20, 40, 60, 50 (sorted: 20, 30, 40, 50, 60). Q1 = [(n + 1)/4]th item = 1.5th item = 20 + 0.5 × (30 − 20) = 25. Q3 = [3(n + 1)/4]th item = 4.5th item = 50 + 0.5 × (60 − 50) = 55. IQR = 55 − 25 = 30.

STANDARD DEVIATION The standard deviation is the most useful and most popular measure of dispersion. It is defined as the positive square root of the arithmetic mean of the squared deviations of the given observations from their arithmetic mean. The standard deviation is denoted by 'σ' (population) or 's' (sample).

STANDARD DEVIATION Formula: s = √[ Σ(x − x̄)² / (n − 1) ] for a sample of n observations with mean x̄.

EXAMPLE Standard deviation of the data 10, 20, 30, 40, 50, where n = 5 and x̄ = 30: s² = Σ(x − x̄)²/(n − 1) = 1000/4 = 250, so s = √250 = 15.811.
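The measures above can be checked with a short script. This is a sketch using only the Python standard library; the `quartile` helper is my own name, implementing the [q(n + 1)/4]th-ordered-item rule used in the IQR example.

```python
import statistics

# Measures from the slide examples, using only the standard library.
mean_val = statistics.mean([10, 20, 30, 40])              # 25
median_val = statistics.median([10, 20, 30])              # 20
mode_val = statistics.mode([80, 90, 86, 80, 72, 80, 96])  # 80

# Quartiles via the [q(n + 1)/4]th ordered-item rule from the IQR example.
def quartile(data, q):
    xs = sorted(data)
    pos = q * (len(xs) + 1) / 4        # 1-based fractional position
    i, frac = int(pos), pos - int(pos)
    lo = xs[i - 1]
    hi = xs[min(i, len(xs) - 1)]
    return lo + frac * (hi - lo)

data = [30, 20, 40, 60, 50]
iqr = quartile(data, 3) - quartile(data, 1)   # 55 - 25 = 30

# Sample standard deviation (divides by n - 1, matching the 1000/4 step above).
sd = statistics.stdev([10, 20, 30, 40, 50])   # sqrt(250) = 15.811
```

Note that software packages may compute quartiles with a different interpolation rule, so small differences from hand calculations are normal.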

STANDARD NORMAL DISTRIBUTION CURVE AND MEAN, MEDIAN, INTERQUARTILE RANGE AND STANDARD DEVIATION

TYPES PARAMETRIC STATISTICAL ANALYSIS NONPARAMETRIC STATISTICAL ANALYSIS

PARAMETRIC STATISTICAL ANALYSIS Most commonly used type of statistical analysis. This analysis is referred to as parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population. Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.

ASSUMPTIONS The assumption of normality, which specifies that the means of the sample groups are normally distributed; the assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal; and the assumption that the data can be treated as random samples.

NONPARAMETRIC STATISTICAL ANALYSIS Nonparametric statistical analyses, or distribution-free techniques, can be used in studies that do not meet the first two assumptions. Most nonparametric techniques are not as powerful as their parametric counterparts.

If the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

EXPLORATORY DATA ANALYSIS AND CONFIRMATORY DATA ANALYSIS John Tukey introduced exploratory data analysis, which is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structure or models. Confirmatory data analysis involves traditional inferential statistics, which you can use to make an inference about a population or a process based on evidence from the study sample.

STATISTICAL ANALYSIS DECISION MAKING
Two-group comparison — Mean: parametric, independent two-sample t test; nonparametric, Mann-Whitney U test. Percentage: chi-square test.
One-group comparison — Single mean: one-sample t test. Mean difference: parametric, paired t test; nonparametric, Wilcoxon signed rank test.
More than two group comparison — Mean: parametric, ANOVA; nonparametric, Kruskal-Wallis test. Percentage: chi-square test.

PARAMETRIC STATISTICAL ANALYSIS Student's t-test Z test Analysis of variance (ANOVA)

Student's t-test Developed by Prof. W. S. Gosset. Student's t-test is used to test the null hypothesis that there is no difference between the means of two groups. Its variants are the one-sample t-test, the independent two-sample t-test (the unpaired t-test) and the paired t-test.

One-sample t-test To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean. The mean of one sample is compared with the population mean: t = (x̄ − μ) / (S/√n), where x̄ = sample mean, μ = population mean, S = sample standard deviation and n = sample size.

Example: A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether there is any difference between the means. H0: There is no significant difference between the sample mean and the population mean. H1: There is a significant difference between the sample mean and the population mean. x̄ = 40, μ = 44, n = 20 and S = 6.

t calculated = |40 − 44| / (6/√20) = 2.981; t table value (df = 19, 5% level) = 2.093. t calculated > t table value, so reject H0.
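The arithmetic above can be checked with a few lines; a minimal sketch (the function name is my own, not from the slides):

```python
import math

def one_sample_t(xbar, mu, s, n):
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    return (xbar - mu) / (s / math.sqrt(n))

# Slide example: xbar = 40, mu = 44, S = 6, n = 20.
t = one_sample_t(40, 44, 6, 20)   # ≈ -2.981; |t| > 2.093 (df = 19), so reject H0
```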

Independent Two-Sample t Test (the unpaired t-test) To test if the population means estimated by two independent samples differ significantly. Two different samples are drawn independently and their means are compared.

t = (x̄1 − x̄2) / [S √(1/n1 + 1/n2)], where x̄1 − x̄2 is the difference between the means of the two groups and S denotes the pooled standard deviation.

Example: Hb levels of 5 males are 10, 11, 12.5, 10.5, 12 and of 5 females are 10, 17.5, 14.2, 15, 14.1. Test whether there is any significant difference between the Hb values. H0: There is no significant difference between the Hb levels. H1: There is a significant difference between the Hb levels.

x̄1 = 11.2, x̄2 = 14.16, S1² = 1.075, S2² = 7.293. t calculated = 2.287, t table value (df = 8, 5% level) = 2.306. Since t calculated < t table value, accept H0: the difference is not significant at the 5% level.
x1: 10, 11, 12.5, 10.5, 12 (Σ = 56); x1 − x̄1: −1.2, −0.2, 1.3, −0.7, 0.8; (x1 − x̄1)²: 1.44, 0.04, 1.69, 0.49, 0.64 (Σ = 4.3).
x2: 10, 17.5, 14.2, 15, 14.1 (Σ = 70.8); x2 − x̄2: −4.16, 3.34, 0.04, 0.84, −0.06; (x2 − x̄2)²: 17.3056, 11.1556, 0.0016, 0.7056, 0.0036 (Σ = 29.172).
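The pooled-variance calculation can be reproduced directly from the raw data; a sketch with my own function name:

```python
import math

male   = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]

def unpaired_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)   # sum of squared deviations, group x
    ssy = sum((v - my) ** 2 for v in y)
    s = math.sqrt((ssx + ssy) / (nx + ny - 2))   # pooled SD
    return (my - mx) / (s * math.sqrt(1 / nx + 1 / ny))

t = unpaired_t(male, female)   # ≈ 2.287 < 2.306 (df = 8): not significant at 5%
```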

The paired t-test To test if the population means estimated by two dependent samples differ significantly. A usual setting for the paired t-test is when measurements are made on the same subjects before and after a treatment: t = d̄ / (Sd/√n), where d̄ is the mean difference and Sd denotes the standard deviation of the differences.

Example: Systolic BP of 5 patients before and after a drug therapy is: Before 160, 150, 170, 130, 140; After 140, 110, 120, 140, 130. Test whether there is any significant difference between the BP levels. H0: There is no significant difference between the BP levels before and after the drug. H1: There is a significant difference between the BP levels before and after the drug.

d̄ = 22, Sd = 23.875. t calculated = 2.060, t table value (df = 4, 5% level) = 2.776. t calculated < t table value, so accept H0.
Before: 160, 150, 170, 130, 140; After: 140, 110, 120, 140, 130; d: 20, 40, 50, −10, 10 (Σ = 110); d − d̄: −2, 18, 28, −32, −12; (d − d̄)²: 4, 324, 784, 1024, 144 (Σ = 2280).
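The same figures fall out of a direct computation on the paired differences; a sketch (function name mine):

```python
import math

before = [160, 150, 170, 130, 140]
after  = [140, 110, 120, 140, 130]

def paired_t(x, y):
    """t = mean difference / (SD of differences / sqrt(n))."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    dbar = sum(d) / n
    sd = math.sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))
    return dbar / (sd / math.sqrt(n))

t = paired_t(before, after)   # ≈ 2.060 < 2.776 (df = 4): accept H0
```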

Z test Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller sample size (n < 30). Both methods assume a normal distribution of the data, but the z-tests are most useful when the standard deviation is known. z = (x – μ) / (σ / √n)
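A sketch of the z formula above; the sample numbers here are hypothetical, chosen only to illustrate:

```python
import math

def z_stat(xbar, mu, sigma, n):
    """z = (xbar - mu) / (sigma / sqrt(n)); requires a known population sigma."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# Hypothetical: sample of n = 100 with mean 52, population mean 50, sigma = 5.
z = z_stat(52, 50, 5, 100)   # -> 4.0, beyond 1.96, so significant at the 5% level
```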

ANALYSIS OF VARIANCE (ANOVA) Developed by R. A. Fisher. The Student's t-test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups. Analysis of variance is the systematic algebraic procedure of decomposing the overall variation in the responses observed in an experiment into components. There are two variances: (a) between-group variability and (b) within-group variability, that is, variation existing between the samples and variation existing within the samples. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. The between-group variability (or effect variance) is the result of the treatment.

A simplified formula for the F statistic is F = MST / MSE, where MST is the mean square between the groups and MSE is the mean square within the groups.
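The decomposition can be sketched in a few lines. The groups below are hypothetical (the slides give no ANOVA data), and the function name is my own:

```python
def one_way_anova_F(*groups):
    """F = MST / MSE: between-group mean square over within-group mean square."""
    all_vals = [v for g in groups for v in g]
    N, k = len(all_vals), len(groups)
    grand = sum(all_vals) / N
    # Between-group sum of squares / (k - 1)
    mst = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups) / (k - 1)
    # Within-group sum of squares / (N - k)
    mse = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g) / (N - k)
    return mst / mse

# Hypothetical example: three small groups with shifted means.
F = one_way_anova_F([1, 2, 3], [2, 3, 4], [3, 4, 5])   # MST = 3, MSE = 1, F = 3.0
```

The computed F is then compared against the F table value for (k − 1, N − k) degrees of freedom.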

NONPARAMETRIC STATISTICAL ANALYSIS CHI-SQUARE TEST THE WILCOXON'S SIGNED RANK TEST MANN-WHITNEY U TEST KRUSKAL-WALLIS TEST

CHI-SQUARE TEST A test to analyse categorical data. The chi-square test is widely used in statistical decision making. The test was first used by Karl Pearson in 1900. The chi-square test compares frequencies and tests whether the observed data differ significantly from the expected data.

CHI-SQUARE TEST It is calculated as the sum of the squared difference between the observed (O) and the expected (E) data (the deviation, d) divided by the expected data: χ² = Σ (O − E)² / E.

Example: Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles. Vaccinated: attacked 10, not attacked 90. Not vaccinated: attacked 26, not attacked 74.

H0: There is no significant association between vaccination and attack of measles. H1: There is a significant association between vaccination and attack of measles.

Chi-square table value (df = 1, 5% level) = 3.841; chi-square calculated value = 8.672. Calculated > table value, so reject H0.
Oi: 10, 90, 26, 74; Ei: 18, 82, 18, 82; Oi − Ei: −8, 8, 8, −8; (Oi − Ei)²: 64, 64, 64, 64; (Oi − Ei)²/Ei: 3.556, 0.780, 3.556, 0.780 (Σ = 8.672).
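The expected counts and the statistic can be recomputed from the margins of the 2×2 table; a sketch (function name mine, no continuity correction applied):

```python
obs = [[10, 90],   # vaccinated: attacked, not attacked
       [26, 74]]   # not vaccinated: attacked, not attacked

def chi_square(table):
    """Sum of (O - E)^2 / E with expected counts from the row/column margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / total) ** 2
               / (rows[i] * cols[j] / total)
               for i in range(len(table)) for j in range(len(table[0])))

chi2 = chi_square(obs)   # ≈ 8.672 > 3.841 (df = 1): reject H0
```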

THE WILCOXON'S SIGNED RANK TEST Wilcoxon's signed rank test ranks the absolute differences between paired values and compares the sums of the ranks carrying positive and negative signs. It is for testing whether the differences observed in the values of a quantitative variable between two correlated samples (a before-and-after design) are statistically significant or not. This test corresponds to the paired t test.

Method H0: There is no difference in the paired values, on average, between the two groups. H1: There is a difference in the paired values, on average, between the two groups. Compute the difference between each pair of values in the two groups. Rank the differences from smallest to largest, without considering the sign of the difference. After ranking, attach the corresponding sign to each rank. T+ = sum of the ranks with positive sign; T− = sum of the ranks with negative sign. Wstat is the smaller of T+ and T−. Find the W critical value from the Wilcoxon signed rank table. If Wstat < W critical value, reject H0.

EXAMPLE IQ values of 8 malnourished children aged 4 years, before and after giving a nutritious diet for 3 months, are given below. Before: 40, 60, 55, 65, 43, 70, 80, 60. After: 50, 80, 50, 70, 40, 60, 90, 85.

H0: There is no difference in the paired values. H1: There is a difference in the paired values.
Before: 40, 60, 55, 65, 43, 70, 80, 60; After: 50, 80, 50, 70, 40, 60, 90, 85; Difference: −10, −20, 5, −5, 3, 10, −10, −15; Absolute difference: 10, 20, 5, 5, 3, 10, 10, 15; Rank: 5, 8, 2.5, 2.5, 1, 5, 5, 7.

T+ = 8.5, T− = 27.5, so T = 8.5. Wstat = 8.5, W critical = 3. Wstat > W critical value, so accept H0.
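The signed-rank sums can be recomputed from the raw scores, using midranks for tied absolute differences; a sketch (function name mine):

```python
before = [40, 60, 55, 65, 43, 70, 80, 60]
after  = [50, 80, 50, 70, 40, 60, 90, 85]

def wilcoxon_T(x, y):
    """Smaller of the positive/negative signed-rank sums (midranks for ties)."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(d):
        j = i
        while j < len(d) and abs(d[order[j]]) == abs(d[order[i]]):
            j += 1
        mid = (i + 1 + j) / 2          # average rank for the tie block
        for k in range(i, j):
            ranks[order[k]] = mid
        i = j
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    t_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(t_plus, t_minus)

W = wilcoxon_T(before, after)   # 8.5, above the critical value 3 for n = 8
```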

For larger samples, assuming a normal distribution for the differences, the test statistic is Z = (|T − m| − 0.5) / SD, where T is the smaller of T+ and T−, m = n(n + 1)/4 is the mean sum of ranks, and SD = √[n(n + 1)(2n + 1)/24]. If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected.

MANN-WHITNEY U TEST For testing whether two independent samples with respect to a quantitative variable come from the same population or not. Also called Wilcoxon's rank sum test. It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other. This test is the alternative to the t test for two independent samples.

METHOD H0: The average values in the two groups are the same. H1: The average values in the two groups are different. Let n1 be the sample size of one group and n2 the sample size of the second group. Rank all the values in the two groups taken together; tied values are given the same (average) rank. The rank sum of each group is taken and Ustat is calculated using Ustat = rank sum − n(n + 1)/2. Both U1 and U2 are calculated and the smaller value is taken as Ustat. The U critical value is obtained from the Mann-Whitney U test table. If Ustat < U critical value, reject H0.

Example: Treatment A: 3, 4, 2, 6, 2, 5. Treatment B: 9, 7, 5, 10, 6, 8.

H0: The average values in the two treatments are the same. H1: The average values in the two treatments are different. Ustat = rank sum − n(n + 1)/2.
Sorted values: 2, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10; tied ranks: 1.5, 1.5, 3, 4, 5.5, 5.5, 7.5, 7.5, 9, 10, 11, 12.

UA = 23 − 21 = 2, UB = 55 − 21 = 34, so Ustat = 2 (the smaller value). U critical = 5. Ustat < U critical value, so reject H0.
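The rank sums and U can be recomputed from the raw values; a sketch (function name mine, midranks used for ties):

```python
A = [3, 4, 2, 6, 2, 5]
B = [9, 7, 5, 10, 6, 8]

def mann_whitney_U(x, y):
    """Smaller of U1 and U2, where U = rank sum - n(n + 1)/2 (midranks for ties)."""
    combined = sorted(x + y)
    def rank(v):
        # average rank of value v in the combined ordering
        first = combined.index(v) + 1
        return first + (combined.count(v) - 1) / 2
    r1 = sum(rank(v) for v in x)            # rank sum of group x -> 23 here
    u1 = r1 - len(x) * (len(x) + 1) / 2     # 23 - 21 = 2
    return min(u1, len(x) * len(y) - u1)    # U2 = n1*n2 - U1

U = mann_whitney_U(A, B)   # 2 < 5 (critical value for n1 = n2 = 6): reject H0
```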

For larger samples, assuming that the ranks are randomly distributed between the two groups, the test statistic is Z = (|m − T| − 0.5) / SD, where T is the smaller of T1 and T2, T1 = sum of the ranks of the smaller group, T2 = (n1 + n2)(n1 + n2 + 1)/2 − T1, m = n1(n1 + n2 + 1)/2 is the mean sum of ranks, and SD = √[n1 n2 (n1 + n2 + 1)/12]. If Z < 1.96, H0 is accepted; if Z > 1.96, H0 is rejected at the 5% level of significance.

KRUSKAL-WALLIS TEST The Kruskal–Wallis test is a non-parametric test to analyse the variance. It is for the comparison among several independent samples. For testing whether several independent samples of a quantitative variable come from the same population or not It corresponds to one way analysis of variance in parametric methods.

It analyses whether there is any difference in the median values of three or more independent samples. The data values are ranked in increasing order, the rank sums are calculated, and then the test statistic is computed: H = [12 / (N(N + 1))] Σ (Ri²/ni) − 3(N + 1), where N is the total of the sample sizes in all the groups, ni is the size of the ith group and Ri is the sum of the ranks in the ith group.

Method H0: The average values in the different groups are the same. H1: The average values in the different groups are different. Rank all the values taking all the groups together. The chi-square table (df = number of groups − 1) is used to get the table value at the 5% level of significance. If Hstat > the table value, reject H0.

Example: Sample 1: 8, 10, 9, 12, 11, 13. Sample 2: 10, 9, 13, 14, 9, 16. Sample 3: 13, 8, 9, 13, 17, 15.

H0: The average values in the three groups are the same. H1: The average values in the three groups are different.
Sorted values: 8, 8, 9, 9, 9, 9, 10, 10, 11, 12, 13, 13, 13, 13, 14, 15, 16, 17; tied ranks: 1.5, 1.5, 4.5, 4.5, 4.5, 4.5, 7.5, 7.5, 9, 10, 12.5, 12.5, 12.5, 12.5, 15, 16, 17, 18 (the four 9s share rank (3 + 4 + 5 + 6)/4 = 4.5).

H = {12/(18 × 19) × [(45²/6) + (61²/6) + (65²/6)]} − 3 × 19. H calculated = 1.31, chi-square table value (df = 2, 5% level) = 5.99. Hstat < table value, so accept H0.
Sample 1 ranks: 1.5, 7.5, 4.5, 10, 9, 12.5 (Σ = 45); Sample 2 ranks: 7.5, 4.5, 12.5, 15, 4.5, 17 (Σ = 61); Sample 3 ranks: 12.5, 1.5, 4.5, 12.5, 18, 16 (Σ = 65).
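Recomputing H from the raw samples is a useful cross-check; a sketch (function name mine, midranks for ties, no tie correction applied, matching the formula above):

```python
s1 = [8, 10, 9, 12, 11, 13]
s2 = [10, 9, 13, 14, 9, 16]
s3 = [13, 8, 9, 13, 17, 15]

def kruskal_H(*groups):
    """H = 12/(N(N + 1)) * sum(Ri^2 / ni) - 3(N + 1), midranks for ties."""
    combined = sorted(v for g in groups for v in g)
    N = len(combined)
    def rank(v):
        first = combined.index(v) + 1
        return first + (combined.count(v) - 1) / 2
    s = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * s - 3 * (N + 1)

H = kruskal_H(s1, s2, s3)   # ≈ 1.31, below 5.99 (chi-square, df = 2)
```

Statistical packages additionally apply a tie correction, which changes H only slightly here.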

WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS FOR OUTCOMES BY USING STATISTICAL ANALYSIS Testing null hypothesis Determining the probability of type I and type II error Calculating and reporting tests of effect size Ensuring data meet the fundamental assumptions of the statistical test

Testing null hypothesis When attempting to determine if an outcome is related to a cause, it is necessary to know if the outcomes or results could have occurred by chance alone. This cannot be done with certainty, but researchers can determine the probability that the hypothesis is true. Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship). Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.

Determining the probability of type I and type II error Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated. This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome. The relationship between type I and type II error is paradoxical: as one is controlled, the risk of the other increases. Both types of error should be avoided.

Calculating and reporting tests of effect size Effect size refers to how much impact the intervention or variable is expected to have on the outcome. Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is not called into question. On the other hand, when effect sizes are very small, the potential for effects from extraneous variables is more likely and the results may have less validity.

Ensuring data meet the fundamental assumptions of the statistical test Data analysis is based on many assumptions about the nature of the data, the statistical procedures that are used to conduct the analysis, and the match between the data and the procedure. If an assumption is violated, the result can be an inaccurate estimate of the real relationship. Inaccurate conclusions lead to error, which in turn affects the validity of a study.

RESOURCES FOR STATISTICAL ANALYSIS PROGRAM Packaged computer programs can perform the data analysis and provide the results of the analysis on a computer printout, for example SPSS, SAS and Biomedical Data Processing (BMDP). If the analyses selected are inappropriate for the data, the computer program is often unable to detect the error and proceeds to perform the analysis.

STATISTICAL ANALYSIS SYSTEM Comprehensive software originally developed at North Carolina State University. This software is divided into many modules, and its licensing is flexible, based upon the need for functions. The system contains a very large variety of statistical methods and is the software of choice of many major businesses, including much of the pharmaceutical industry. SAS has also developed PC SAS, which is compatible with the personal computer and has a user-friendly Windows interface.

PITFALLS OF STATISTICAL ANALYSIS Statistics can be used, intentionally or unintentionally, to reach faulty conclusions. Misleading information is unfortunately the norm in advertising; drug companies, for example, are well known to indulge in misleading information. Common pitfalls include data dredging and poorly framed survey questions. It is therefore important to understand not just the numbers but the meaning behind the numbers. Statistics is a tool, not a substitute for in-depth reasoning and analysis.

APPLICATION OF STATISTICAL ANALYSIS IN NURSING FIELD To analyze a trend in the vital statistics of a particular patient.  Research in nursing processes and procedures A statistical analysis of patient outcomes Trends in nursing

JOURNAL ABSTRACT Use of Statistical Analysis in The New England Journal of Medicine A sorting of the statistical methods used by authors of the 760 research and review articles in Volumes 298 to 301 of The New England Journal of Medicine indicates that a reader who is conversant with descriptive statistics (percentages, means, and standard deviations) has statistical access to 58 per cent of the articles. Understanding t-tests increases this access to 67 per cent. The addition of contingency tables gives statistical access to 73 per cent of the articles. Familiarity with each additional statistical method gradually increases the percentage of accessible articles. Original Articles use statistical techniques more extensively than other articles in the Journal.

Statistical analysis and design in marketing journal articles The use of statistical analysis in 922 articles from the 1980 through 1985 issues of the Journal of the Academy of Marketing Science (JAMS), the Journal of Marketing (JM), the Journal of Marketing Research (JMR), and the Journal of Consumer Research (JCR) was analyzed. A reader with no statistical background can understand 31, 56, 9, and 21 percent of the articles, respectively, in these four journals. Knowledge of regression and analysis of variance is important in comprehending many of the articles: 38 percent of the JAMS articles and 25, 57 and 56 percent, respectively, of the articles in the other three journals make use of these statistical techniques.

ASSIGNMENT Mean and standard deviation of weight (kg) of 100 school-going children (A) and 100 children not going to school (B), all 5 years of age, in slum areas are given below. Which test is used to find the statistical significance? Population A: sample size 100, mean 17.4, SD 3. Population B: sample size 100, mean 13.2, SD 2.5.

REFERENCES Indrayan A. Basic Methods of Medical Research. New Delhi: AITBS Publishers; 2006. Kader P. Nursing Research: Principles, Process and Issues. 2nd ed. New York: Palgrave Macmillan; 2006. Sundaram RK, Dwivedi SN, Sreenivas V. Medical Statistics: Principles and Methods. 2nd ed. New Delhi: Wolters Kluwer; 2015. Rao SSSP. Biostatistics. 3rd ed. New Delhi: Prentice Hall India Pvt Ltd; 2004.