Univariate and bivariate analysis in SPSS

91 slides, May 19, 2020

About This Presentation

This slide deck shows how to perform various tests in SPSS for univariate and bivariate analysis, along with how to enter and analyze multiple responses.


Slide Content

Univariate and bivariate analysis in SPSS. Subodh Khanal, Asst. Professor, Paklihawa Campus, Institute of Agriculture and Animal Science.

Types of analysis we will be doing: Univariate analysis: only one variable is analyzed. Bivariate analysis: two variables are analyzed together. Multivariate analysis: more than two variables are analyzed.

What is descriptive statistics? Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data. Descriptive statistics present quantitative descriptions in a manageable form.

Univariate analysis/descriptive statistics. Common descriptive statistics: the range (min/max), average (mean), median, mode, variance, standard deviation, histograms and normal distributions, and frequencies/percent.
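As a cross-check on the SPSS Descriptives output, the summaries listed above can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the weights are hypothetical, and we assume sample (n - 1) variance and standard deviation, which is what SPSS reports.

```python
import statistics

def describe(values):
    """Return basic univariate summaries for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),          # first mode if there are several
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

# Hypothetical weights (kg), for illustration only
weights = [52, 55, 58, 58, 60, 61, 64]
summary = describe(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```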

When to use Descriptives? For open-ended continuous variables (interval, ratio). Use frequencies/percent for categorical variables. Go to Analyze > Descriptive Statistics and choose either Descriptives or Frequencies.

Univariate analysis by level of measurement (level: information to report):
Nominal: frequency, percent
Ordinal: frequency, percent
Interval: mean, mode, median, range, standard deviation
Ratio: mean, mode, median, range, standard deviation
For closed-ended questions, use frequency and percent.

Univariate analysis: go to Analyze > Descriptive Statistics, then choose 1. Descriptives (mean, standard deviation, range) or 2. Frequencies.

Frequencies: click Frequencies.

Select the categorical variable and click the arrow to move it across. If you want to see charts, click Charts and select a suitable chart type. Click Continue, then click OK.

Now look at the output and read only the Valid Percent column, because the Frequency and Percent columns may also count missing values. The frequency table is also very useful for checking data discrepancies (e.g. eligibility, wild codes) and system-missing values.
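The difference between Percent and Valid Percent is only the denominator. This hypothetical sketch (codes and data are made up) builds a small frequency table where `None` stands for a missing value and the code 9 plays the role of a wild code that a frequency check would expose.

```python
from collections import Counter

def frequency_table(responses):
    """Frequency, percent and valid percent per category; None marks a missing value."""
    total = len(responses)
    valid = [r for r in responses if r is not None]
    table = {}
    for category, count in sorted(Counter(valid).items()):
        table[category] = {
            "frequency": count,
            "percent": 100 * count / total,             # denominator includes missing
            "valid_percent": 100 * count / len(valid),  # denominator excludes missing
        }
    n_missing = total - len(valid)
    table["missing"] = {"frequency": n_missing, "percent": 100 * n_missing / total}
    return table

# Hypothetical gender codes: 1 = male, 2 = female, 9 = a wild code, None = missing
data = [1, 1, 2, 2, 2, 1, None, 2, 9, None]
for category, row in frequency_table(data).items():
    print(category, row)
```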

Presentation: I recommend using Excel for making graphs. Place table labels above the table and figure labels below the figure.

Descriptives (for numerical variables): click Descriptives, move the variables from the left-hand side to the right-hand side, and click Options. Select the options you need, click Continue, then click OK.

Check the Z values of skewness and kurtosis (statistic / standard error).
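The statistic/S.E. ratio can also be computed by hand. The sketch below uses the bias-adjusted skewness and kurtosis formulas (the same values Excel's SKEW and KURT return) and the standard-error formulas that, to our understanding, SPSS reports; treat this as an assumption and compare it against your own SPSS output. A commonly quoted rule of thumb treats |z| > 1.96 as a sign of departure from normality at the 5% level.

```python
import math

def skew_kurt_z(values):
    """Adjusted skewness/kurtosis, their standard errors and z = statistic / S.E."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5      # raises ZeroDivisionError if all values are equal
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {
        "skewness": skew, "se_skewness": se_skew, "z_skewness": skew / se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt, "z_kurtosis": kurt / se_kurt,
    }

result = skew_kurt_z([1, 2, 3, 4, 5])  # hypothetical, perfectly symmetric data
print(result)
```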

What is inferential statistics? Techniques used to draw conclusions about a population by testing data taken from a sample of that population. They include hypothesis testing and deriving estimates, and they focus on making statements about the population.

What is hypothesis testing? A hypothesis is an assumption about a population parameter (say, the population mean). In hypothesis testing, the first step is to state the assumed or hypothesized value of the population parameter. The assumption we want to test is the null hypothesis (H0), also called the hypothesis of no difference.

The null hypothesis (H0) states the assumption to be tested, e.g. the average weight of vet intern students is 58 kg. We begin with the assumption that the null hypothesis is true.

The alternative hypothesis (H1) is the opposite of the null hypothesis, e.g. the average weight of vet intern students is not equal to 58 kg.

Procedure of hypothesis testing. Step 1: Set up a hypothesis. Step 2: Set up a suitable significance level, denoted by alpha (α): the probability of rejecting the null hypothesis when it is actually true. In practice we use the 5%, 1% and 0.1% levels of significance.

e.g. when we take a 5% level of significance (α = 0.05), there are 5 chances out of 100 that we would reject the null hypothesis when it is actually true. In other words, when the null hypothesis is true, we will correctly retain it in 95 out of 100 such tests.

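Step 3: Determine a suitable test statistic (t-test, F-test, chi-square test). Step 4: Find the critical value and compare the sample result (the computed test statistic) with it. Step 5: Make a decision: reject the null hypothesis if the test statistic falls beyond the critical value (in the rejection region); otherwise, fail to reject it.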
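Steps 1 to 5 can be traced for the 58 kg example with a one-sample t-test. This is a minimal sketch: the sample weights are hypothetical, and the critical value 2.262 (two-tailed, α = 0.05, df = 9) is read from a t table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu0) / se, n - 1

# Step 1: H0: mean weight = 58 kg; H1: mean weight != 58 kg
weights = [55, 60, 57, 62, 59, 61, 56, 63, 58, 60]  # hypothetical sample
# Step 2: alpha = 0.05, two-tailed
# Step 3: test statistic = one-sample t
t_stat, df = one_sample_t(weights, 58)
# Step 4: critical value from a t table for df = 9
t_crit = 2.262
# Step 5: decide
decision = "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"
print(f"t({df}) = {t_stat:.3f}, critical = {t_crit}: {decision}")
```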

Type 1 and type 2 errors: there are four possible results. 1. The hypothesis is true and our test accepts it. 2. The hypothesis is false and our test rejects it. 3. The hypothesis is true but our test rejects it. 4. The hypothesis is false but our test accepts it. The last two possibilities are errors. The third is a type 1 error: rejecting a true null hypothesis. The fourth is a type 2 error: accepting a false null hypothesis.
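The meaning of α as the type 1 error rate can be checked by simulation. This sketch assumes a normal population with a known standard deviation and uses a two-tailed z-test at α = 0.05 (critical value 1.96); all parameter values are hypothetical. Every simulated sample is drawn with H0 true, so every rejection is a type 1 error, and the rejection rate should come out close to 0.05.

```python
import random
import statistics

def type1_error_rate(trials=10_000, n=30, mu0=58.0, sigma=5.0, z_crit=1.96, seed=1):
    """Fraction of z-tests that reject a TRUE null hypothesis (mean = mu0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a type 1 error
    return rejections / trials

print(type1_error_rate())
```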

One-tailed and two-tailed tests

Bivariate analysis

When to use the chi-square test

You will be presented with the following Crosstabs dialogue box:

Kappa = degree of agreement. It ranges from -1 to 1: values ≤ 0 indicate no agreement, 0.01–0.20 none to slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement.
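Cohen's kappa compares the observed agreement with the agreement expected by chance: κ = (po - pe) / (1 - pe). The sketch below computes it from a square agreement table and labels it with the ranges above; the counts for the two policemen are hypothetical.

```python
def cohens_kappa(table):
    """table[i][j] = number of cases rater A put in category i and rater B in j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n  # diagonal = agreements
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

def interpret_kappa(kappa):
    """Labels follow the ranges given on the slide."""
    if kappa <= 0:
        return "no agreement"
    for upper, label in [(0.20, "none to slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical CCTV ratings: rows = policeman A, columns = policeman B,
# categories in order [normal, suspicious]
ratings = [[20, 5],
           [8, 17]]
kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
```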

Example use of kappa: two policemen judging behavior in CCTV footage as normal or suspicious. Many situations in the healthcare industry rely on multiple people to collect research or clinical laboratory data. Because human observers vary, the question of consistency, or agreement, among the individuals collecting data immediately arises. Well-designed research studies must therefore include procedures that measure agreement among the various data collectors.

Click column

The output will look like this. This table shows that both males and females prefer learning from online materials over books.

When reading this table, we are interested in the results in the "Pearson Chi-Square" row. Here χ²(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, males and females do not differ in their preference for online learning versus books. How do we know? Because p = .485 is greater than α = 0.05, we fail to reject the null hypothesis of no association.
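Pearson's chi-square compares the observed counts with the counts expected if the two variables were independent (row total × column total / n). The sketch below computes it for a 2×2 table of hypothetical counts. For df = 1 the p-value can be obtained from the standard normal distribution, because a chi-square variable with one degree of freedom is the square of a standard normal variable; applied to the slide's χ²(1) = 0.487 it gives p ≈ .485.

```python
import math
from statistics import NormalDist

def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and its df."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(rows) - 1) * (len(cols) - 1)

def p_value_df1(chi2):
    """Upper-tail p-value for df = 1: P(X > chi2) = 2 * (1 - Phi(sqrt(chi2)))."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# Hypothetical counts: rows = male, female; columns = online, books
chi2, df = pearson_chi_square([[30, 20], [25, 25]])
print(f"chi2({df}) = {chi2:.3f}, p = {p_value_df1(chi2):.3f}")
print(f"slide value: p = {p_value_df1(0.487):.3f}")
```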

Phi and Cramér's V are both measures of the strength of association. Here the strength of association between the variables is very weak. Interpretation: >0.25: very strong; >0.15: strong; >0.10: moderate; >0.05: weak; >0: no or very weak.
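Phi and Cramér's V are computed from the chi-square statistic: V = sqrt(χ² / (n × (min(rows, columns) - 1))), and for a 2×2 table V equals the absolute value of phi. The slide does not report the sample size, so n = 200 below is a hypothetical value used only to show the calculation with χ² = 0.487.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V; for a 2x2 table this equals |phi|."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def strength(v):
    """Labels follow the cut-offs given on the slide."""
    for cutoff, label in [(0.25, "very strong"), (0.15, "strong"),
                          (0.10, "moderate"), (0.05, "weak")]:
        if v > cutoff:
            return label
    return "no or very weak"

v = cramers_v(0.487, n=200, n_rows=2, n_cols=2)  # n = 200 is hypothetical
print(f"V = {v:.3f} ({strength(v)})")
```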

HOW TO PRESENT

T-test: the t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups.

Statistical analysis of the t-test: the formula for the t-test is a ratio. The numerator is the difference between the two means. The denominator is a measure of the variability or dispersion of the scores (the standard error of the difference).
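The ratio described above can be written out for two independent groups. This sketch uses the pooled-variance formula, which we take to correspond to the "Equal variances assumed" row of SPSS's Independent Samples Test table; the two groups are hypothetical.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Independent-samples t statistic assuming equal variances, with its df."""
    n1, n2 = len(group1), len(group2)
    # Numerator: difference between the two means
    diff = statistics.mean(group1) - statistics.mean(group2)
    # Denominator: standard error of the difference, from the pooled variance
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff / se, n1 + n2 - 2

t_stat, df = pooled_t([1, 2, 3], [4, 5, 6])  # hypothetical scores
print(f"t({df}) = {t_stat:.3f}")
```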

When both variables are continuous (independent = gold standard value)

Click Define Groups and enter the code numbers assigned to the two groups. Click Continue, then click OK.

When the categorical variable has more than two options (groups), use one-way ANOVA.

Click post hoc

Generally, LSD is selected for fewer than 5 treatments and Duncan's for more than 5. Click Continue, click Options, select Descriptive, click Continue and click OK.

There is no significant difference

A significant difference was noted in the weekly feed intake data (F = 245.74, P < 0.001).
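The F value is the ratio of the between-groups mean square to the within-groups mean square (the MSE used later for the LSD). The sketch below computes a one-way ANOVA F from raw data; the three groups are hypothetical and are not the feed intake data from the slide.

```python
import statistics

def one_way_anova(groups):
    """Return F, between-groups df, within-groups df and MSE (within-groups mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within, ms_within

f_value, df_b, df_w, mse = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({df_b}, {df_w}) = {f_value:.2f}, MSE = {mse:.2f}")
```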

Calculating the LSD value: look up the critical value in a t-distribution table. For F1, the error df is 4 (see Within Groups in the ANOVA table), and the critical value for α = 0.05 at df 4 is 2.776. Take the square root of the MSE (5.6): s = 2.366. The number of observations per treatment equals the number of replicates, i.e. n = 2. So LSD = t × s × √2 / √n = 2.776 × 2.366 × 1.414 / 1.414 ≈ 6.57.
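The LSD arithmetic above can be scripted so it is easy to repeat for other variables. A minimal sketch assuming equal replication: LSD = t × sqrt(2 × MSE / n). Two treatment means are declared different when their absolute difference exceeds the LSD; the two means in the usage line are hypothetical.

```python
import math

def lsd(t_crit, mse, n_reps):
    """Fisher's least significant difference for equally replicated treatments."""
    return t_crit * math.sqrt(2 * mse / n_reps)

def significantly_different(mean_a, mean_b, lsd_value):
    """True when two treatment means differ by more than the LSD."""
    return abs(mean_a - mean_b) > lsd_value

# Values from the slide: t(0.05, df 4) = 2.776, MSE = 5.6, 2 replicates
lsd_value = lsd(2.776, 5.6, 2)
print(f"LSD = {lsd_value:.3f}")
print(significantly_different(48.0, 40.0, lsd_value))  # hypothetical means
```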

There is a statistically significant difference between treatments 1 and 3 and between treatments 1 and 4. Feed intake in treatment 1 is significantly higher than in treatments 3 and 4 but at par with treatment 2.

Treatment | Weekly feed intake
1 | mean±se a
2 | mean±se a
3 | mean±se b
4 | mean±se c
Grand mean | mean±se
CV | calculate manually
LSD | 245.47***
Note: *** = p < 0.001; different letters indicate that the values are significantly different.

The previous output was from LSD; this one is from Duncan. The least feed intake was seen in T3 (significantly lower than the other treatments). Because treatments 1 and 2 are in the same subset, they are not significantly different. 3 = a, 4 = b, 1 and 2 = c.

How to report?

How to make a scatter diagram in Excel: choose your two variables and select them. Click Insert, and in Charts select Scatter. You will see blue dots. Right-click on them, click Add Trendline, and choose to display the R-squared value and the equation on the chart.

Here F1 and w1 are selected. Click Insert, and in Charts select Scatter and click the first option.

You will obtain a graph like this. Right-click on the dots and click Add Trendline.

Select Linear, and tick Display Equation on chart and Display R-squared value on chart. Interpretation: each unit increase in X increases Y by 0.868 units, and 91.5% of the variation in Y is explained by X.
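Excel's linear trendline is an ordinary least-squares fit, so the equation and R² it displays can be reproduced directly. In this sketch the slope is the change in Y per unit increase in X (0.868 on the slide) and R² is the share of variation in Y explained by X (0.915 on the slide); the data points are hypothetical.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared

# Hypothetical data, e.g. feed intake (x) against weight (y)
slope, intercept, r2 = linear_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.3f}")
```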

How to enter multiple responses? First, make a dichotomous variable for every response option: ticked (yes = 1) and not ticked (no = 0).

Example question: What are the 5 potential NUS you are growing? Make each response a dichotomy by treating it as a separate variable: Grow niguro: yes (1) / no (0). Grow sisno: yes (1) / no (0). Grow allo: yes (1) / no (0). Grow bamboo: yes (1) / no (0). Grow koiralo: yes (1) / no (0). Enter the values as shown in the next slide.
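Turning each respondent's ticked options into 0/1 variables can be automated before importing the data into SPSS. This sketch assumes the raw answers are stored as a set of ticked crop names per respondent; the variable names such as `grow_niguro` are hypothetical.

```python
NUS_OPTIONS = ["niguro", "sisno", "allo", "bamboo", "koiralo"]

def to_dichotomies(ticked, options=NUS_OPTIONS):
    """One 0/1 variable per option: 1 = ticked (yes), 0 = not ticked (no)."""
    return {f"grow_{crop}": 1 if crop in ticked else 0 for crop in options}

# Hypothetical respondents
respondents = [{"sisno", "bamboo"}, {"niguro"}, {"allo", "bamboo", "koiralo"}]
for answers in respondents:
    print(to_dichotomies(answers))
```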

Import the data into SPSS, or simply create the variables in SPSS itself. First, run frequencies on the newly created variables.

Choose Analyze→Multiple Response→Define Variable Sets.

In the Set Definition list, select each variable you want to include in your new multiple response set, and click the arrow to move the selections to the Variables in Set list. Select Dichotomies and enter the numeric value you used for yes (e.g. 1) as the Counted value. Name the set and write its label. Click Add, then click Close.

Choose Analyze → Multiple Response → Frequencies. The new multiple response set should appear.

Move $NUS (the new set) from Multiple Response Sets to Table(s) for, then click OK.

Look at the percent of responses and keep it for later reporting.

This may look confusing at first, but it's really pretty easy. Ten people bought 24 pieces of fruit. Nine pieces of fruit were apples: 37.5% of the fruit (percent of responses). Nine out of ten people bought apples: 90% of the people (percent of cases). So the difference is the denominator. What makes this table special is that what you usually care about is the people, each of whom can give multiple responses.
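The two denominators can be made explicit in a short calculation. This sketch builds hypothetical dichotomy data that matches the fruit example (10 people, 24 pieces, 9 apples) and computes percent of responses and percent of cases. As an assumption, the case base counts only people with at least one "yes" in the set.

```python
def multiple_response_summary(rows, items):
    """Percent of responses and percent of cases for a dichotomy set."""
    counts = {item: sum(row[item] for row in rows) for item in items}
    total_responses = sum(counts.values())
    # Case base: people with at least one "yes" in the set (assumption)
    n_cases = sum(1 for row in rows if any(row[item] for item in items))
    return {
        item: {
            "count": count,
            "pct_responses": 100 * count / total_responses,
            "pct_cases": 100 * count / n_cases,
        }
        for item, count in counts.items()
    }

# Hypothetical data: 10 people; 9 bought apples, 8 bananas, 7 oranges (24 pieces)
fruit = ["apple", "banana", "orange"]
people = [{"apple": int(i >= 1), "banana": int(i < 8), "orange": int(i < 7)}
          for i in range(10)]
summary = multiple_response_summary(people, fruit)
print(summary["apple"])
```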