INTRODUCTION TO DATA ANALYSIS USING SPSS

drmasriabdullasi1 · 38 slides · Sep 29, 2025

About This Presentation

Data analysis is the process of examining, cleaning, and interpreting data to identify patterns, test hypotheses, and draw conclusions. SPSS, or the Statistical Package for the Social Sciences, is one of the most widely used software tools for this purpose. It provides a user-friendly interface and ...


Slide Content

INTRODUCTION TO DATA ANALYSIS USING SPSS Dr. Masri bin Abdul Lasi

TENTATIVE PROGRAM: Statistics Overview · Data Management & Transformation · Reliability & Validity · Chi-Square Test · T-test · ANOVA · Correlation · Multiple Regression Analysis

Data Management & Transformation: ensure response rate; examine, verify, explore, organize; data entry.

The Right Technique in Data Analysis? What is the purpose of the analysis? (descriptive, comparing groups, or relationships) What is the level of measurement? (parametric or non-parametric) How many variables are involved? (univariate, bivariate, multivariate) What kind of test: descriptive or inferential? If inferential, set the significance level.

INFERENTIAL Tests of Differences. Purpose: to evaluate differences between two or more groups with respect to a variable of interest. The technique depends on: the level of measurement of the variable; the type of data (parametric or non-parametric); the number of groups (one, two, or more than two); and, if two or more groups, whether the groups are independent or related.

Test the Difference: choose by number of groups, independence, and level of measurement.
One group - Nominal: frequency, Χ² test, runs, binomial; Ordinal: K-S; Continuous: t-test, z-test.
Two independent groups - Nominal: Χ² test; Ordinal: Mann-Whitney, median, K-S; Continuous: t-test.
Two related groups - Nominal: Χ² test, McNemar; Ordinal: Wilcoxon signed rank; Continuous: paired t-test.
More than two independent groups - Nominal: Χ² test; Ordinal: Kruskal-Wallis ANOVA; Continuous: one-way ANOVA.
More than two related groups - Nominal: Χ² test; Ordinal: Sign, Wilcoxon, McNemar, Friedman; Continuous: paired t-test, factorial two-way ANOVA.

Relationship. Purpose: to establish relationships between variables. The technique depends on: whether dependent variable(s) exist; the number of dependent and independent variables; the type of data (parametric or non-parametric); and the level of measurement of the variables.

Relationship Tests: choose by whether a DV exists, how many DVs there are, and the scale of the DV and IV.
No DV (interdependence among variables) - Parametric: factor analysis, cluster analysis, MDS; Non-parametric: latent structure, MDS, cluster analysis.
One DV - Parametric DV, parametric IV: Multiple Regression Analysis (MRA); Parametric DV, non-parametric IV: MRA with dummy variables; Non-parametric DV, non-parametric IV: loglinear, conjoint analysis; Non-parametric DV, parametric IV: discriminant analysis, logit & probit.
More than one DV - Parametric IV: canonical correlation, LISREL; Non-parametric IV: multivariate ANOVA.

Chi-square Test: a non-parametric test of differences for two or more nominal variables. Steps: go to Analyze; go to Descriptive Statistics; go to Crosstabs; enter the variables into Rows and Columns; go to Statistics; tick Chi-square.
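The same crosstab chi-square test can be cross-checked outside SPSS; below is a minimal sketch in Python using scipy (assumed available), with the STATE * FORM counts reported later in this deck:

```python
import numpy as np
from scipy import stats

# STATE * FORM counts from the deck's crosstab
# (rows: Perlis, Kedah, Penang; columns: Proprietor, Partnership, Pte Ltd, Ltd)
observed = np.array([
    [12,  5, 28, 1],
    [51, 16, 31, 1],
    [20, 16, 24, 5],
])

# Pearson chi-square test of independence on the contingency table
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print(f"minimum expected count = {expected.min():.2f}")
```

`chi2_contingency` also returns the table of expected counts, which is how SPSS flags cells with an expected count below 5.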

Compare Means (t-test): go to Analyze; go to Compare Means; go to Independent-Samples T Test; enter the test variable; enter the grouping variable (only two groups are analyzed each time).

Analysis of Variance (ANOVA): go to Analyze; go to Compare Means; go to One-Way ANOVA; enter the variable; enter the factor.
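The equivalent one-way ANOVA computation can be sketched in Python with scipy's `f_oneway`; the three groups below are hypothetical scores for illustration, not the deck's data:

```python
from scipy import stats

# Hypothetical scores for SMEs in three states (illustrative only)
perlis = [1.2, 1.6, 1.4, 1.8, 1.5, 1.3]
kedah  = [1.9, 1.7, 1.8, 2.0, 1.6, 1.9]
penang = [1.5, 1.4, 1.7, 1.6, 1.5, 1.8]

# One-way ANOVA: do the group means differ more than chance would allow?
f_stat, p_value = stats.f_oneway(perlis, kedah, penang)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```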

Correlation Analysis: go to Analyze; go to Correlate; go to Bivariate; enter the variables.
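The bivariate Pearson correlation that this dialog produces can be sketched in Python with scipy; the paired scores below are hypothetical, not the deck's data:

```python
from scipy import stats

# Hypothetical paired scores for two variables (illustrative only)
autonomy   = [2.1, 3.4, 1.8, 4.2, 3.0, 2.7, 3.9, 2.4]
innovative = [2.5, 3.1, 2.0, 4.0, 3.3, 2.6, 3.8, 2.2]

# Pearson correlation coefficient and its two-tailed p-value
r, p = stats.pearsonr(autonomy, innovative)
print(f"r = {r:.3f}, p = {p:.4f}")
```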

Multiple Regression Analysis (One Level): go to Analyze; go to Regression; go to Linear; enter the DV; enter the IVs. Method: Enter. Statistics: tick Model fit, R squared change, Descriptives, Collinearity diagnostics, Durbin-Watson. Plots: Scatterplot.
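The "Enter" method fits all IVs at once by ordinary least squares; a minimal sketch in Python using only numpy, with hypothetical data (the values and variable names are illustrative, not the deck's):

```python
import numpy as np

# Hypothetical data: performance (DV) predicted from two IVs
autonomy    = np.array([2.0, 3.5, 4.0, 1.5, 3.0, 5.0, 2.5, 4.5])
innovative  = np.array([3.0, 4.0, 5.5, 2.0, 3.5, 6.0, 3.0, 5.0])
performance = np.array([4.2, 5.0, 6.1, 3.8, 4.8, 6.8, 4.5, 6.0])

# Design matrix with an intercept column, then least-squares fit
X = np.column_stack([np.ones_like(autonomy), autonomy, innovative])
coef, _, _, _ = np.linalg.lstsq(X, performance, rcond=None)

# R-squared: share of DV variance explained by the model
fitted = X @ coef
ss_res = np.sum((performance - fitted) ** 2)
ss_tot = np.sum((performance - performance.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print("coefficients (intercept, autonomy, innovative):", np.round(coef, 3))
print(f"R-squared = {r_squared:.3f}")
```

SPSS additionally reports standardized betas, significance tests, and collinearity statistics on top of these raw coefficients.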

Chi-square Test: the chi-square test is used to determine whether there is a significant association between two categorical variables. The test is useful in a wide range of fields, including: Medical Research: to determine the relationship between risk factors and diseases; for example, a study might use the test to determine whether smoking is associated with an increased risk of lung cancer. Social Sciences: to examine the relationship between demographic variables, such as gender, age, and education level, and various social phenomena, such as political opinions or attitudes towards certain issues. Marketing Research: to determine whether there is an association between consumer preferences and demographic factors, such as age, income, and geographic location. Quality Control: to determine whether the distribution of defects or errors in a manufacturing process is consistent with what is expected. Genetics: to determine whether there is an association between different genes and inherited traits.

Chi-square result
H1: There is a significant difference in the distribution of business forms across the three states in northern Malaysia.

STATE * FORM Crosstabulation (Count)
STATE     Proprietor  Partnership  Pte Ltd  Ltd  Total
Perlis    12          5            28       1    46
Kedah     51          16           31       1    99
Penang    20          16           24       5    65
Total     83          37           83       7    210

Chi-Square Tests
                               Value   df  Asymp. Sig. (2-sided)
Pearson Chi-Square             22.675  6   .001
Likelihood Ratio               21.745  6   .001
Linear-by-Linear Association   .275    1   .600
N of Valid Cases               210
a. 3 cells (25.0%) have expected count less than 5. The minimum expected count is 1.53.

Interpretation: (χ² = 22.67, p < .01); there is a significant difference in the distribution of business forms between the three states in northern Malaysia. Therefore, H1 is accepted.

T-Test: the t-test is a statistical test used to compare the means of two groups and determine whether they are statistically different from each other. The test is useful in a wide range of fields, including: Medical Research: to compare the means of two groups of patients, such as those receiving different treatments, to determine whether there is a significant difference in their outcomes. Psychology: to compare the means of two groups of participants, such as those receiving a placebo versus an active intervention, to determine whether the intervention has a significant effect. Education: to compare the means of two groups of students, such as those receiving different teaching methods or interventions, to determine whether there is a significant difference in their academic performance. Finance: to compare the means of two groups of investments, such as those with different rates of return, to determine whether there is a significant difference in their performance. Quality Control: to compare the means of two groups of products or processes, such as those with different manufacturing methods or materials, to determine whether there is a significant difference in their quality. If the p-value reported from a t-test is less than 0.05, the result is said to be statistically significant; if the p-value is greater than 0.05, the result is not statistically significant.
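Both t-test variants that SPSS reports ("equal variances assumed" and "not assumed") can be sketched in Python with scipy; the two groups below are hypothetical scores, not the deck's data:

```python
from scipy import stats

# Hypothetical scores for two independent groups (illustrative only)
group_a = [1.2, 1.6, 1.4, 1.8, 1.5, 1.3, 1.7, 1.4]
group_b = [1.9, 1.7, 1.8, 2.0, 1.6, 1.9, 2.1, 1.8]

# Student's t-test: the "equal variances assumed" row in SPSS output
t_eq, p_eq = stats.ttest_ind(group_a, group_b, equal_var=True)
# Welch's t-test: the "equal variances not assumed" row in SPSS output
t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Student: t = {t_eq:.3f}, p = {p_eq:.4f}")
print(f"Welch:   t = {t_w:.3f}, p = {p_w:.4f}")
```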

t-test
H1: Bumiputera SMEs in Perlis and Kedah differ significantly in autonomy and innovative orientation.

Group Statistics
       STATE   N   Mean    SD      S.E. of Mean
AUTO   Perlis  46  1.3696  .48802  .07195
       Kedah   99  1.5051  .50252  .05051
INNO   Perlis  46  1.5435  .50361  .07425
       Kedah   99  1.7677  .42446  .04266

Independent Samples Test (t-test for Equality of Means)
                                    t       df      Sig. (2-tailed)  Mean Diff.  S.E. Diff.
AUTO  Equal variances assumed       -1.525  143     .130             -.1355      .08886
      Equal variances not assumed   -1.541  90.208  .127             -.1355      .08791
INNO  Equal variances assumed       -2.787  143     .006             -.2242      .08045
      Equal variances not assumed   -2.618  75.817  .011             -.2242      .08564

Interpretation: With the significance level set at p < .05, there is a statistically significant difference between Perlis and Kedah in innovative orientation (p = .006) but not in autonomy (p = .130). Therefore, H1 is partially accepted.

ANOVA: ANOVA (Analysis of Variance) is a statistical test used to determine whether there are significant differences between the means of three or more groups. ANOVA analyzes the variance between groups and within groups to determine whether the differences in means are due to chance or are statistically significant. The test statistic of ANOVA is the F-statistic, which is used to test whether there is a significant difference between the means of three or more groups. In ANOVA, the null hypothesis is that there is no significant difference between the means of the groups, and the alternative hypothesis is that there is a significant difference between at least one pair of means. The F-statistic is calculated by dividing the variance between groups by the variance within groups. If the F-statistic is large, the variance between groups is large relative to the variance within groups, which suggests that there may be a significant difference between the means of the groups. To determine whether the F-statistic is statistically significant, we need to calculate the p-value, which is the probability of obtaining an F-statistic as extreme as the one calculated, assuming that the null hypothesis is true. If the p-value is less than the significance level (typically 0.05), we reject the null hypothesis and conclude that there is a significant difference between the means of the groups.

ANOVA
H1: There is a significant difference in autonomy and innovative orientation among Bumiputera SMEs in the three states in northern Malaysia.

                      Sum of Squares  df   Mean Square  F       Sig.
AUTO  Between Groups  .749            2    .375         1.519   .221
      Within Groups   51.065          207  .247
      Total           51.814          209
INNO  Between Groups  4.393           2    2.196        10.074  .000
      Within Groups   45.131          207  .218
      Total           49.524          209

Interpretation: (F = 10.074, p < .01). With the significance level set at p < .05, there is a statistically significant difference in innovative orientation but not in autonomy among Bumiputera SMEs across the three states in northern Malaysia. Therefore, H1 is partially accepted.

CORRELATION: Correlation is a statistical measure that describes the relationship between two variables; it indicates the extent to which two variables are associated or related to each other. In other words, correlation measures the degree to which changes in one variable are associated with changes in another variable. Correlation can be positive, negative, or zero. Positive correlation means that when one variable increases, the other variable also increases. Negative correlation means that when one variable increases, the other variable decreases. Zero correlation means that there is no relationship between the two variables. Correlation is typically measured using a correlation coefficient, a numerical value that ranges from -1 to 1 and provides information about the strength and direction of the relationship between two variables. A correlation coefficient of 1 indicates a perfect positive correlation, a coefficient of -1 indicates a perfect negative correlation, and a coefficient of 0 indicates no correlation.

Correlation
H1: Autonomy and innovative orientation among Bumiputera SMEs in northern Malaysia are significantly related.

Correlations
                                   Autonomy  Innovative
Autonomy    Pearson Correlation    1         .072
            Sig. (2-tailed)        .         .297
            N                      210       210
Innovative  Pearson Correlation    .072      1
            Sig. (2-tailed)        .297      .
            N                      210       210

Interpretation: (r = .072, p = .297). With the significance level set at p < .05, there is no statistically significant correlation between autonomy and innovativeness. Therefore, H1 is rejected.

Interpretation of Findings

Statistics
                        Autonomy  Innovative
N           Valid       210       210
            Missing     0         0
Mean                    3.0444    5.6603
Median                  3.0000    6.0000
Mode                    2.00      6.00
Std. Deviation          1.21533   .97658
Skewness                .458      -.908
Std. Error of Skewness  .168      .168
Kurtosis                -.225     1.057
Std. Error of Kurtosis  .334      .334
Minimum                 1.00      2.00
Maximum                 7.00      7.00
a. Multiple modes exist. The smallest value is shown.

STATE    Frequency  %      Valid %  Cumulative %
Perlis   46         21.9   21.9     21.9
Kedah    99         47.1   47.1     69.0
Penang   65         31.0   31.0     100.0
Total    210        100.0  100.0

Multiple regression analysis Multiple regression analysis is a statistical method used to examine the relationship between a dependent variable and two or more independent variables. The aim of multiple regression analysis is to create a model that can predict the value of the dependent variable based on the values of the independent variables. In multiple regression analysis, the dependent variable is the variable that is being predicted, while the independent variables are the variables that are used to predict the dependent variable. The multiple regression model is a linear equation that includes the values of the independent variables as predictors and coefficients that determine the strength of the relationship between the independent variables and the dependent variable. Multiple regression analysis involves several steps, including: Data preparation: Collect data for the dependent variable and the independent variables, and prepare the data for analysis. Model selection: Determine which independent variables to include in the model based on their relationship with the dependent variable and their significance. Model fitting: Use statistical methods to estimate the coefficients of the independent variables in the model. Model evaluation: Evaluate the goodness of fit of the model using statistical measures such as R-squared, adjusted R-squared, and the F-test. Interpretation: Interpret the coefficients of the independent variables in the model and use the model to make predictions about the dependent variable.

R-Squared Value: What qualifies as a "good" R-Squared value will depend on the context. In some fields, such as the social sciences, even a relatively low R-Squared such as 0.5 could be considered relatively strong. In other fields, the standards for a good R-Squared reading can be much higher, such as 0.9 or above.

Multiple Regression Analysis
H1: Autonomy and innovative orientation are positively related to performance among Bumiputera SMEs in northern Malaysia.

Model Summary
Model  R     R Square  Adj R Square  S.E.E.   R Sq Chg  F Change  df1  df2  Sig. F Chg
1      .171  .29       .10           1.52865  .026      2.730     2    205  .008
a. Predictors: (Constant), Autonomy Dimension, Innovativeness Dimension

Coefficients
Model            Unstd. B  S.E.  Std. Beta  t      Sig.  Tolerance  VIF
1  (Constant)    4.053     .752             5.390  .000
   Innovative    .112      .113  .071       1.989  .024  .911       1.098
   Autonomy      .190      .090  .151       2.126  .005  .944       1.059
a. Dependent Variable: Performance

MULTIPLE REGRESSION ANALYSIS...CONT. Consider some multiple regression assumptions:
1. Normality - verify skewness < 2.0 or the histogram
2. Linearity - verify the P-P plot of standardized regression residuals
3. Homoscedasticity - verify the scatterplot of residuals
4. Error terms free from autocorrelation - Durbin-Watson between 1.5 and 2.5
5. Free from multicollinearity - correlations < .70; verify VIF < 10, Tolerance > .10
6. Remove all outliers
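Assumptions 4 and 5 above can be checked numerically without SPSS; a minimal sketch in Python using only numpy, with randomly generated (hypothetical) predictors and residuals standing in for real model output:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a predictor matrix X
    (observations in rows, predictors in columns, no intercept column)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on all the other predictors (with intercept)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))   # VIF_j = 1 / (1 - R_j^2)
    return out

def durbin_watson(residuals):
    """Durbin-Watson statistic; values roughly between 1.5 and 2.5 suggest
    the error terms are free of first-order autocorrelation."""
    residuals = np.asarray(residuals)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Hypothetical inputs: two independent predictors and white-noise residuals
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
resid = rng.normal(size=100)
print("VIFs:", [round(v, 3) for v in vif(X)])
print("Durbin-Watson:", round(durbin_watson(resid), 3))
```

With truly independent predictors the VIFs stay near 1, well under the deck's cutoff of 10, and white-noise residuals give a Durbin-Watson statistic near 2.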

[Figures: Normal P-P Plot of Regression Standardized Residuals (Dependent Variable: ROS); Scatterplot of Regression Standardized Residuals vs. Standardized Predicted Values (Dependent Variable: ROS); Histogram of Regression Standardized Residuals (Dependent Variable: ROS; Std. Dev = .83, Mean = 0, N = 107)]

REGRESSION ANALYSIS...CONT. Interpretation: a 3-stage analysis.
1. Coefficient of determination, shown by R² and adjusted R² (R² = .29, adj R² = .10, p < .01). This means 29% of the variation in performance can be explained by variation in the overall IV (autonomy and innovativeness taken as one IV), and 10% of the variation in performance is explained by variation in the multiple IVs (autonomy and innovativeness). p < .01 means there is less than a 1% probability that the observed relationship between the IVs and the DV is due to chance.
2. Then go to the coefficients table. The unstandardized coefficient B is the actual beta (the slope of the regression line) and can exceed 1; the standardized Beta is the beta value rescaled to lie between -1 and 1.
3. Innovativeness B = .112, p < .05, and autonomy B = .190, p < .01: autonomy and innovativeness are positively related to performance. Therefore, H1 is accepted.

THANK YOU AND ALL THE BEST!