Business Statistics for Managers with SPSS

profgnagarajan, 90 slides, Sep 21, 2024

About This Presentation

BSM with SPSS


Slide Content

Business Statistics for Managers
Dr. D. Pradeep Kumar

Statistics for Business and Economics
Chapter 1: Statistics, Data, & Statistical Thinking

Contents
1.1 The Science of Statistics
1.2 Types of Statistical Applications in Business
1.3 Fundamental Elements of Statistics
1.4 Processes
1.5 Types of Data
1.6 Collecting Data
1.7 The Role of Statistics in Managerial Decision Making

Learning Objectives
Introduce the field of statistics.
Demonstrate how statistics applies to business.
Establish the link between statistics and data.
Identify the different types of data and data-collection methods.
Differentiate between population and sample data.
Differentiate between descriptive and inferential statistics.

1.1 The Science of Statistics

What Is Statistics?
Collecting data (e.g., a survey)
Presenting data (e.g., charts and tables)
Characterizing data (e.g., an average)
Why? Data analysis supports decision-making.

What Is Statistics?
Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

1.2 Types of Statistical Applications in Business

Application Areas
Economics: forecasting, demographics
Sports: individual and team performance
Engineering: construction, materials
Business: consumer preferences, financial trends

Statistics: Two Processes
Describing sets of data.
Drawing conclusions (making estimates, decisions, predictions, etc.) about sets of data based on sampling.

Statistical Methods
Statistical methods fall into two groups: descriptive statistics and inferential statistics.

Descriptive Statistics
Involves: collecting data, presenting data, and characterizing data.
Purpose: describe data (e.g., a sample mean $\bar{X} = 30.5$ and a sample variance $s^2 = 113$).

Inferential Statistics
Involves: estimation and hypothesis testing.
Purpose: make decisions about population characteristics.

1.3 Fundamental Elements of Statistics

Fundamental Elements
Experimental unit: the object upon which we collect data.
Population: all items of interest.
Variable: a characteristic of an individual experimental unit.
Sample: a subset of the units of a population.
Mnemonic: P in Population and Parameter; S in Sample and Statistic.

Fundamental Elements
Statistical inference: an estimate, prediction, or generalization about a population based on information contained in a sample.
Measure of reliability: a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.

Four Elements of Descriptive Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the population or sample units) that are to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data

Five Elements of Inferential Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on information contained in the sample
5. A measure of reliability for the inference

1.4 Processes

Process
A process is a series of actions or operations that transforms inputs to outputs. A process produces or generates output over time.

Process
A process whose operations or actions are unknown or unspecified is called a black box. Any set of output (objects or numbers) produced by a process is called a sample.

1.5 Types of Data

Types of Data
Quantitative data are measurements that are recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.

Types of Data
Data are either quantitative or qualitative.

Quantitative Data
Measured on a numeric scale. Examples:
Number of defective items in a lot
Salaries of CEOs of oil companies
Ages of employees at a company

Qualitative Data
Classified into categories. Examples:
College major of each student in a class
Gender of each employee at a company
Method of payment (cash, check, credit card)

1.6 Collecting Data

Obtaining Data
Data from a published source
Data from a designed experiment
Data from a survey
Data collected observationally

Obtaining Data
Published source: book, journal, newspaper, website.
Designed experiment: the researcher exerts strict control over the units.
Survey: a group of people are surveyed and their responses are recorded.
Observational study: units are observed in their natural setting and variables of interest are recorded.

Samples
A representative sample exhibits characteristics typical of those possessed by the population of interest.
A random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.

Random Sample
Every sample of size n has an equal chance of selection.
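A minimal Python sketch of drawing a simple random sample, assuming the population is available as a list (the employee IDs below are hypothetical). It uses the standard library's `random.Random.sample`, which selects without replacement so that every subset of size n is equally likely. The small simulation tallies how often each possible 2-unit sample of a 4-unit population is drawn.

```python
import random
from collections import Counter

def simple_random_sample(population, n, seed=None):
    """Select n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(population, n)

def subset_frequencies(units, n, trials, seed=0):
    """Count how often each unordered sample of size n is drawn over many trials."""
    rng = random.Random(seed)
    return Counter(frozenset(rng.sample(units, n)) for _ in range(trials))

# Hypothetical sampling frame of 200 employee IDs
employees = [f"E{i:03d}" for i in range(1, 201)]
print(simple_random_sample(employees, 10, seed=42))

# With 4 units there are 6 possible samples of size 2; each should appear about 1/6 of the time
freqs = subset_frequencies(["A", "B", "C", "D"], 2, trials=60_000)
for subset, count in sorted(freqs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), count)
```

The counts will not be exactly 10,000 each; random variation around the equal share is expected.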

1.7 The Role of Statistics in Managerial Decision Making

Statistical Thinking
Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences. Fundamental to the thought process is that variation exists in populations and process data.

Nonrandom Sample Errors
Selection bias results when a subset of the experimental units in the population is excluded so that these units have no chance of being selected for the sample.
Nonresponse bias results when the researchers conducting a survey or study are unable to obtain data on all experimental units selected for the sample.
Measurement error refers to inaccuracies in the values of the data recorded. In surveys, the error may be due to ambiguous or leading questions and the interviewer’s effect on the respondent.

Real-World Problem

Statistical Computer Packages
Typical software: SPSS, MINITAB, Excel.
Need statistical understanding: assumptions and limitations.

Key Ideas
Types of Statistical Applications: Descriptive
1. Identify population and sample (collection of experimental units)
2. Identify variable(s)
3. Collect data
4. Describe data

Key Ideas
Types of Statistical Applications: Inferential
1. Identify population (collection of all experimental units)
2. Identify variable(s)
3. Collect sample data (subset of population)
4. Inference about population based on sample
5. Measure of reliability for inference

Key Ideas
Types of Data
1. Quantitative (numerical in nature)
2. Qualitative (categorical in nature)

Key Ideas
Data-Collection Methods
1. Observational
2. Published source
3. Survey
4. Designed experiment

Key Ideas
Problems with Nonrandom Samples
1. Selection bias
2. Nonresponse bias
3. Measurement error

The mean, median, and mode are measures of central tendency that are used to identify the central position of a data set. Which one applies depends on the type of data and the level of measurement:
Nominal data: the mode is the only appropriate measure of central tendency. The mode is the most frequent value in the data set.
Ordinal data: the median or mode is usually the best choice. The median is the middle value of the ordered data set.
Interval or ratio data: the mean, median, and mode can all be used. The mean is the arithmetic average.
Skewed distribution: the median is often the best measure of central tendency.
Symmetrical, unimodal distribution of continuous data: the mean, median, and mode are all equal.
Data with extreme scores: the median is preferred because a single outlier can have a big effect on the mean.
Data with missing or undetermined values: the median is preferred.
The mean is the most commonly used measure of central tendency, but the best measure depends on the type of data.

Measures of Central Tendency
Greg C Elvers, Ph.D.

Measures of Central Tendency
A measure of central tendency is a descriptive statistic that describes the average, or typical, value of a set of scores. There are three common measures of central tendency: the mode, the median, and the mean.

The Mode
The mode is the score that occurs most frequently in a set of data.

Bimodal Distributions
When a distribution has two “modes,” it is called bimodal.

Multimodal Distributions
If a distribution has more than two “modes,” it is called multimodal.
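A short Python sketch of finding the mode, written so that it returns every value tied for the highest frequency: one value means a single mode, two means bimodal, and more means multimodal. The helper name `modes` is illustrative. Python 3.8+ also provides `statistics.multimode`, which returns ties in first-seen order rather than sorted. If every value occurs exactly once, every value is returned.

```python
from collections import Counter

def modes(scores):
    """Return all values sharing the highest frequency, sorted."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([3, 5, 5, 7]))                            # [5]          one mode
print(modes([1, 1, 2, 3, 3]))                         # [1, 3]       bimodal
print(modes([4, 4, 5, 5, 6, 6, 7]))                   # [4, 5, 6]    multimodal
print(modes(["cash", "check", "credit", "credit"]))   # ['credit']   nominal data
# Two very different data sets can share the same mode:
print(modes([2, 2, 3, 4]), modes([2, 2, 90, 400]))    # [2] [2]
```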

When To Use the Mode
The mode is not a very useful measure of central tendency. It is insensitive to large changes in the data set; that is, two data sets that are very different from each other can have the same mode.

When To Use the Mode
The mode is primarily used with nominally scaled data. It is the only measure of central tendency that is appropriate for nominally scaled data.

The Median
The median is simply another name for the 50th percentile. It is the score in the middle: half of the scores are larger than the median and half of the scores are smaller than the median.

How To Calculate the Median
Conceptually, it is easy to calculate the median. Many minor problems can occur, however, so it is best to let a computer do it.
1. Sort the data from highest to lowest.
2. Find the score in the middle: middle = (N + 1) / 2.
3. If N, the number of scores, is even, the median is the average of the middle two scores.

Median Example
What is the median of the following scores: 10 8 14 15 7 3 3 8 12 10 9?
Sort the scores: 15 14 12 10 10 9 8 8 7 3 3
Determine the middle score: middle = (N + 1) / 2 = (11 + 1) / 2 = 6
Middle score = median = 9

Median Example
What is the median of the following scores: 24 18 19 42 16 12?
Sort the scores: 42 24 19 18 16 12
Determine the middle score: middle = (N + 1) / 2 = (6 + 1) / 2 = 3.5
Median = average of the 3rd and 4th scores: (19 + 18) / 2 = 18.5
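The sorting-and-position procedure can be sketched in Python. This minimal illustration follows the slide's rule: sort from highest to lowest, take position (N + 1) / 2, and average the two middle scores when N is even. The function name is hypothetical. The standard library's `statistics.median` gives the same results.

```python
import statistics

def median_by_position(scores):
    """Median using the (N + 1) / 2 position rule on data sorted from highest to lowest."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd N: (N + 1) / 2 is a whole-number position (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even N: (N + 1) / 2 falls between positions N/2 and N/2 + 1, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd = [10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]
even = [24, 18, 19, 42, 16, 12]
print(median_by_position(odd), median_by_position(even))   # 9 18.5
print(statistics.median(odd), statistics.median(even))     # 9 18.5
```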

Median Example for a Discrete Frequency Distribution
x:  1   2   3   4   5   6   7   8   9
f:  8  10  11  16  20  25  15   9   6

x    f    CF
1    8     8
2   10    18   (8 + 10)
3   11    29   (18 + 11)
4   16    45
5   20    65
6   25    90
7   15   105
8    9   114
9    6   120

$N = \sum f_i = 120$, so $N/2 = 120/2 = 60$. The CF just greater than 60 is 65, which belongs to x = 5.
Median = 5

Median for a Continuous Frequency Distribution
Wages        No. of workers    CF
2000-3000          3            3
3000-4000          5            8
4000-5000         20           28
5000-6000         10           38
6000-7000          5           43

$N = \sum f_i = 43$, so $N/2 = 43/2 = 21.5$. The CF just greater than 21.5 is 28, so the median class is 4000-5000.

$\text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - c\right)$

where L = lower limit of the median class, f = frequency of the median class, h = magnitude (width) of the median class, and c = cumulative frequency of the class preceding the median class.

Median = 4000 + (1000/20)(21.5 - 8) = 4000 + 50(13.5) = 4000 + 675 = 4675
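Both frequency-table calculations can be checked with a small Python sketch. The discrete version follows the slide's rule of taking the first value whose cumulative frequency is just greater than N/2. It does not special-case a cumulative frequency exactly equal to N/2. The grouped version applies $L + \frac{h}{f}(N/2 - c)$ and assumes the classes are contiguous (lower, upper) pairs in ascending order. The function names are illustrative.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete frequency table: first x whose cumulative frequency exceeds N/2."""
    half = sum(freqs) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf > half:
            return x
    raise ValueError("empty frequency table")

def grouped_median(classes, freqs):
    """Median of a grouped distribution: L + (h / f) * (N/2 - c)."""
    half = sum(freqs) / 2
    c = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and c + f >= half:
            h = upper - lower  # class width
            return lower + (h / f) * (half - c)
        c += f
    raise ValueError("empty frequency table")

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(discrete_median(x, f))               # 5

wages = [(2000, 3000), (3000, 4000), (4000, 5000), (5000, 6000), (6000, 7000)]
workers = [3, 5, 20, 10, 5]
print(grouped_median(wages, workers))      # 4675.0
```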

When To Use the Median
The median is often used when the distribution of scores is either positively or negatively skewed. The few really large scores (positively skewed) or really small scores (negatively skewed) will not overly influence the median.

The Mean
The mean is:
the arithmetic average of all the scores, $(\sum X)/N$;
the number, m, that makes $\sum (X - m)$ equal to 0;
the number, m, that makes $\sum (X - m)^2$ a minimum.
The mean of a population is represented by the Greek letter $\mu$; the mean of a sample is represented by $\bar{X}$.

Calculating the Mean
Calculate the mean of the following data: 1 5 4 3 2
Sum the scores: $\sum X = 1 + 5 + 4 + 3 + 2 = 15$
Divide the sum ($\sum X = 15$) by the number of scores (N = 5): 15 / 5 = 3
Mean = $\bar{X} = 3$

Calculating the Mean for Discrete Data
x     f     fx
1     5      5
2     9     18
3    12     36
4    17     68
5    14     70
6    10     60
7     6     42
Total 73   299

$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{299}{73} \approx 4.10$
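A minimal Python sketch of both mean calculations: the plain arithmetic mean $(\sum X)/N$ and the frequency-weighted mean $\sum f_i x_i / \sum f_i$. It reproduces the table totals (73 and 299). The function names are illustrative.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by their count."""
    return sum(scores) / len(scores)

def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f

print(mean([1, 5, 4, 3, 2]))                          # 3.0

x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]
print(sum(f), sum(fi * xi for xi, fi in zip(x, f)))   # 73 299
print(round(frequency_mean(x, f), 2))                 # 4.1
```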

When To Use the Mean
You should use the mean when the data are interval or ratio scaled (many people will use the mean with ordinally scaled data too) and the data are not skewed. The mean is preferred because it is sensitive to every score: if you change one score in the data set, the mean will change.

Relations Between the Measures of Central Tendency
In symmetrical distributions, the median and mean are equal. For normal distributions, mean = median = mode. In positively skewed distributions, the mean is greater than the median. In negatively skewed distributions, the mean is smaller than the median.

Measures of Dispersion
Greg C Elvers, Ph.D.

Definition
Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other. The more similar the scores are to each other, the lower the measure of dispersion will be; the less similar the scores are to each other, the higher the measure of dispersion will be. In general, the more spread out a distribution is, the larger the measure of dispersion will be.

Measures of Dispersion
Which of the distributions of scores has the larger dispersion? The upper distribution has more dispersion because the scores are more spread out; that is, they are less similar to each other.

Measures of Dispersion
There are three main measures of dispersion: the range, the semi-interquartile range (SIR), and the variance / standard deviation.

The Range
The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data: $X_L - X_S$.
What is the range of the following data: 4 8 1 6 6 2 9 3 6 9?
The largest score ($X_L$) is 9; the smallest score ($X_S$) is 1; the range is $X_L - X_S = 9 - 1 = 8$.

When To Use the Range
The range is used when you have ordinal data or you are presenting your results to people with little or no knowledge of statistics. The range is rarely used in scientific work as it is fairly insensitive: it depends on only two scores in the set of data, $X_L$ and $X_S$. Two very different sets of data can have the same range: 1 1 1 1 9 vs. 1 3 5 7 9.
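The range takes one line to compute. This Python sketch (the function name is illustrative) reproduces the slide's example and shows the insensitivity point: two very different data sets have the same range.

```python
def score_range(scores):
    """Range = largest score (X_L) minus smallest score (X_S)."""
    return max(scores) - min(scores)

print(score_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))                 # 8
print(score_range([1, 1, 1, 1, 9]), score_range([1, 3, 5, 7, 9]))  # 8 8
```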

The Semi-Interquartile Range
The semi-interquartile range (SIR) is defined as the difference between the third and first quartiles divided by two. The first quartile is the 25th percentile; the third quartile is the 75th percentile.
$\text{SIR} = (Q_3 - Q_1) / 2$

SIR Example
What is the SIR for the data to the right? 25% of the scores are below 5, so 5 is the first quartile. 25% of the scores are above 25, so 25 is the third quartile.
$\text{SIR} = (Q_3 - Q_1) / 2 = (25 - 5) / 2 = 10$

When To Use the SIR
The SIR is often used with skewed data as it is insensitive to the extreme scores.
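A Python sketch of the SIR. The slides do not say which quartile convention they use, and different conventions can give slightly different quartiles for small data sets. This sketch assumes the standard library's `statistics.quantiles` with its default `method="exclusive"` (Python 3.8+). The scores are illustrative, not the slide's. The second data set replaces the largest score with an extreme one to show that the SIR does not move.

```python
import statistics

def semi_interquartile_range(scores, method="exclusive"):
    """SIR = (Q3 - Q1) / 2, with quartiles from statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method=method)
    return (q3 - q1) / 2

data = list(range(1, 16))                       # illustrative scores 1..15
print(statistics.quantiles(data, n=4))          # [4.0, 8.0, 12.0]
print(semi_interquartile_range(data))           # 4.0

with_outlier = list(range(1, 15)) + [1000]      # largest score replaced by an extreme value
print(semi_interquartile_range(with_outlier))   # 4.0, unchanged
```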

Variance
Variance is defined as the average of the squared deviations:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

What Does the Variance Formula Mean?
First, it says to subtract the mean from each of the scores. This difference is called a deviate or a deviation score. The deviate tells us how far a given score is from the typical, or average, score. Thus, the deviate is a measure of dispersion for a given score.

What Does the Variance Formula Mean?
Why can’t we simply take the average of the deviates? That is, why isn’t variance defined as:
$\frac{\sum (X - \mu)}{N}$
This is not the formula for variance!

What Does the Variance Formula Mean?
One of the definitions of the mean was that it always makes the sum of the scores minus the mean equal to 0. Thus, the average of the deviates must be 0, since the sum of the deviates must equal 0. To avoid this problem, statisticians square the deviate scores before averaging them. Squaring makes every squared deviate non-negative, so positive and negative deviates can no longer cancel.

What Does the Variance Formula Mean?
Variance is the mean of the squared deviation scores. The larger the variance is, the more the scores deviate, on average, away from the mean. The smaller the variance is, the less the scores deviate, on average, from the mean.

Standard Deviation
When the deviate scores are squared in variance, their unit of measure is squared as well. E.g., if people’s weights are measured in pounds, then the variance of the weights would be expressed in pounds² (or squared pounds). Since squared units of measure are often awkward to deal with, the square root of variance is often used instead. The standard deviation is the square root of variance.

Standard Deviation
Standard deviation = $\sqrt{\text{variance}}$
Variance = $(\text{standard deviation})^2$
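The definitional formula and the square-root step can be sketched in Python. This minimal illustration (function names are hypothetical) uses the scores 1 5 4 3 2 from the mean example. It confirms that the deviations sum to 0 and compares the result with the standard library's `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

def deviations(scores):
    """Deviation scores X - mu for every score."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]

def population_variance(scores):
    """sigma^2 = sum((X - mu)^2) / N: the average squared deviation."""
    devs = deviations(scores)
    return sum(d * d for d in devs) / len(devs)

scores = [1, 5, 4, 3, 2]
print(sum(deviations(scores)))        # 0.0  (deviates always sum to zero)
var = population_variance(scores)
sd = math.sqrt(var)                   # standard deviation = square root of variance
print(var, round(sd, 3))              # 2.0 1.414
print(statistics.pvariance(scores), round(statistics.pstdev(scores), 3))
```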

Computational Formula
When calculating variance, it is often easier to use a computational formula which is algebraically equivalent to the definitional formula:
$\sigma^2 = \frac{\sum X^2}{N} - \mu^2$
$\sigma^2$ is the population variance, X is a score, $\mu$ is the population mean, and N is the number of scores.

Computational Formula Example
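A worked check of the computational formula, sketched in Python under the assumption that it takes the form $\sigma^2 = \sum X^2 / N - \mu^2$, which is algebraically equivalent to the definitional formula. For the scores 1 5 4 3 2: $\sum X^2 = 55$, $55/5 = 11$, and $\mu^2 = 9$, so $\sigma^2 = 2$. One caveat: for large values that lie close together, subtracting two large, nearly equal numbers can lose precision in floating point.

```python
def variance_definitional(scores):
    """sigma^2 = sum((X - mu)^2) / N."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n

def variance_computational(scores):
    """sigma^2 = sum(X^2) / N - mu^2 (same value, fewer subtractions)."""
    n = len(scores)
    mu = sum(scores) / n
    return sum(x * x for x in scores) / n - mu ** 2

scores = [1, 5, 4, 3, 2]
print(sum(x * x for x in scores))                                      # 55
print(variance_definitional(scores), variance_computational(scores))   # 2.0 2.0
```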

Variance of a Sample
Because the sample mean is not a perfect estimate of the population mean, the formula for the variance of a sample is slightly different from the formula for the variance of a population:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
$s^2$ is the sample variance, X is a score, $\bar{X}$ is the sample mean, and N is the number of scores.
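A minimal Python sketch contrasting the two formulas on the same scores. Dividing by N - 1 instead of N makes the sample variance slightly larger. The function name is illustrative. The standard library's `statistics.variance` divides by N - 1 and `statistics.pvariance` divides by N.

```python
import statistics

def sample_variance(scores):
    """s^2 = sum((X - X_bar)^2) / (N - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)

scores = [1, 5, 4, 3, 2]
print(sample_variance(scores))                                   # 2.5
print(statistics.variance(scores), statistics.pvariance(scores))
```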

Measure of Skew
Skew is a measure of symmetry in the distribution of scores.
[Figure: positive skew, negative skew, and normal (skew = 0) distributions]

Measure of Skew
The following formula can be used to determine skew:
$s^3 = \frac{\sum (X - \mu)^3}{N \sigma^3}$

Measure of Skew
If $s^3 < 0$, the distribution has a negative skew. If $s^3 > 0$, the distribution has a positive skew. If $s^3 = 0$, the distribution is symmetrical. The more $s^3$ differs from 0, the greater the skew in the distribution.

Kurtosis (Not Related to Halitosis)
Kurtosis measures whether the scores are spread out more or less than they would be in a normal (Gaussian) distribution.
[Figure: mesokurtic ($s^4 = 3$), leptokurtic ($s^4 > 3$), and platykurtic ($s^4 < 3$) distributions]

Kurtosis
When the distribution is normally distributed, its kurtosis equals 3 and it is said to be mesokurtic. When the distribution is less spread out than normal, its kurtosis is greater than 3 and it is said to be leptokurtic. When the distribution is more spread out than normal, its kurtosis is less than 3 and it is said to be platykurtic.

Measure of Kurtosis
The measure of kurtosis is given by:
$s^4 = \frac{\sum (X - \mu)^4}{N \sigma^4}$

$s^2$, $s^3$, & $s^4$
Collectively, the variance ($s^2$), skew ($s^3$), and kurtosis ($s^4$) describe the shape of the distribution.
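A Python sketch of the skew ($s^3$) and kurtosis ($s^4$) measures. It assumes they are the standardized third and fourth moments, $\sum (X - \mu)^3 / (N\sigma^3)$ and $\sum (X - \mu)^4 / (N\sigma^4)$. That assumption matches the slides' convention that a normal distribution has kurtosis 3. Many textbooks and software packages instead report bias-adjusted values or excess kurtosis (kurtosis - 3), so their numbers can differ. The function names are illustrative.

```python
import math

def shape_measures(scores):
    """Return (variance, skew, kurtosis) from population moments about the mean."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n   # variance
    m3 = sum((x - mu) ** 3 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    sigma = math.sqrt(m2)
    return m2, m3 / sigma ** 3, m4 / sigma ** 4

def describe_kurtosis(k, tol=1e-9):
    """Label kurtosis relative to the normal-distribution value of 3."""
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

for data in ([1, 2, 3, 4, 5], [1, 1, 1, 2, 10]):
    var, skew, kurt = shape_measures(data)
    print(data, round(var, 3), round(skew, 3), round(kurt, 3), describe_kurtosis(kurt))
```

The first data set is symmetrical (skew 0) and flat (kurtosis 1.7, platykurtic). The second has one large score, which produces a positive skew.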

Karl Pearson’s Coefficient of Skewness vs. Bowley’s Coefficient of Skewness
Karl Pearson’s method is the usual method of finding the coefficient of skewness. It is based on the mean, mode, and standard deviation.
Skewness = Mean - Mode
$Sk_P = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
Bowley’s method is usually used when the quartiles are given. It is based on quartiles.
Skewness = $Q_3 + Q_1 - 2\,\text{Median}$
$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

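Calculate Karl Pearson’s coefficient of skewness for the following data:
X:  20  30  40  50  60  70
f:   8  12  20  10   6   4

X     f     fX      X²      fX²
20    8    160     400     3200
30   12    360     900    10800
40   20    800    1600    32000
50   10    500    2500    25000
60    6    360    3600    21600
70    4    280    4900    19600
Total 60  2460            112200

$Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$
Mean = $\sum f_i x_i / \sum f_i = 2460 / 60 = 41$
Mode = 40 (the value with the highest frequency, 20)
Standard deviation = $\sqrt{\sum f x^2 / N - \bar{x}^2} = \sqrt{112200/60 - 41^2} = \sqrt{189} \approx 13.7$
$Sk_P = (41 - 40) / 13.7 \approx 0.07$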
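The worked example can be reproduced with a short Python sketch. It uses the population form of the standard deviation, $\sqrt{\sum f x^2 / N - \bar{x}^2}$, which reproduces the slide's value of about 13.7, and it takes the mode as the value with the highest frequency. The Bowley call at the end uses hypothetical quartiles (Q1 = 20, median = 30, Q3 = 50) purely to illustrate that formula. The function names are illustrative.

```python
import math

def pearson_skewness(mean, mode, sd):
    """Karl Pearson: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def bowley_skewness(q1, median, q3):
    """Bowley: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

x = [20, 30, 40, 50, 60, 70]
f = [8, 12, 20, 10, 6, 4]

n = sum(f)                                            # 60
mean = sum(xi * fi for xi, fi in zip(x, f)) / n       # 2460 / 60 = 41.0
mode = x[f.index(max(f))]                             # 40 (frequency 20)
sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))   # 112200
sd = math.sqrt(sum_fx2 / n - mean ** 2)               # sqrt(189), about 13.75

print(n, mean, mode, sum_fx2, round(sd, 2))           # 60 41.0 40 112200 13.75
print(round(pearson_skewness(mean, mode, sd), 2))     # 0.07

# Hypothetical quartiles, only to illustrate Bowley's formula
print(round(bowley_skewness(q1=20, median=30, q3=50), 3))   # 0.333
```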
